
copy-from performance with lots of chunks #127

@byrney

Description


We have a Node process that generates a lot of data¹ and we need to insert it into Postgres. Doing an insert per item was terrible (no surprise there); creating big inserts with many rows is better but still very bad. copy-from is better still, but not as much of an improvement as I thought it would be. None of them come close to piping the data into a psql child process.

I created a benchmark to show the problem here: byrney@1086f50#diff-1b32c28cb38c05480eccc1bd60ff97029b57a05c96718b96dad7e9d84894f549

Like our process, this calls stream.write(...) once per row and then pipes that to copy-from. Our real process sends much wider rows, but we see the same effect.
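For illustration, the shape of that pattern is roughly this (the table, columns and row format here are made up for the example, not the actual bench code):

```js
const { Client } = require('pg')
const { PassThrough } = require('stream')
const { from: copyFrom } = require('pg-copy-streams')

// Sketch of the slow path: one small write per row, piped into copy-from.
// Table `items` and its columns are placeholders.
async function copyRows(rows) {
  const client = new Client()
  await client.connect()

  const source = new PassThrough()
  const copyStream = client.query(copyFrom('COPY items (id, value) FROM STDIN'))
  source.pipe(copyStream)

  // one stream.write(...) per row, so copy-from sees many tiny chunks
  // (backpressure is ignored here to keep the sketch short)
  for (const [id, value] of rows) {
    source.write(`${id}\t${value}\n`)
  }
  source.end()

  await new Promise((resolve, reject) => {
    copyStream.on('finish', resolve)
    copyStream.on('error', reject)
  })
  await client.end()
}
```

Every source.write ends up as its own tiny chunk reaching copy-from.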

As a workaround we can pipe through a stream that batches up the chunks before sending them on to copy-from. Making a very small number of calls to copy-from.write (at the cost of some memory) helps a lot with speed.
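The batching stream looks something like this (the 64 KiB threshold is just an illustrative number, not a tuned value):

```js
const { Transform } = require('stream')

// Coalesce many small chunks into fewer, larger writes before they
// reach copy-from. Chunks arrive as Buffers (decodeStrings defaults to true).
class ChunkBatcher extends Transform {
  constructor(batchSize = 64 * 1024) {
    super()
    this.batchSize = batchSize
    this.chunks = []
    this.length = 0
  }

  _transform(chunk, _encoding, callback) {
    this.chunks.push(chunk)
    this.length += chunk.length
    if (this.length >= this.batchSize) {
      // emit one big chunk instead of many tiny ones
      this.push(Buffer.concat(this.chunks))
      this.chunks = []
      this.length = 0
    }
    callback()
  }

  _flush(callback) {
    // emit whatever is left when the source ends
    if (this.length > 0) {
      this.push(Buffer.concat(this.chunks))
    }
    callback()
  }
}
```

Usage is just source.pipe(new ChunkBatcher()).pipe(copyStream), so copy-from.write gets called once per batch instead of once per row.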

A potential improvement to copy-from is on that branch too: https://github.com/byrney/node-pg-copy-streams/tree/private/copy-from-performance. It moves the batching into copy-from itself.

It needs a bit of a tidy-up², but if you think the approach is sound and you're open to a PR, I can do that and submit it.

Footnotes

  1. This is a gross simplification, but that's the gist of it.

  2. There is still some debugging logic in there to count the number of calls.
