We have a Node process that generates a lot of data and we need to insert it into Postgres. Doing an insert per item was terrible (no surprise there); creating big inserts with many rows is better but still very slow. copy-from is better still, but not as much of an improvement as I expected. None of these come close to piping the data into a psql child process.
I created a bench to show the problem here: byrney@1086f50#diff-1b32c28cb38c05480eccc1bd60ff97029b57a05c96718b96dad7e9d84894f549
Like our process, the bench calls stream.write(...) once per row and then pipes that stream into copy-from. Our real process sends much wider rows, but we see the same effect.
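For reference, a minimal sketch of that pattern (assuming a recent pg / pg-copy-streams setup; `my_table`, `rowToTsv`, and `insertRows` are illustrative names, not taken from the bench):

```js
const { Client } = require('pg')
const { from: copyFrom } = require('pg-copy-streams')

// Illustrative: format one row as a tab-separated line for COPY ... FROM STDIN.
const rowToTsv = (row) => row.join('\t') + '\n'

async function insertRows(rows) {
  const client = new Client()
  await client.connect()
  const copyStream = client.query(copyFrom('COPY my_table FROM STDIN'))
  const done = new Promise((resolve, reject) => {
    copyStream.on('finish', resolve)
    copyStream.on('error', reject)
  })
  for (const row of rows) {
    copyStream.write(rowToTsv(row)) // one small write per row -- the slow case
  }
  copyStream.end()
  await done
  await client.end()
}
```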
As a workaround we can pipe through a stream that batches up the chunks before sending them on to copy-from. Making a very small number of calls to the copy-from stream's write() (at the cost of some memory) speeds things up a lot; a sketch follows below.
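A minimal sketch of that workaround, assuming the same setup as above; `batchChunks` and the 64 KB threshold are illustrative, not what our real batching stream does:

```js
const { Transform, pipeline } = require('stream')
const { Client } = require('pg')
const { from: copyFrom } = require('pg-copy-streams')

// Accumulate small row-sized chunks and pass them on as one large buffer,
// so the copy-from stream sees far fewer write() calls.
function batchChunks(thresholdBytes = 64 * 1024) {
  let chunks = []
  let size = 0
  return new Transform({
    transform(chunk, _encoding, callback) {
      chunks.push(chunk)
      size += chunk.length
      if (size >= thresholdBytes) {
        const batch = Buffer.concat(chunks)
        chunks = []
        size = 0
        callback(null, batch)
      } else {
        callback()
      }
    },
    flush(callback) {
      // Flush whatever is left when the source ends.
      if (chunks.length) this.push(Buffer.concat(chunks))
      callback()
    },
  })
}

async function load(rowStream) {
  const client = new Client()
  await client.connect()
  const copyStream = client.query(copyFrom('COPY my_table FROM STDIN'))
  await new Promise((resolve, reject) => {
    pipeline(rowStream, batchChunks(), copyStream, (err) => (err ? reject(err) : resolve()))
  })
  await client.end()
}
```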
A potential improvement to copy-from is in this branch: https://github.com/byrney/node-pg-copy-streams/tree/private/copy-from-performance. It moves the batching into copy-from itself.
It needs a bit of a tidy-up, but if you think the approach is sound and you're open to a PR, I can do that and submit it.