
copy-from performance with lots of chunks #127

@byrney

Description


We have a Node process that generates a lot of data¹ and we need to insert it into Postgres. Doing an insert per item was terrible (no surprise there); creating big inserts with many rows is better but still very bad. copy-from is better still, but not as much of an improvement as I thought it would be. None of them come close to piping the data into a psql child process.

I created a benchmark to show the problem here: byrney@1086f50#diff-1b32c28cb38c05480eccc1bd60ff97029b57a05c96718b96dad7e9d84894f549

Like our process, this calls stream.write(...) once per row and then pipes that to copy-from. Our real process sends much wider rows, but we see the same effect.
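For illustration, the shape of that pattern is roughly this (the table, columns and row format here are made up for the example, not the actual bench code):

```js
const { Client } = require('pg')
const { PassThrough } = require('stream')
const { from: copyFrom } = require('pg-copy-streams')

// Sketch of the slow path: one small write per row, piped into copy-from.
// Table `items` and its columns are placeholders.
async function copyRows(rows) {
  const client = new Client()
  await client.connect()

  const source = new PassThrough()
  const copyStream = client.query(copyFrom('COPY items (id, value) FROM STDIN'))
  source.pipe(copyStream)

  // one stream.write(...) per row, so copy-from sees many tiny chunks
  // (backpressure is ignored here to keep the sketch short)
  for (const [id, value] of rows) {
    source.write(`${id}\t${value}\n`)
  }
  source.end()

  await new Promise((resolve, reject) => {
    copyStream.on('finish', resolve)
    copyStream.on('error', reject)
  })
  await client.end()
}
```

Every source.write ends up as its own tiny chunk reaching copy-from.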

As a workaround we can pipe through a stream that batches up the chunks before sending them on to copy-from. Making a very small number of calls to copy-from.write (at the cost of some memory) helps a lot with speed.
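The batching stream looks something like this (the 64 KiB threshold is just an illustrative number, not a tuned value):

```js
const { Transform } = require('stream')

// Coalesce many small chunks into fewer, larger writes before they
// reach copy-from. Chunks arrive as Buffers (decodeStrings defaults to true).
class ChunkBatcher extends Transform {
  constructor(batchSize = 64 * 1024) {
    super()
    this.batchSize = batchSize
    this.chunks = []
    this.length = 0
  }

  _transform(chunk, _encoding, callback) {
    this.chunks.push(chunk)
    this.length += chunk.length
    if (this.length >= this.batchSize) {
      // emit one big chunk instead of many tiny ones
      this.push(Buffer.concat(this.chunks))
      this.chunks = []
      this.length = 0
    }
    callback()
  }

  _flush(callback) {
    // emit whatever is left when the source ends
    if (this.length > 0) {
      this.push(Buffer.concat(this.chunks))
    }
    callback()
  }
}
```

Usage is just source.pipe(new ChunkBatcher()).pipe(copyStream), so copy-from.write gets called once per batch instead of once per row.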

A potential improvement to copy-from is on that branch too: https://github.com/byrney/node-pg-copy-streams/tree/private/copy-from-performance. It moves the batching into copy-from itself.

It needs a bit of a tidy-up², but if you think the approach is sound and you're open to a PR, I can do that and submit it.

Footnotes

  1. This is a gross simplification, but that's the gist of it.

  2. There is still some debugging logic in there to count the number of calls.
