Reduce maximum memory usage while reading SRA data? #97

@jgans


Reading SRA data with the C VDB API (following the strategy that the fasterq-dump utility uses to access SRA records) can consume a significant amount of RAM. This can be an issue when attempting to minimize the amount of cloud computing resources (i.e., instance RAM) needed to process a large number of SRA records.

The maximum amount of RAM used while reading (as measured with /usr/bin/time -v) depends on the record:
[image]
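For reference, the same peak figure can also be sampled from inside the reading process. The following is a minimal sketch, assuming a Linux host where `getrusage()` reports `ru_maxrss` in kilobytes (the same unit as the "Maximum resident set size" line printed by `/usr/bin/time -v`); the helper name `peak_rss_kb` is hypothetical and not part of the VDB API.

```c
#include <sys/resource.h>

/* Hypothetical helper: peak resident set size of the calling process so far.
 * On Linux, ru_maxrss is reported in kilobytes (other platforms may use
 * different units, e.g. bytes on macOS). Returns -1 if getrusage() fails. */
static long peak_rss_kb(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    return ru.ru_maxrss;
}
```

Calling `peak_rss_kb()` once after the read loop finishes gives a per-record peak without wrapping the program in an external timer.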

While periodically calling VCursorRelease() and VCursorOpen() to force the VDB interface to deallocate RAM offers a minor reduction in the maximum amount of RAM used (about 25%), this strategy significantly slows down the rate at which an SRA record is read.
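The control flow of this periodic release/reopen strategy is roughly as follows. This is only a sketch: `StubCursor`, `stub_open`, `stub_release`, and `stub_read_row` are hypothetical stand-ins for the VDB cursor and for whatever VDB calls are needed to release and reopen it, and the recycling interval is an arbitrary parameter.

```c
#include <stdint.h>

/* Stand-in for a VDB cursor; it only tracks how often it has been opened. */
typedef struct {
    int is_open;
    uint64_t open_count;
} StubCursor;

/* Stand-ins for the real open / release / read calls (assumptions). */
static void stub_open(StubCursor *c)    { c->is_open = 1; c->open_count++; }
static void stub_release(StubCursor *c) { c->is_open = 0; }
static void stub_read_row(const StubCursor *c, uint64_t row) { (void)c; (void)row; }

/* Read row_count rows, releasing and reopening the cursor every `interval`
 * rows so that cached data can be freed (interval == 0 disables recycling).
 * Returns the number of times the cursor was opened. */
static uint64_t read_with_recycling(uint64_t row_count, uint64_t interval)
{
    StubCursor c = {0, 0};

    stub_open(&c);
    for (uint64_t i = 0; i < row_count; i++) {
        if (interval != 0 && i != 0 && i % interval == 0) {
            stub_release(&c);
            stub_open(&c);
        }
        stub_read_row(&c, i);
    }
    stub_release(&c);
    return c.open_count;
}
```

A smaller interval frees memory more often but pays the reopen cost more often, which is the slowdown described above.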

Is it possible/feasible to limit memory consumption using the VDB C API to sub-gigabyte levels, independent of the number of reads? The goal is to read through an SRA record once, as quickly as possible and using as little RAM as possible.
