
Batching and parallelising inference calls for one receptor. #289

@BenWallner7

Description


Hi,

I am trying to run DiffDock on ~70k ligands against the same receptor. My current runtime on an A100 GPU is about 7 hours per 1000 ligands, and I am trying to improve this.

Firstly, I have looked at caching the receptor's ESM embedding and loading it for each ligand so it isn't recomputed every time.
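For reference, the caching pattern I have in mind looks roughly like this (just a sketch; `embed_receptor` is a made-up stand-in for the real ESM forward pass, which is the expensive step):

```python
from pathlib import Path
import pickle

def embed_receptor(sequence: str):
    # Hypothetical placeholder for the real ESM forward pass;
    # returns a deterministic dummy "embedding" for illustration.
    return [ord(c) % 7 for c in sequence]

def cached_embedding(sequence: str, cache_path: Path):
    # Compute the receptor embedding once, then reuse it from disk
    # for every subsequent ligand instead of recomputing it.
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    emb = embed_receptor(sequence)
    with cache_path.open("wb") as f:
        pickle.dump(emb, f)
    return emb
```

Since the receptor is fixed, the cache is written once and every later call is a cheap disk read.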

I have tried increasing the --batch_size argument in the inference call, but I am still only processing one ligand at a time. How do I actually get it to batch? Also, will batching by SMILES length or molecular weight work, given that batch composition depends on graph input size?
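My current thought is to give all ligands to one inference call via a --protein_ligand_csv input, so batching can happen inside a single process. A sketch of generating that file, assuming the columns are complex_name, protein_path, ligand_description, protein_sequence (please correct me if the column names differ in the current repo):

```python
import csv
from pathlib import Path

def write_protein_ligand_csv(receptor_pdb: str, smiles_list, out_csv: Path):
    # One row per ligand, all pointing at the same receptor file, so a
    # single inference run sees the whole library and can batch internally.
    # Column names are an assumption; verify against your DiffDock version.
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["complex_name", "protein_path",
                         "ligand_description", "protein_sequence"])
        for i, smi in enumerate(smiles_list):
            writer.writerow([f"ligand_{i}", receptor_pdb, smi, ""])
```

The same receptor path on every row should also play well with a cached ESM embedding, since the sequence never changes.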

How can I run these inference calls in parallel? When I use GNU parallel to launch processes concurrently, I see a lockup caused by spawning multiple Docker/Singularity shells and micromamba environments.
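To avoid spawning one container per job, I am considering splitting the library into a fixed number of shards up front and running one worker per shard inside a single container. A sketch (file naming is my own invention):

```python
from pathlib import Path

def shard(items, n_shards):
    # Round-robin split: each shard gets a similar mix of short and
    # long SMILES, which should roughly balance graph sizes per worker.
    return [items[i::n_shards] for i in range(n_shards)]

def write_shards(smiles_list, n_shards, out_dir: Path):
    # One input file per worker; each worker then runs its own
    # inference process inside the already-running container.
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(shard(smiles_list, n_shards)):
        p = out_dir / f"shard_{i}.smi"
        p.write_text("\n".join(chunk) + "\n")
        paths.append(p)
    return paths
```

The idea is that the container and environment start-up cost is paid once, and parallelism happens at the process level over pre-built shard files.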

Does anyone have experience with this type of implementation and best practices?

Thanks for any support or advice.
