
Uni-Dock-Benchmarks

The Uni-Dock-Benchmarks repository provides a comprehensive collection of datasets for benchmarking the performance and accuracy of the Uni-Dock docking system. The datasets include prepared structures and input files for benchmarking both Uni-Dock V1 and V2. The repository is intended for use in continuous integration testing and for researchers who want to compare docking results against established benchmarks.

Data

Benchmark data within the repository is categorized into two primary sections:

  • molecular_docking
  • virtual_screening

Molecular Docking Benchmarks

Under the molecular_docking directory, you will find several well-known benchmark datasets: Astex, CASF-2016, and PoseBuster.

We performed the following preparation steps for the proteins and ligands in the datasets.

  • After obtaining the protein structures from the RCSB database by PDB code, we retained the crystal waters that affect the binding mode, completed missing protein side chains, and added missing hydrogen atoms.
  • For the ligands, we retrieved the isomeric SMILES corresponding to each PDB code from the RCSB database and determined the correct protonation state according to the receptor pocket environment. We then generated a 3D conformation for each ligand.

After excluding systems with covalent ligand binding, problematic binding mechanisms, or large natural-product or polypeptide ligands, 69 systems from Astex, 271 from CASF-2016, and 396 from PoseBuster were used as benchmarks.

The correctness of the protein side-chain structure and hydrogen-bond network has a crucial impact on ligand docking, so the structure preparation of both protein and ligand largely determines how difficult it is to produce correct docking poses. We used our internal tools to prepare the initial receptor and ligand structures in order to obtain better docking results. In addition, an open-source version of the structure preparation algorithms for Uni-Dock V2 has been integrated into the unified protocol in the Uni-Dock V2 GitHub repository.

We prepared each receptor structure in two versions, one with the co-crystallized waters and one with the protein only, to test the overall effect of water on ligand docking.

The directory structure for each dataset is as follows:

<DataSetName>
├── <PDB_ID>
│   ├── <PDB_ID>_ligand.sdf                    # Ligand co-crystal structure processed in SDF format
│   ├── <PDB_ID>_protein_water_cleaned.pdb     # Prepared receptor structure with protein and crystallized water in PDB format
│   ├── <PDB_ID>_protein_cleaned.pdb           # Prepared receptor structure with only protein in PDB format
│   ├── ligand_prepared.sdf                    # Reprepared ligand 3D conformation used in docking test in SDF format
│   ├── unidock1_protein                       # Folder for input files of Uni-Dock V1, with protein only in the receptor structure
│   │   ├── ligand_prepared_torsion_tree.sdf   # Prepared ligand structure with torsion tree information used in Uni-Dock V1 input in SDF format
│   │   └── receptor.pdbqt                     # Prepared receptor structure used in Uni-Dock V1 input in PDBQT format
│   ├── unidock1_protein_water                 # Folder for input files of Uni-Dock V1, with protein and water in the receptor structure
│   │   ├── ligand_prepared_torsion_tree.sdf   # Prepared ligand structure with torsion tree information used in Uni-Dock V1 input in SDF format
│   │   └── receptor.pdbqt                     # Prepared receptor structure used in Uni-Dock V1 input in PDBQT format
│   ├── unidock2_protein                       # Folder for input files of Uni-Dock V2, with protein only in the receptor structure
│   │   ├── <PDB_ID>_unidock2.json             # Integrated JSON input file for Uni-Dock V2 docking engine
│   │   └── receptor_parameterized.dms         # Prepared and parameterized receptor structure in DMS format
│   └── unidock2_protein_water                 # Folder for input files of Uni-Dock V2, with protein and water in the receptor structure
│       ├── <PDB_ID>_unidock2.json             # Integrated JSON input file for Uni-Dock V2 docking engine
│       └── receptor_parameterized.dms         # Prepared and parameterized receptor structure in DMS format
└── pdb_center.csv                             # CSV file recording the protein pocket center with respect to the <PDB_ID> for each system
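
As an illustration, the pocket center recorded in pdb_center.csv can be read directly when setting up a docking box. The Python sketch below makes assumptions about the column names (pdb_id, center_x, center_y, center_z); check them against the actual CSV header before use.

# Minimal sketch: look up docking pocket centers from a dataset's pdb_center.csv.
# NOTE: the column names used here are assumptions; verify them against the file.
import csv

def load_pocket_centers(csv_path):
    """Return {pdb_id: (x, y, z)} read from pdb_center.csv."""
    centers = {}
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            centers[row["pdb_id"]] = (
                float(row["center_x"]),
                float(row["center_y"]),
                float(row["center_z"]),
            )
    return centers

# Replace <DataSetName> with one of the molecular_docking datasets.
centers = load_pocket_centers("molecular_docking/<DataSetName>/pdb_center.csv")
for pdb_id, center in list(centers.items())[:3]:
    print(pdb_id, center)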

Virtual Screening Benchmarks

Under the virtual_screening directory, you will find several carefully selected benchmark datasets: D4, GBA, NSP3, PPARG, and sigma2.

The following table summarizes the statistics of the datasets:

Dataset   PDB ID   N_Actives   N_Inactives   N_Total
D4        5WIU     226         598           824
GBA       5LVX     286         458,205       458,491
NSP3      5RS7     65          3,515         3,580
PPARG     5Y2T     29          7,292         7,321
sigma2    7M94     228         596           824

The directory structure for each dataset is as follows:

<DataSetName>
├── docking_grid.json                         # JSON file recording the protein pocket center and the box sizes
├── <PDB_ID>_receptor.pdb                     # Original unprocessed receptor structure in PDB format
├── <PDB_ID>_protein_cleaned.pdb              # Prepared receptor structure with only protein in PDB format
├── actives_cleaned.sdf                       # Preprocessed and cleaned active molecules in SDF format
├── actives.sdf                               # Active molecules in SDF format
├── inactives_cleaned.sdf                     # Preprocessed and cleaned inactive molecules in SDF format
├── inactives.sdf                             # Inactive molecules in SDF format
├── unidock1_protein                          # Folder for input files of Uni-Dock V1, with protein only in the receptor structure
│   ├── actives_prepared_torsion_tree.sdf     # Prepared active molecule structure with torsion tree information used in Uni-Dock V1 input in SDF format
│   ├── inactives_prepared_torsion_tree.sdf   # Prepared inactive molecule structure with torsion tree information used in Uni-Dock V1 input in SDF format
│   └── receptor.pdbqt                        # Prepared receptor structure used in Uni-Dock V1 input in PDBQT format
└── unidock2_protein                          # Folder for input files of Uni-Dock V2, with protein only in the receptor structure
    ├── actives_unidock2.json                 # Integrated JSON input file of active molecules for Uni-Dock V2 docking engine
    ├── inactives_unidock2.json               # Integrated JSON input file of inactive molecules for Uni-Dock V2 docking engine
    └── receptor_parameterized.dms            # Prepared and parameterized receptor structure in DMS format
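
The docking box for each target is defined in docking_grid.json. The following Python sketch shows how it might be read; the key names (center_x, size_x, and so on) are assumptions for illustration and should be checked against an actual file in the repository.

# Minimal sketch: read the docking box definition from docking_grid.json.
# NOTE: the key names below are assumptions; confirm the schema with a real file.
import json

with open("virtual_screening/D4/docking_grid.json") as fh:  # illustrative path
    grid = json.load(fh)

center = (grid["center_x"], grid["center_y"], grid["center_z"])
box_size = (grid["size_x"], grid["size_y"], grid["size_z"])
print("pocket center:", center, "box size:", box_size)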

Important Note: Due to the substantial number of inactive molecules in the GBA dataset, the directory contains several large files that exceed GitHub's size limits. These files have been moved to cloud storage. To obtain the complete GBA directory, run the following command in your terminal:

./getGBA.sh

Scripts

Two scripts are provided to run benchmark tests on Uni-Dock executable binaries:

run_test.py - Single Benchmark Test

The main entry script for running a single benchmark test. It supports both Uni-Dock V1 and V2, and can run either molecular docking or virtual screening benchmarks.

Basic Usage

Molecular Docking:

# Uni-Dock V2 with receptor without water
python scripts/run_test.py --version 2 --bin ud2 --type molecular_docking --nowater --device 1 --savedir my_res --seed 121

# Uni-Dock V2 with receptor containing water (default)
python scripts/run_test.py --version 2 --bin ud2 --type molecular_docking --device 1 --savedir my_res --seed 121

# Uni-Dock V1 with receptor containing water
python scripts/run_test.py --version 1 --bin ud1 --type molecular_docking --device 1 --savedir my_res --seed 121

Virtual Screening:

python scripts/run_test.py --version 2 --bin ud2 --type virtual_screening --device 0 --savedir res_vs --seed 122

Parameters

  • --savedir <DIR> (required) - Output directory for the results
  • --bin <PATH> (required) - Path to the Uni-Dock executable binary
  • --version <1|2> (required) - Uni-Dock version (1 or 2)

    NOTE: Specifying --version 2 automatically loads the file scripts/ud2.yaml as default configuration.

  • --type <molecular_docking|virtual_screening> (required) - Type of benchmark test
  • --device <ID> (optional, default: 0) - GPU device ID
  • --seed <INTEGER> (optional, default: 123) - Random seed for reproducibility
  • --nowater (optional) - Use receptor without water (only for molecular_docking). Default: uses water-containing receptor
  • --rootdir <PATH> (optional) - Root directory of the data (Uni-Dock-Benchmarks dir). If not provided, the script should be run from the Uni-Dock-Benchmarks directory
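
For scripted benchmarking, the same parameters can also be passed programmatically. The following Python sketch simply wraps the command lines shown above with the subprocess module and assumes it is executed from the Uni-Dock-Benchmarks directory (otherwise pass --rootdir).

# Minimal sketch: invoke run_test.py programmatically with the parameters above.
import subprocess

def run_benchmark(binary, version, bench_type, savedir,
                  device=0, seed=123, nowater=False):
    cmd = [
        "python", "scripts/run_test.py",
        "--version", str(version),
        "--bin", binary,
        "--type", bench_type,
        "--device", str(device),
        "--savedir", savedir,
        "--seed", str(seed),
    ]
    if nowater:
        cmd.append("--nowater")  # molecular_docking only
    subprocess.run(cmd, check=True)

# Example: Uni-Dock V2 molecular docking on GPU 1 without crystal waters
run_benchmark("ud2", 2, "molecular_docking", "my_res", device=1, seed=121, nowater=True)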

submit_udbench.sh - Batch Benchmark Submission

The script automatically submits three benchmark tasks, each using a different GPU device and seed, whose results can then be averaged for more reliable statistics. All tasks run in the background via nohup and do not block the terminal.
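
For reference, this submission pattern (three background runs on different GPU devices with different seeds, each recording its PID) can be sketched in Python as follows. This is only an illustration of the pattern, not the contents of submit_udbench.sh, and the log/PID file names are assumptions.

# Illustrative sketch of the batch-submission pattern: three background runs on
# different GPU devices with different seeds, each logging output and recording
# its PID. File naming here is an assumption, not what submit_udbench.sh does.
import subprocess

savedir = "res2_0.4.4.1_dock"
devices = [0, 1, 2]       # GPU device IDs, as passed on the command line
seeds = [121, 122, 123]   # seed values chosen for illustration

for device, seed in zip(devices, seeds):
    log = open(f"{savedir}_dev{device}.log", "w")
    proc = subprocess.Popen(
        ["python", "scripts/run_test.py",
         "--version", "2", "--bin", "ud2",
         "--type", "molecular_docking",
         "--device", str(device), "--seed", str(seed),
         "--savedir", f"{savedir}_dev{device}"],
        stdout=log, stderr=subprocess.STDOUT,
    )
    with open(f"{savedir}_dev{device}.pid", "w") as fh:
        fh.write(str(proc.pid))  # record the PID so the task can be managed later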

Basic Usage

Check detailed help information by running the script.

# Basic usage with default values
bash scripts/submit_udbench.sh res2_0.4.4.1_dock 0 1 2

# Specify all parameters
bash scripts/submit_udbench.sh res2_0.4.4.1_dock 0 1 2 --bin ud2_0.4.4.1 --version 2 --type molecular_docking

# Use --nowater parameter
bash scripts/submit_udbench.sh res2_0.4.4.1_dock 0 1 2 --bin ud2_0.4.4.1 --version 2 --type molecular_docking --nowater

# Virtual screening task
bash scripts/submit_udbench.sh res_vs 0 1 2 --bin ud2_v0.2 --version 2 --type virtual_screening

Managing Submitted Tasks

After submission, you can manage tasks using the recorded PID files.
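
For example, a task can be stopped from its PID file with a short Python snippet. The PID file name below is hypothetical; use the file names actually written by submit_udbench.sh.

# Hypothetical sketch: terminate a running benchmark task via its recorded PID file.
import os
import signal

with open("res2_0.4.4.1_dock_dev0.pid") as fh:  # assumed PID file name
    pid = int(fh.read().strip())

os.kill(pid, signal.SIGTERM)  # ask the task to stop gracefully
print(f"Sent SIGTERM to process {pid}")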
