Background
The PoseBusters dataset is a new set of carefully selected, publicly available crystal complexes from the PDB. It is a diverse set of recent, high-quality protein–ligand complexes that contain drug-like molecules. It contains only complexes released since 2021 and therefore does not overlap with the PDBbind General Set v2020, which was used to train many of the methods.
Buttenschoen et al. outline the steps used to select the 308 unique proteins and 308 unique ligands in the PoseBusters dataset. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were saved with their cofactors in PDB format, while the ligands were saved in SDF format.
Benchmark description
This is a zero-shot benchmark: it contains only a test set of 308 proteins and ligands for evaluation.
PoseBusters offers a series of ligand checkers, known as 'PoseBusters Checkers', to filter out undesired docked ligand conformers. It is recommended to apply these filters before uploading results to the Polaris Hub.
Only the ligand extracted from the docking output should be uploaded for evaluation. Before uploading, ensure that the target protein coordinates are aligned with the original crystal structure.
This benchmark uses the metric RMSD coverage (≤2Å), which calculates the percentage of molecules with an RMSD of at most 2Å compared to the reference. For more details, see the Polaris documentation.
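To make the metric concrete, here is a minimal sketch of how RMSD coverage could be computed. It is an illustration under stated assumptions, not the Polaris implementation: the function names `rmsd` and `rmsd_coverage` are hypothetical, atoms are assumed to be listed in the same order in the predicted and reference ligands, and no symmetry correction or re-alignment is applied.

```python
import math


def rmsd(pred, ref):
    """RMSD (in Å) between two coordinate lists with identical atom ordering.

    Assumption: both poses are already in the reference frame of the crystal
    structure, so no superposition is performed here.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared per-axis differences over all atoms.
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def rmsd_coverage(rmsds, threshold=2.0):
    """Percentage of ligands whose RMSD is at or below the threshold (in Å)."""
    if not rmsds:
        raise ValueError("no RMSD values given")
    hits = sum(1 for value in rmsds if value <= threshold)
    return 100.0 * hits / len(rmsds)


# Toy example: a two-atom ligand shifted by 1 Å along z.
example_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
# Toy example: four per-ligand RMSD values, three of which pass the 2 Å cutoff.
example_coverage = rmsd_coverage([0.5, 2.0, 3.1, 1.9])
```

In practice a symmetry-aware RMSD is usually preferable for ligands with equivalent atoms; the plain version above is only meant to show how the per-ligand values are turned into a coverage percentage.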
Data source
Other links