
Dataset

plinder-org/runs-n-poses-dataset

A dataset with 2,600 high-resolution protein-ligand systems released after 30 September 2021

Created on: February 03, 2025
Number of datapoints: 2,835
Public

Status

Uncertified

This artifact has not been certified by approved reviewers. It may contain issues related to data quality.


Tags

pli
structural-biology
co-folding
protein
ligand

Modalities

PROTEIN_3D
MOLECULE_3D
MOLECULE

Related benchmarks

Details

README

Runs N' Poses Dataset

A dataset with 2,600 high-resolution protein-ligand systems released after 30 September 2021, the training cutoff used by AlphaFold 3, Protenix, Chai-1, and Boltz-1.
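The temporal split described above amounts to a strict date comparison against the cutoff. The following is a minimal sketch of that idea, not the pipeline actually used to build the dataset; the system IDs and release dates are hypothetical and serve only as illustration.

```python
from datetime import date

# Training cutoff stated in the dataset description.
TRAINING_CUTOFF = date(2021, 9, 30)


def released_after_cutoff(release_date: date, cutoff: date = TRAINING_CUTOFF) -> bool:
    """Return True if a structure was released strictly after the cutoff."""
    return release_date > cutoff


# Hypothetical (system ID -> release date) pairs, for illustration only.
example_releases = {
    "system_a": date(2021, 9, 30),  # on the cutoff day: excluded
    "system_b": date(2021, 10, 1),  # first day after the cutoff: included
    "system_c": date(2019, 5, 14),  # well before the cutoff: excluded
}

# Keep only the systems released after the cutoff.
kept = sorted(
    system_id
    for system_id, released in example_releases.items()
    if released_after_cutoff(released)
)
print(kept)
```

The comparison is strict because the description says "released after 30 September 2021"; whether the cutoff day itself is excluded in the real pipeline is an assumption here.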

| No. of | Total |
| --- | --- |
| Systems | 2,600 |
| Ligands | 3,047 |
| Ligands (incl. ions and artifacts) | 4,282 |
| Multi-ligand systems | 401 |
| Multi-protein systems | 790 |

See the GitHub repository for more details.

Benchmark

This dataset was primarily created to accompany the plinder-org/runs-n-poses benchmark.

Format

| Column | Data Type | Description |
| --- | --- | --- |
| ligand_ccd_code | str | The ID of the ligand in the Chemical Component Dictionary |
| ligand_pose | Mol | The 3D structure of the ligand |
| ligand_smiles | str | The 2D graph structure of the ligand |
| plinder_id | str | The ID of the system in the Plinder dataset |
| plinder_metadata | dict | Additional metadata of the system from the Plinder dataset |
| receptor | AtomArray | The 3D protein structure of the receptor, without the ligand |
| sequences | dict | The canonical sequence (SEQRES) of all the receptor's chains |
| similarity_to_train | float | The similarity to the closest system released before the training cutoff, measured with sucos_shape_pocket_qcov |
| system_sequences_and_smiles | dict | The description of the complete system, including the canonical sequence of the receptor's chains and the 2D graph structure of all bound ligands |
| system_structures_and_poses | AtomArray | The complete 3D structure of the bound system, including both receptor and ligand |
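One common use of the `similarity_to_train` column is to separate systems that resemble pre-cutoff data from those that do not. The sketch below shows one way this could look once rows are available as plain dictionaries. The rows, their values, and the threshold of 50.0 are all hypothetical; this page does not state the range of the score, so the threshold should be chosen with reference to the benchmark documentation.

```python
from typing import Any

# Hypothetical rows using column names from the table above; all values are made up.
rows: list[dict[str, Any]] = [
    {"plinder_id": "example_1", "ligand_ccd_code": "AAA", "similarity_to_train": 12.5},
    {"plinder_id": "example_2", "ligand_ccd_code": "BBB", "similarity_to_train": 87.0},
    {"plinder_id": "example_3", "ligand_ccd_code": "CCC", "similarity_to_train": 49.9},
]


def split_by_similarity(
    records: list[dict[str, Any]], threshold: float
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition records into (below threshold, at or above threshold)."""
    low = [r for r in records if r["similarity_to_train"] < threshold]
    high = [r for r in records if r["similarity_to_train"] >= threshold]
    return low, high


# The threshold is an illustrative assumption, not a value taken from the dataset card.
low, high = split_by_similarity(rows, threshold=50.0)
print([r["plinder_id"] for r in low])
print([r["plinder_id"] for r in high])
```

Reporting performance separately on the two partitions is one way to see how much a method depends on similarity to its training data.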

The source data for this ML-ready dataset can be found on Zenodo: DOI

All systems in this dataset are derived from PLINDER's ingestion pipeline.

Citation

If you use this dataset in your research, please cite Runs N' Poses:

Škrinjar, P., Eberhardt, J., Durairaj, J., & Schwede, T. "Have protein-ligand co-folding methods moved beyond memorisation?" bioRxiv (2025): 2025-02