Unblinded Dataset for the Ligand Poses challenge
🏆 Antiviral Challenge
ASAP Discovery and Open ADMET organized a blind challenge on Polaris in early 2025. It consisted of three parts: ADMET, Potency and Ligand Poses. This is the unblinded dataset for the Ligand Poses challenge.
- Participants were tasked with predicting the bound Ligand Pose, given the CXSMILES, Chain A Sequence, Chain B Sequence and Protein Label.
- The complete evaluation logic has been published on GitHub.
- The final leaderboard for this challenge, which can be found here, had a total of 39 submissions.
📊 Data
Column | Dtype | Description |
---|---|---|
Chain A Sequence | str | Primary structure of the protein's A chain: a linear sequence of amino acids. |
Chain B Sequence | str | Primary structure of the protein's B chain, if any: a linear sequence of amino acids. |
CXSMILES | str | Text representation of the 2D molecular structure. |
Complex Structure | PDB (fastpdb.AtomArray) | 3D system of the ligand bound to the protein, prepared using OESpruce and aligned to a reference Mpro. |
Protein Structure | PDB (fastpdb.AtomArray) | 3D system of just the protein structure, prepared using OESpruce and aligned to a reference Mpro. |
Ligand Pose | SDF (rdkit.Chem.Mol) | 3D conformation of the molecule, bound to the protein. |
Protein Label | str | Either SARS-CoV-2 Mpro or MERS-CoV Mpro. |
Set | Categorical | Whether this was part of the train or test set. |
⚠️ Known Issues
We're publishing the full dataset for transparency and reproducibility, but please be aware that the dataset has some known issues that were surfaced during the challenge.
Duplicates
The dataset includes duplicates for the following three reasons:
- A compound showed variation in its pose (n=6), so both variants were uploaded as individual entries.
- A newer version or better data was uploaded for a compound (n=2); the newer variant is the one used to guide design.
- An entry was simply duplicated when a second dataset was uploaded (n=1).
In other words, for certain protein-ligand pairs there were multiple possible poses. For those duplicates, the challenge evaluation used the best-matching pose, i.e. the one with the minimal RMSD to the prediction. See the code here.
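The duplicate handling described above can be illustrated with a minimal sketch. This is not the published evaluation code. The function names (`rmsd`, `group_references`, `min_rmsd`) are hypothetical. The RMSD here is a plain coordinate RMSD that assumes predicted and reference poses share the same atom order and coordinate frame, and it applies no symmetry correction. Poses are represented as plain coordinate lists rather than `rdkit.Chem.Mol` objects to keep the example self-contained.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Coords = List[Tuple[float, float, float]]
PairKey = Tuple[str, str]  # (Protein Label, CXSMILES)


def rmsd(pred: Coords, ref: Coords) -> float:
    """Plain coordinate RMSD. Assumes identical atom order in both poses."""
    if not pred or len(pred) != len(ref):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def group_references(
    entries: Iterable[Tuple[str, str, Coords]]
) -> Dict[PairKey, List[Coords]]:
    """Collect all reference poses that share a protein label and CXSMILES."""
    groups: Dict[PairKey, List[Coords]] = defaultdict(list)
    for protein_label, cxsmiles, coords in entries:
        groups[(protein_label, cxsmiles)].append(coords)
    return dict(groups)


def min_rmsd(pred: Coords, refs: List[Coords]) -> float:
    """Score a prediction against every reference pose and keep the best."""
    return min(rmsd(pred, ref) for ref in refs)


# Toy data only: two reference poses for the same (hypothetical) pair.
entries = [
    ("SARS-CoV-2 Mpro", "CC", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("SARS-CoV-2 Mpro", "CC", [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]),
]
references = group_references(entries)
prediction = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
score = min_rmsd(prediction, references[("SARS-CoV-2 Mpro", "CC")])
```

With this scheme a prediction is not penalized for matching one valid pose rather than the other: the score is taken from whichever reference pose it is closest to.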