Unblinded Dataset for the Potency challenge
🏆 Antiviral Challenge
ASAP Discovery and Open ADMET organized a blind challenge on Polaris in early 2025. It consisted of three parts: ADMET, Potency and Ligand Poses. This is the unblinded dataset for the Potency challenge.
- Participants were tasked with predicting the pIC50 of each molecule against two targets, SARS-CoV-2 Mpro and MERS-CoV Mpro, given its CXSMILES.
- The complete evaluation logic has been published on GitHub.
- The final leaderboard for this challenge, which can be found here, had a total of 32 submissions.
📊 Data
Column | Dtype | Description |
---|---|---|
Molecule Name | str | Internal identifier at ASAP Discovery for this molecule |
CXSMILES | str | Text representation of the 2D molecular structure |
pIC50 (SARS-CoV-2 Mpro) | float | Negative log10 of the IC50 value derived from the dose-response curve |
pIC50 (MERS-CoV Mpro) | float | Negative log10 of the IC50 value derived from the dose-response curve |
Set | Categorical | Whether this was part of the train or test set |
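The pIC50 columns follow the usual convention pIC50 = -log10(IC50), so each unit step corresponds to a tenfold change in IC50 (higher pIC50 means a more potent compound). This card does not state the unit of the underlying IC50 values; the sketch below assumes the standard convention of IC50 expressed in mol/L. The helper names `pic50_from_ic50` and `ic50_from_pic50` are hypothetical and not part of the dataset or the challenge code.

```python
import math

# Conversion factors from common concentration units to molar (mol/L).
_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50_from_ic50(ic50: float, unit: str = "nM") -> float:
    """Return pIC50 = -log10(IC50), with IC50 first converted to mol/L."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * _TO_MOLAR[unit])


def ic50_from_pic50(pic50: float, unit: str = "nM") -> float:
    """Invert the conversion, returning IC50 in the requested unit."""
    return 10 ** (-pic50) / _TO_MOLAR[unit]


# Rounding hides tiny floating-point error in the unit conversion.
print(round(pic50_from_ic50(100, "nM"), 6))  # 7.0
print(round(pic50_from_ic50(1, "uM"), 6))    # 6.0
```

Under this assumption, a compound with an IC50 of 100 nM has a pIC50 of 7, and one with an IC50 of 1 µM has a pIC50 of 6.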
Be aware that the data is sparse. As is common in real drug discovery settings, not every molecule has been tested in every assay.
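Because of this sparsity, a model for one target is usually trained only on the rows that were actually measured for that target. A minimal stdlib sketch, assuming the data has been exported to CSV with the column names from the table above, empty cells for missing measurements, and literal `train`/`test` labels in the `Set` column; the sample rows and the helpers `parse_value` and `training_pairs` are hypothetical.

```python
import csv
import io
import math

# Hypothetical rows in the column layout described above; real values differ.
SAMPLE_CSV = """Molecule Name,CXSMILES,pIC50 (SARS-CoV-2 Mpro),pIC50 (MERS-CoV Mpro),Set
MOL-A,CCO,5.1,,train
MOL-B,CCN,,4.3,train
MOL-C,CCC,6.2,5.8,train
MOL-D,CCCl,,,test
"""

TARGETS = ["pIC50 (SARS-CoV-2 Mpro)", "pIC50 (MERS-CoV Mpro)"]


def parse_value(cell: str) -> float:
    """Map an empty cell (molecule not tested in that assay) to NaN."""
    return float(cell) if cell.strip() else math.nan


def training_pairs(rows, target):
    """Yield (CXSMILES, pIC50) for training rows that were measured on `target`."""
    for row in rows:
        value = parse_value(row[target])
        if row["Set"] == "train" and not math.isnan(value):
            yield row["CXSMILES"], value


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for target in TARGETS:
    pairs = list(training_pairs(rows, target))
    print(target, len(pairs))  # each target has 2 measured training rows here
```

With pandas, `read_csv` already parses empty cells as NaN, so `df.dropna(subset=[target])` serves the same purpose as the filter above.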
⚠️ Known Issues
We're publishing the full dataset for transparency and reproducibility, but please be aware that the dataset has some known issues that were surfaced during the challenge.
Enantiomers
A number of enantiomers occurred in both the train and test set. These molecules share the same molecular formula and connectivity (i.e., the same number and type of atoms bound in the same sequence) but differ in the spatial arrangement of their atoms. In this dataset, each such pair was represented by the same CXSMILES string. If you're not familiar with stereochemistry, you can learn more about it here. Stereochemistry strongly affects binding, and these enantiomers do have different readouts in the assay, but because only the CXSMILES was provided, they were indistinguishable duplicates within the context of this challenge.
We therefore filtered out the following enantiomers at test time: 1036, 1039, 1219, 1225, 1306.
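This card does not describe how these duplicates were identified. One simple way to check a split for this kind of leakage is to group rows by CXSMILES string and flag strings that appear in both the train and test sets. This is only a sketch: the records and the `cross_split_duplicates` helper are hypothetical, and exact string matching assumes that identical structures were written identically. Canonicalizing the strings first with a cheminformatics toolkit such as RDKit would be more robust.

```python
from collections import defaultdict


def cross_split_duplicates(records):
    """Return {CXSMILES: sorted molecule names} for strings seen in both splits.

    `records` is an iterable of (molecule_name, cxsmiles, split) tuples.
    Strings are compared exactly, without canonicalization.
    """
    splits_by_smiles = defaultdict(set)
    names_by_smiles = defaultdict(list)
    for name, smiles, split in records:
        splits_by_smiles[smiles].add(split)
        names_by_smiles[smiles].append(name)
    # Keep only strings whose set of splits contains both "train" and "test".
    return {
        smiles: sorted(names_by_smiles[smiles])
        for smiles, splits in splits_by_smiles.items()
        if {"train", "test"} <= splits
    }


# Hypothetical records: two enantiomers stored under the same stereo-free string.
records = [
    ("MOL-1", "CC(N)C(=O)O", "train"),
    ("MOL-2", "CC(N)C(=O)O", "test"),
    ("MOL-3", "CCO", "train"),
]
print(cross_split_duplicates(records))  # {'CC(N)C(=O)O': ['MOL-1', 'MOL-2']}
```

Any flagged test entries can then be excluded from scoring, which mirrors the filtering described above.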