Unblinded Dataset for the ADMET challenge
🏆 Antiviral Challenge
ASAP Discovery and Open ADMET organized a blind challenge on Polaris in early 2025. It consisted of three parts: ADMET, Potency and Ligand Poses. This is the unblinded dataset for the ADMET challenge.
- Participants were tasked with predicting 5 ADMET properties, given the CXSMILES.
- The complete evaluation logic has been published on GitHub.
- The final leaderboard for this challenge had a total of 39 submissions, which can be found here.
📊 Data
Column | Unit | Dtype | Description |
---|---|---|---|
Molecule Name | | str | Internal identifier at ASAP Discovery for this molecule |
CXSMILES | | str | Text representation of the 2D molecular structure |
MLM | uL/min/mg | float | MLM assay readouts for stability |
HLM | uL/min/mg | float | HLM assay readouts for stability |
KSOL | uM | float | KSOL assay readouts for solubility |
LogD | | float | LogD calculation |
MDR1-MDCKII | 10^-6 cm/s | float | MDR1-MDCKII assay readouts for permeability |
Set | | Categorical | Whether this was part of the train or test set |
Be aware that the data is sparse. As is common in real drug discovery settings, not every molecule has been tested in every assay.
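Because of this sparsity, it can help to check how many measurements each endpoint has in each split before modelling. The following is a minimal sketch, not part of the official challenge tooling. It assumes the dataset has been exported to CSV with the column names from the table above and with empty cells for untested assays. The rows in `TOY_CSV` are made up for illustration and are not real challenge data, and the helper names `parse_value` and `coverage_by_set` are hypothetical.

```python
import csv
import io
from collections import Counter

# Endpoint columns, named as in the table above.
ENDPOINTS = ["MLM", "HLM", "KSOL", "LogD", "MDR1-MDCKII"]

# Made-up rows for illustration only; these are NOT real challenge data.
# Assumption: missing measurements appear as empty cells.
TOY_CSV = """Molecule Name,CXSMILES,MLM,HLM,KSOL,LogD,MDR1-MDCKII,Set
mol-a,CCO,12.5,,350.0,1.2,,train
mol-b,CCN,,25.0,,0.4,3.1,train
mol-c,CCC,40.0,18.0,,,,test
"""


def parse_value(text):
    """Return a float, or None when the assay was not run (empty cell)."""
    text = text.strip()
    return float(text) if text else None


def coverage_by_set(rows):
    """Count, per split, how many molecules have a value for each endpoint."""
    counts = {}
    for row in rows:
        split_counts = counts.setdefault(row["Set"], Counter())
        for endpoint in ENDPOINTS:
            if parse_value(row[endpoint]) is not None:
                split_counts[endpoint] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
coverage = coverage_by_set(rows)

for split, split_counts in coverage.items():
    for endpoint in ENDPOINTS:
        # A Counter returns 0 for endpoints with no measurements in this split.
        print(split, endpoint, split_counts[endpoint])
```

To run this on a real export, replace `io.StringIO(TOY_CSV)` with an open file handle. A low count for an endpoint in one split suggests its metrics will be noisier than the others.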
⚠️ Known Issues
We're publishing the full dataset for transparency and reproducibility, but please be aware that the dataset has some known issues that were surfaced during the challenge.
Shifting bounds for CLint assays.
Our understanding of the data evolved over time. A retrospective analysis of % remaining against CLint, which assumed log-linear decay, concluded that we could not confidently quote CLint values below 10 µL/min/mg. For all future measurements, we thus set this as a lower bound, but prior calculations were not updated. For this challenge, some CLint values < 10 µL/min/mg were accidentally included in the test set and we decided to filter these out.
- For HLM, these are the indices: 519, 524, 547, 553
- For MLM, these are the indices: 515, 518, 521, 524, 525
See the associated CXSMILES here.
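If you want to apply a comparable filter yourself, one option is to mask test-set CLint values below the lower bound. The sketch below is an assumption-laden illustration, not the organizers' filtering code: the organizers removed the specific indices listed above, whereas this applies a simple threshold, and it has not been checked that the two give identical results. The function name `mask_low_clint`, the record layout (one dict per molecule, `None` for missing values) and the toy records are all hypothetical.

```python
# Columns holding CLint readouts, named as in the data table.
CLINT_COLUMNS = ("HLM", "MLM")
# Lower bound described above, in uL/min/mg.
CLINT_LOWER_BOUND = 10.0


def mask_low_clint(records, split="test"):
    """Return copies of `records` where CLint values below the bound
    in the given split are replaced by None. Inputs are not modified."""
    cleaned = []
    for record in records:
        record = dict(record)  # shallow copy so the caller's data stays intact
        if record["Set"] == split:
            for column in CLINT_COLUMNS:
                value = record.get(column)
                if value is not None and value < CLINT_LOWER_BOUND:
                    record[column] = None
        cleaned.append(record)
    return cleaned


# Made-up records for illustration only; these are NOT real challenge data.
toy_records = [
    {"Molecule Name": "mol-a", "HLM": 5.0, "MLM": 50.0, "Set": "train"},
    {"Molecule Name": "mol-b", "HLM": 4.0, "MLM": 10.0, "Set": "test"},
    {"Molecule Name": "mol-c", "HLM": None, "MLM": 7.5, "Set": "test"},
]

filtered = mask_low_clint(toy_records)
for record in filtered:
    print(record["Molecule Name"], record["HLM"], record["MLM"])
```

Masking individual values rather than dropping whole rows keeps the other endpoints of the same molecule available, which matters in a dataset that is already sparse. Train-set values are left untouched here because the text above only describes filtering in the test set.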