ASAP Discovery x OpenADMET Competition: Take part in the first prospective benchmark on Polaris.

Dataset

asap-discovery/antiviral-ligand-poses-2025-sample

Sample dataset for the ASAP Discovery x OpenADMET binding pose challenge. Represents a portion of the training data.

Created on: December 03, 2024
Number of datapoints: 58
Public
V2

Status

Uncertified

This artifact has not been certified by approved reviewers. It may contain issues related to data quality.


Tags

MERS-CoV
SARS-CoV-2
Mpro
ligand
poses

Modalities

MOLECULE
MOLECULE_3D
PROTEIN
PROTEIN_3D

Related benchmarks

No related benchmarks yet.

You're looking at a v2.0 dataset!

Our goal at Polaris is to build a universal format for ML-ready datasets in drug discovery. With our V2 implementation, we're drastically improving scalability, but there's still work to be done!

Details

README

Ligand Pose challenge (Sample Dataset)

This dataset is made available as part of the ASAP Discovery x OpenADMET competition. It is a small portion of the training data, released early so that teams can prepare data loaders and other utilities.

⚠️ Important notes

The format Polaris uses for PDB and SDF data requires some getting used to.

SDF: Serialized RDKit Mol

The SDF data is saved as an RDKit Mol object. To get the original SDF, you can, for example, use Datamol:

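    import polaris as po
    import datamol as dm

    # Load the sample dataset from the Polaris Hub
    dataset = po.load_dataset("asap-discovery/antiviral-ligand-poses-2025-sample")

    # The "Ligand Pose" column holds an RDKit Mol
    mol = dataset[0]["Ligand Pose"]

    # Write the Mol back out as an SDF file
    dm.to_sdf(mol, "/path/to/mol.sdf")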

PDB: Use of FastPDB

The PDB data is saved as a FastPDB (or Biotite) AtomArray object. To get the original PDB, you can, for example, use:

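    import polaris as po
    import fastpdb

    # Load the sample dataset from the Polaris Hub
    dataset = po.load_dataset("asap-discovery/antiviral-ligand-poses-2025-sample")

    # The "Complex Structure" column holds an AtomArray
    atom_array = dataset[0]["Complex Structure"]

    # Write the AtomArray back out as a PDB file
    out_file = fastpdb.PDBFile()
    out_file.set_structure(atom_array)
    out_file.write("path/to/another_file.pdb")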
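To export every row rather than a single one, the two snippets above can be wrapped in a loop. The following is only a minimal sketch and not part of the Polaris API: `export_rows`, its parameters, and the output file names are hypothetical. It assumes each row is a dict-like object keyed by column name, and that each writer is a callable taking `(object, path)`, as `dm.to_sdf` does in the example above.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List, Mapping, Tuple

# Hypothetical writer signature: (object to write, destination path)
Writer = Callable[[Any, str], None]


def export_rows(
    rows: Iterable[Mapping[str, Any]],
    out_dir: str,
    write_ligand: Writer,
    write_complex: Writer,
) -> List[Tuple[Path, Path]]:
    """Write one SDF and one PDB file per row and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for index, row in enumerate(rows):
        # File names are an arbitrary choice made for this sketch
        sdf_path = out / f"ligand_{index:03d}.sdf"
        pdb_path = out / f"complex_{index:03d}.pdb"

        write_ligand(row["Ligand Pose"], str(sdf_path))
        write_complex(row["Complex Structure"], str(pdb_path))
        written.append((sdf_path, pdb_path))
    return written
```

With the real dataset, `write_ligand` could be `dm.to_sdf`, and `write_complex` could be a small function wrapping the `fastpdb.PDBFile` calls shown above.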

Structure

This dataset has the following columns:

| Column | Dtype | Description |
| --- | --- | --- |
| Chain A Sequence | str | Primary structure of the protein's A chain: a linear sequence of amino acids. |
| Chain B Sequence | str | Primary structure of the protein's B chain, if any: a linear sequence of amino acids. |
| CXSMILES | str | Text representation of the 2D molecular structure. |
| Complex Structure | PDB (fastpdb.AtomArray) | 3D system of the ligand bound to the protein, prepared using OESpruce and aligned to a reference Mpro. |
| Protein Structure | PDB (fastpdb.AtomArray) | 3D system of just the protein structure, prepared using OESpruce and aligned to a reference Mpro. |
| Reference Structure | PDB (fastpdb.AtomArray) | 3D system of a reference structure of the protein. |
| Ligand Pose | SDF (rdkit.Chem.Mol) | 3D conformation of the molecule, bound to the protein. |

For the challenge, we will provide all of these columns for the training set. At test time, we will provide only the Reference Structure, Chain A Sequence, Chain B Sequence, and CXSMILES columns.
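Because only four columns are available at test time, a data loader may want to separate model inputs from training-only columns early on. The following is a minimal sketch of that idea using the column names listed above. `split_row` and the two constants are hypothetical helpers, not Polaris functions, and the sketch assumes each row is a dict-like object keyed by column name.

```python
from typing import Any, Dict, Mapping, Tuple

# Columns stated to be available at test time
TEST_TIME_COLUMNS = (
    "Reference Structure",
    "Chain A Sequence",
    "Chain B Sequence",
    "CXSMILES",
)

# Remaining columns, provided for the training set only
TRAIN_ONLY_COLUMNS = (
    "Complex Structure",
    "Protein Structure",
    "Ligand Pose",
)


def split_row(row: Mapping[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a row into (inputs, targets).

    `inputs` holds the test-time columns; `targets` holds whichever
    training-only columns are present, so it is empty for a test row.
    """
    missing = [name for name in TEST_TIME_COLUMNS if name not in row]
    if missing:
        raise KeyError(f"Row is missing expected input columns: {missing}")

    inputs = {name: row[name] for name in TEST_TIME_COLUMNS}
    targets = {name: row[name] for name in TRAIN_ONLY_COLUMNS if name in row}
    return inputs, targets
```

Writing a model against `inputs` alone helps ensure that it does not accidentally depend on a column that will be absent at test time.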

📦 Raw data package

To improve ease of use, this dataset sacrifices some completeness of the scientific data. Those who are interested can also access the raw data package from which this dataset was created here.

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

| Attribute | Value |
| --- | --- |
| Method | X-ray crystallography |