Ligand Pose challenge (Sample Dataset)
This dataset is made available as part of the ASAP Discovery x OpenADMET competition. It is a small portion of the training data, released early so that teams can prepare data loaders and other utilities.
⚠️ Important notes
The format Polaris uses for PDB and SDF data requires some getting used to.
SDF: Serialized RDKit Mol
The SDF data is stored as an RDKit Mol object. To recover the original SDF file, you can use Datamol, for example:
import polaris as po
import datamol as dm
dataset = po.load_dataset("asap-discovery/antiviral-ligand-poses-2025-sample")
mol = dataset[0]["Ligand Pose"]  # rdkit.Chem.Mol for the first row
dm.to_sdf(mol, "/path/to/mol.sdf")  # write the pose back out as an SDF file
PDB: Use of FastPDB
The PDB data is stored as a FastPDB (or Biotite) AtomArray object. To recover the original PDB file, you can use FastPDB, for example:
import polaris as po
import fastpdb
dataset = po.load_dataset("asap-discovery/antiviral-ligand-poses-2025-sample")
atom_array = dataset[0]["Complex Structure"]  # AtomArray for the first row
out_file = fastpdb.PDBFile()
out_file.set_structure(atom_array)  # copy the atoms into an empty PDB file object
out_file.write("path/to/another_file.pdb")
Structure
This dataset has the following columns:
Column | Dtype | Description |
---|---|---|
Chain A Sequence | str | Primary structure of the protein's A chain: A linear sequence of amino acids. |
Chain B Sequence | str | Primary structure of the protein's B chain, if any: A linear sequence of amino acids. |
CXSMILES | str | Text representation of the 2D molecular structure |
Complex Structure | PDB (fastpdb.AtomArray) | 3D system of the ligand bound to the protein, prepared using OESpruce and aligned to a reference Mpro |
Protein Structure | PDB (fastpdb.AtomArray) | 3D system of just the protein structure, prepared using OESpruce and aligned to a reference Mpro |
Reference Structure | PDB (fastpdb.AtomArray) | 3D system of a reference structure of the protein |
Ligand Pose | SDF (rdkit.Chem.Mol) | 3D conformation of the molecule, bound to the protein |
For the challenge, we will provide all these columns for the train set. At test time, we will only provide the Reference Structure, Chain A Sequence, Chain B Sequence and CXSMILES columns.
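When preparing a data loader, it can help to separate the columns that will still be available at test time from the ones that only exist in the train set. The following is a minimal sketch, not part of the Polaris API: the column names come from the table above, while `split_row` and the two constants are hypothetical helpers, and a row is assumed to behave like a mapping from column name to value (as `dataset[0]["Ligand Pose"]` suggests).

```python
# Column names are copied from the table above. The constants and the
# helper below are illustrative assumptions, not part of Polaris.
TEST_TIME_COLUMNS = (
    "Reference Structure",
    "Chain A Sequence",
    "Chain B Sequence",
    "CXSMILES",
)
TRAIN_ONLY_COLUMNS = (
    "Complex Structure",
    "Protein Structure",
    "Ligand Pose",
)


def split_row(row):
    """Split one row into (available_at_test, train_only) dictionaries.

    `row` is assumed to be a mapping from column name to value.
    """
    missing = [name for name in TEST_TIME_COLUMNS if name not in row]
    if missing:
        raise KeyError(f"row is missing test-time columns: {missing}")
    available_at_test = {name: row[name] for name in TEST_TIME_COLUMNS}
    # Train-only columns are absent at test time, so keep whatever is present.
    train_only = {name: row[name] for name in TRAIN_ONLY_COLUMNS if name in row}
    return available_at_test, train_only
```

For a train row, `split_row(dataset[0])` would return both dictionaries populated. For a row that carries only the four test-time columns, the second dictionary is empty, which makes it easy to run the same loader code on both splits.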
📦 Raw data package
We've sacrificed the completeness of the scientific data to improve ease of use. However, for those who are interested, you can also access the raw data package from which this dataset was created here.