ASAP Discovery x OpenADMET Competition

Take part in the first prospective benchmark on Polaris.

Competition

asap-discovery/antiviral-ligand-poses-2025

Since the rise of structure-informed drug discovery in the 1980s and 1990s, structural biology has been key to drug discovery. We challenge you to predict MERS-CoV Mpro poses using knowledge from the SARS-CoV-2 Mpro crystallography data that ASAP created.

Duration: about 2 months
Start time: January 13, 2025 00:00

Tags

MERS-CoV
SARS-CoV-2
Mpro
ligand
poses

Modalities

MOLECULE_3D
PROTEIN_3D
MOLECULE
PROTEIN

Details

README

This is the ligand pose challenge, part of the ASAP Discovery x OpenADMET competition.

Ligand Poses

Since the rise of structure-informed drug discovery in the 1980s and 1990s, structural biology has been key to drug discovery. The structural effect of small adjustments to molecules can now be rapidly shown with X-ray crystallography. However, creating the right experimental conditions (the right protein construct, buffer concentrations, etc.) is extremely difficult even for the most experienced structural biology team. At ASAP, the SARS-CoV-2 Mpro program was structurally enabled from the start of the consortium, but the MERS-CoV Mpro program was not. This delayed progress on MERS-CoV Mpro potency even though the two proteins are highly similar. We challenge you to predict MERS-CoV Mpro poses using knowledge from the SARS-CoV-2 Mpro crystallography data that ASAP created.

📊 Data

The training set will have the following variables:

Column | Dtype | Description
Chain A Sequence | str | Primary structure of the protein's A chain: a linear sequence of amino acids.
Chain B Sequence | str | Primary structure of the protein's B chain, if any: a linear sequence of amino acids.
CXSMILES | str | Text representation of the 2D molecular structure
Complex Structure | PDB (fastpdb.AtomArray) | 3D system of the ligand bound to the protein, prepared using OESpruce and aligned to a reference Mpro
Protein Structure | PDB (fastpdb.AtomArray) | 3D system of just the protein structure, prepared using OESpruce and aligned to a reference Mpro
Reference Structure | PDB (fastpdb.AtomArray) | 3D system of a reference structure of the protein
Ligand Pose | SDF (rdkit.Chem.Mol) | 3D conformation of the molecule, bound to the protein

At test time, we will only provide the Reference Structure, Chain A Sequence, Chain B Sequence and CXSMILES.
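One practical consequence is that a model cannot rely on the Complex Structure or Protein Structure columns at inference time. The sketch below makes that split explicit. The column names come from the table above; the constant and function names are illustrative and are not part of the Polaris API.

```python
# Columns listed for the training set (names copied from the table above).
TRAIN_COLUMNS = [
    "Chain A Sequence",
    "Chain B Sequence",
    "CXSMILES",
    "Complex Structure",
    "Protein Structure",
    "Reference Structure",
    "Ligand Pose",
]

# Columns announced as available at test time.
TEST_COLUMNS = [
    "Reference Structure",
    "Chain A Sequence",
    "Chain B Sequence",
    "CXSMILES",
]

# Columns a model may learn from but will not see at inference time.
TRAIN_ONLY_COLUMNS = [c for c in TRAIN_COLUMNS if c not in TEST_COLUMNS]


def select_inputs(row: dict) -> dict:
    """Keep only the fields that will also be available at test time."""
    return {name: row[name] for name in TEST_COLUMNS}
```

Building dataloaders around `select_inputs` from the start helps avoid accidentally training on a field that will be missing from the test set.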

You will be able to download the train and test set through Polaris as if it's any other benchmark.

⚠️ Important notes

The format Polaris uses for PDB and SDF data requires some getting used to.

SDF: Serialized RDKit Mol

The SDF data is saved in an RDKit Mol object. To get the original SDF, you can for example use Datamol:

    import datamol as dm

    # `dataset` is the Polaris dataset you loaded beforehand
    mol = dataset[0]['Ligand Pose']
    dm.to_sdf(mol, '/path/to/mol.sdf')

PDB: Use of FastPDB

The PDB data is saved in a FastPDB (or Biotite) AtomArray object. To get the original PDB, you can for example use:

    import fastpdb

    # `dataset` is the Polaris dataset you loaded beforehand
    atom_array = dataset[0]['Complex Structure']
    out_file = fastpdb.PDBFile()
    out_file.set_structure(atom_array)
    out_file.write('path/to/another_file.pdb')

ℹ️ Sample data and raw data

Through Polaris, we will provide an ML-ready dataset that can be easily used in ML applications. You can find a sample dataset for this challenge here. This allows teams to prepare dataloaders and other utilities.

    import polaris as po

    dataset = po.load_dataset('asap-discovery/antiviral-ligand-poses-2025-sample')

We've sacrificed the completeness of the scientific data to improve ease of use. However, for those that are interested, you can also access the raw data package that this dataset has been created from here.

✂️ Split

We will provide training data on SARS-CoV-2 Mpro, but participants are free to use any data in the public domain. The test set will comprise both MERS-CoV Mpro and SARS-CoV-2 Mpro structures.

📨 Prepare your submission

We welcome submissions of any kind, including machine learning and physics-based approaches. You can employ pre-training approaches as you see fit. You are also free to reuse data from one portion of the challenge for others if it will assist you.

The format of this submission will be a list of rdkit.Chem.Mol objects with the binding poses.

[ rdkit.Chem.Mol(), rdkit.Chem.Mol(), ..., rdkit.Chem.Mol() ]

You will submit your predictions directly through the Polaris API. We will provide a complete code example when the competition launches.

⚠️ Align your pose

We expect the binding pose to be aligned with the provided reference structure of the protein. We will provide a code example of how to do so.
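The organizers' own alignment example is not reproduced here. As a minimal sketch of one common approach, assume you have matched atom pairs (for example Cα atoms) between the protein in your predicted complex and the provided reference structure, stored as N×3 coordinate arrays. The Kabsch algorithm then gives the rigid transform onto the reference frame, and the same transform is applied to the ligand coordinates. The function names and the use of NumPy are illustrative choices, not part of the Polaris API.

```python
import numpy as np


def kabsch_transform(mobile, target):
    """Return (R, t) such that mobile @ R.T + t best matches target (least squares)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)

    # Cross-covariance of the centered point sets.
    H = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    return R, t


def apply_transform(coords, R, t):
    """Apply a rigid transform to an N x 3 coordinate array."""
    return np.asarray(coords, dtype=float) @ R.T + t
```

In use, `R, t` would be fitted on the matched protein atoms only and then passed to `apply_transform` together with the ligand coordinates, so that the ligand moves rigidly with the protein into the reference frame.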

✅ Evaluation criteria

The competition will be judged on the criteria outlined here.

  • We will evaluate your submission using symmetry-corrected heavy-atom ligand RMSD versus the crystallographic pose.
  • You can enter as many times as desired, but we will only evaluate your last submission.
  • In the open science spirit of ASAP Discovery we would love to see open code showing how you created your submission if possible. If not, we require at least a written report.
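The exact evaluation code is not given here. To illustrate what "symmetry-corrected" means in the first criterion, the sketch below takes the minimum RMSD over a set of equivalent atom mappings and does no re-alignment, since poses are expected to be in the reference frame already. It assumes hydrogens have already been removed and that the mappings are supplied by the caller. In practice they could be enumerated with a cheminformatics toolkit, for example RDKit's `GetSubstructMatches(..., uniquify=False)`, and RDKit's `rdMolAlign.CalcRMS` computes a comparable in-place, symmetry-aware RMSD. All function names below are illustrative.

```python
import math


def rmsd(pred, ref):
    """Plain RMSD between two equally ordered lists of (x, y, z) coordinates."""
    squared = sum(
        (p - r) ** 2
        for pred_atom, ref_atom in zip(pred, ref)
        for p, r in zip(pred_atom, ref_atom)
    )
    return math.sqrt(squared / len(ref))


def symmetry_corrected_rmsd(pred, ref, mappings):
    """Minimum RMSD over equivalent atom orderings.

    Each mapping m pairs reference atom i with predicted atom m[i].
    """
    best = math.inf
    for mapping in mappings:
        reordered = [pred[j] for j in mapping]
        best = min(best, rmsd(reordered, ref))
    return best


# Toy example: atoms 1 and 2 are chemically equivalent and appear swapped.
ref_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
pred_coords = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
equivalent_mappings = [(0, 1, 2), (0, 2, 1)]

naive = rmsd(pred_coords, ref_coords)
corrected = symmetry_corrected_rmsd(pred_coords, ref_coords, equivalent_mappings)
```

In the toy example the naive RMSD penalizes the swapped labels, while the symmetry-corrected value recognizes the two poses as identical.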

ℹ️ Overall competition winner

We will also elect an overall competition winner. This will be based on participants' performance on all subchallenges (entry to all required).

🏆 Prizes

For each subchallenge we will select a champion. To be eligible for ADMET subchallenge champion you must provide predictions for every endpoint.

In addition to eternal glory, the champions and winner will have the opportunity to present their work at the NIH AViDD ASAP Open Science Forum (https://asapdiscovery.org/forum), one of the peak groups in antiviral drug discovery. Additionally, we will be offering some Polaris merch packs. We will also be writing our conclusions up as a paper, to which all submitting teams are invited to share co-authorship.


About the ASAP Discovery x OpenADMET competition

ASAP Discovery is an NIH-funded consortium leveraging open science for antiviral drug discovery, with the goal of equitable and affordable global access to effective antivirals. ASAP has pursued several programs and targets, the most advanced being ASAP's dual SARS-CoV-2 and MERS-CoV main protease (Mpro) program, which has reached preclinical candidate nomination. You can see a full list of ASAP's programs on the website. ASAP Discovery is passionate about open science and has put a huge amount of effort into sharing its outputs in a digestible way with the community. For example, if you navigate to ASAP's website, the drug discovery pipeline is fully interactive for users. Clicking any filled box will navigate you to the continuously published data for those experiments, and experimental protocols used.

ASAP Discovery is approaching a patent disclosure for the preclinical candidates of its two coronavirus Mpro drug discovery programs (see the blog post for a high-level overview). There is a batch of data in these projects that ASAP Discovery has not publicly disclosed at this point; this will be the blind test data of this challenge. The blind challenge will mirror some of the real-world drug discovery challenges that ASAP has had to overcome in the last three years. We would love to challenge the community with the same hurdles that we've had to overcome during this process: can you use your models to solve these problems better than we have? You will be working with active and real drug discovery data of a kind that is normally restricted to large pharmaceutical companies!

The ASAP Discovery Consortium group meeting in NYC, May 2023

All subchallenges:

Timeline

  • Sample data released: December 3 (2024)
  • Challenge start: Jan 13 (2025)
  • Walk-in online sessions: Jan-Feb (2025)
  • Challenge end: March 10 (2025)
  • Winners announced: March 25 (2025)

Endpoints included in this challenge

We have designed this challenge to let you experience a diverse set of computational drug discovery problems that are pivotal in pushing the pharmaceutical decision-making process forward. To understand the typical medicinal-chemistry way of thinking about making a preclinical candidate, it's best to start at the top. Target Candidate Profiles (TCPs) are internal documents that pharmaceutical companies draw up to set a series of goals or must-haves (and sometimes nice-to-haves) that the intended preclinical candidate must meet. With ASAP, these are public. Our SARS-CoV-2/MERS-Mpro dual inhibitor TCP is available here. You'll see there are many goals: the set of goals and their values depend heavily on the target indication (the disease that we're trying to treat).

You'll also notice that potency (IC50 or Kd) is only a small part of this TCP. That is typical: in close-to-preclinical stages such as lead optimization, potency is not the main challenge anymore. Rather, the challenge is to balance a wide array of more complex parameters such as cell potency, formulation, pharmacokinetics/dynamics and safety. These are all part of the 'assay cascade': promisingly potent lead molecules are subjected to a first tier of affordable follow-up assays. Ones that come out of those assays as acceptable (i.e. within the bounds of the TCP requirements) are followed up on in subsequent assay tiers. In this way, lead molecules follow the cascade from simple biochemical potency assays all the way to more involved assays and ultimately animal studies.

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

Attribute | Value
Method | X-ray crystallography