
Small Molecule Guidelines

Crafted by an industry steering committee. Read the guidelines on method evaluation, method comparison, and dataset curation.

The Nuances of Benchmarking

Applying ML to small-molecule predictive modeling tasks in drug discovery poses unique challenges, such as complex, limited datasets and the need for interdisciplinary expertise. Recognizing this, we formed an industry steering committee to develop comprehensive guidelines and resources for the community.

By pooling insights from decades of experience, our steering committee aims to set new standards for method evaluation (e.g., dataset splitting, evaluation metrics), method comparison (e.g., statistical tests), and data curation.

We started with a call to action. In our letter, we outline common pitfalls and challenges in benchmarking that contribute to a growing gap between ML innovations and their practical impact on drug discovery programs. We believe that an open-science, cross-industry, and interdisciplinary effort is a crucial first step toward addressing these challenges, and we invite other experts to join us.

Read The Letter
Correspondence in Nature Machine Intelligence

Meet the Steering Committee

Pat Walters, Relay Therapeutics
Jeremy Ash, Johnson & Johnson
Alan Cheng, Merck
Cas Wognum, Valence Labs
Djork-Arné Clevert, Pfizer
Raquel Rodriguez-Perez, Novartis
Daniel Price, Nimbus
Ola Engkvist, AstraZeneca
Cheng Fang, Blueprint
Matteo Aldeghi, Bayer

Publication Timeline

This is what we’re starting with. Have some ideas? Let us know!

November 2024
Guideline Paper

Method Comparison

To contextualize the results of a new ML method, its performance is typically compared to the state of the art and to baselines. This paper proposes guidelines for small-molecule predictive modeling on how to make this comparison robustly, so that conclusions can be expected to generalize to similar datasets and real-world use cases.
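As a rough illustration of what a statistical comparison between two methods can look like, the sketch below runs a paired sign-flip permutation test on per-fold scores. It is not taken from the guideline paper and is not necessarily the test the paper recommends. The function name, the default number of resamples, and the example scores are all assumptions made for illustration. It also assumes both methods were evaluated on the same folds, in the same order.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-fold score differences.

    Returns an estimated p-value for the null hypothesis that the two
    methods perform equally well on average.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (same folds, same order)")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-fold scores for a new method and a baseline.
new_method = [0.81, 0.79, 0.83, 0.80, 0.82, 0.84, 0.78, 0.81, 0.80, 0.83]
baseline = [0.74, 0.72, 0.77, 0.73, 0.75, 0.76, 0.71, 0.74, 0.73, 0.75]
p_value = paired_permutation_test(new_method, baseline)
```

Pairing the scores by fold matters here: it removes fold-to-fold difficulty from the comparison, so the test looks only at whether one method is consistently ahead of the other.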

April 2025
Guideline Paper

Splitting Methods

To prevent models from merely memorizing training data (a problem known as overfitting), it's crucial to ensure that the similarity between training and test sets reflects real-world applications. This paper provides guidelines for small-molecule predictive modeling on how to measure generalization in a way that aligns with practical use cases.
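One common way to control train-test similarity is to split by group, so that closely related molecules never land on both sides of the split. The sketch below is a minimal version of that idea and is not the procedure from the guideline paper. It assumes each molecule already carries a group label (for example a scaffold or cluster identifier computed elsewhere). The function name and the example labels are hypothetical.

```python
from collections import defaultdict


def group_split(group_ids, test_fraction=0.2):
    """Return (train_idx, test_idx) such that no group appears in both sets.

    Groups are assigned whole: the largest groups go to the training set
    while they still fit, and the remaining groups go to the test set.
    The realized test fraction is therefore only approximate.
    """
    groups = defaultdict(list)
    for idx, group in enumerate(group_ids):
        groups[group].append(idx)

    # Largest groups first, so that rarer chemistry tends to end up in test.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(group_ids) * (1 - test_fraction)

    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)


# Hypothetical group labels, one per molecule.
labels = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "d"]
train_idx, test_idx = group_split(labels, test_fraction=0.2)
```

Whether a group-based split is the right choice depends on the intended use case. A model meant to rank close analogs in an ongoing series may call for a different notion of similarity than one meant to explore new chemical space.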

July 2025
Guideline Paper

Data Curation

Large industrial datasets are rarely published due to competitive and intellectual property concerns. Therefore, drug discovery benchmarks often rely on public databases like ChEMBL. Curating datasets from these sources requires deep expertise in data generation processes and data modalities. To address this challenge, we propose guidelines for curating small-molecule datasets.
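To give a flavor of one curation step, the sketch below merges replicate measurements per compound and sets aside compounds whose replicates disagree strongly. It is only an illustration and does not reproduce the proposed guidelines. The record layout, the function name, and the spread threshold of 1.0 (in the units of the measurement, for example log units) are assumptions.

```python
from collections import defaultdict
from statistics import median


def aggregate_replicates(records, max_spread=1.0):
    """Merge replicate measurements per compound.

    records: iterable of (compound_id, value) pairs.
    Returns (kept, flagged), where kept maps compound_id to the median
    value and flagged lists compounds whose replicate values differ by
    more than max_spread and need manual review.
    """
    by_compound = defaultdict(list)
    for compound_id, value in records:
        by_compound[compound_id].append(value)

    kept, flagged = {}, []
    for compound_id, values in by_compound.items():
        if max(values) - min(values) > max_spread:
            flagged.append(compound_id)
        else:
            kept[compound_id] = median(values)
    return kept, flagged


# Hypothetical records: compound identifier and a measured value.
records = [("c1", 6.0), ("c1", 6.5), ("c2", 5.0), ("c2", 8.0), ("c3", 7.0)]
kept, flagged = aggregate_replicates(records)
```

Flagging discordant compounds instead of silently averaging them keeps the decision with someone who understands how the underlying assays were run.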

We Want Your Feedback

Have thoughts on the papers? Think we’re missing something? We’d love to collaborate with you on how we can improve the state of benchmarking for small molecules. Reach out to the steering committee using the button below.
