The Antiviral Competition Results
See the results of the Polaris competition, organized by ASAP Discovery and OpenADMET.

The benchmarking platform for drug discovery.

Polaris makes it easy for the machine learning in drug discovery community to share and access datasets & benchmarks.


Featured datasets on Polaris.

December 11, 2024
222,601
To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset, image embeddings computed with OpenPhenom-S/16, and associations between the included small molecules and genes. The dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells but is less than 18 GB, making it readily accessible to the research community.
recursion
October 30, 2024
~99.3M
This is the dataset provided for the BELKA competition, which Leash Biosciences hosted on Kaggle in the summer of 2024. It contains roughly 100M small molecules from DNA-encoded chemical libraries (DELs) screened against 3 protein targets (BRD4, EPHX2/sEH, and ALB/HSA), with binary binding labels. Leash is also providing the raw data from these experiments: raw sequencing counts and counts-per-billion for 3 replicates of 3 rounds of selection per protein, additional replicates of two negative controls, and additional raw data from a smaller orthogonal DEL used as a private test set in the Kaggle competition. The raw dataset comprises some 4.25B physical measurements.
leash-bio
July 10, 2024
384 KB
3,521
Assessing ADME properties helps characterize a drug candidate’s interaction with the body in terms of absorption, distribution, metabolism, and excretion, which is essential for evaluating its efficacy, safety, and clinical potential. Fang et al. (2023) presented DMPK datasets gathered over 20 months, covering six in vitro ADME endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. With 885 to 3,087 measurements per endpoint, the dataset showcases chemical diversity in key properties like microsomal stability, plasma protein binding, permeability, and solubility.
biogen

Increase your impact.

Our aim is to improve the state of benchmarking so ML can have a greater impact on real-world drug discovery scenarios. To start, we hope to provide a single source of truth that aggregates and provides simple access to datasets & benchmarks.

Download a dataset from the Hub

import polaris as po

# Load the dataset from the Hub
dataset = po.load_dataset("polaris/my-first-dataset")

# Get information on the dataset size
dataset.size()

# Load a datapoint in memory
dataset.get_data(
    row=dataset.rows[0],
    col=dataset.columns[0],
)

# Or, similarly:
dataset[dataset.rows[0], dataset.columns[0]]

# Get an entire data point
dataset[0]

Evaluate your method on a benchmark

import polaris as po
import numpy as np

# Load the benchmark from the Hub
benchmark = po.load_benchmark("polaris/my-first-benchmark")

# Get the train and test data-loaders
train, test = benchmark.get_train_test_split()

# Use the training data to train your model
# Access the inputs and targets as arrays via 'train.inputs' and 'train.targets',
# or simply iterate over the train object.
for x, y in train:
    ...

# Work your magic to accurately predict the test set
predictions = [... for x in test]

# Evaluate your predictions
results = benchmark.evaluate(predictions)

# Submit your results
results.upload_to_hub(owner="dummy-user")
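
The prediction step above is left as a placeholder. As a minimal sketch of one way to fill it, assuming a single-task regression benchmark whose targets are plain numbers, a mean baseline predicts the average training target for every test point. The helper names and toy values below are hypothetical stand-ins and not part of Polaris; with a real benchmark, the resulting list would take the place of the placeholder predictions.

```python
from statistics import mean


def mean_baseline(train_targets, n_test):
    """Predict the mean training target for every test point."""
    baseline = mean(train_targets)
    return [baseline] * n_test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical toy values standing in for the training targets
# and for held-out test targets.
train_targets = [1.0, 2.0, 3.0, 6.0]
test_targets = [2.0, 4.0]

predictions = mean_baseline(train_targets, len(test_targets))
mae = mean_absolute_error(test_targets, predictions)
```

A baseline like this is useful mainly as a sanity check: a trained model that cannot beat it on the benchmark metric is not learning anything from the inputs.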

Guidelines for dataset curation and method evaluation & comparison.


Starting with small molecules

Through a unique, cross-industry collaboration involving representatives from Recursion Pharmaceuticals, AstraZeneca, Relay Therapeutics, Pfizer, Merck, Nimbus Therapeutics, Blueprint Medicines, Johnson & Johnson, Novartis, Bayer, and Valence Labs, we'll be releasing recommended benchmarks and datasets plus guidelines for dataset curation, method evaluation, and comparison.

