Dataset

biogen/adme-fang-v1

Assessing ADME properties helps characterize how a drug candidate interacts with the body in terms of absorption, distribution, metabolism, and excretion, which is essential for evaluating its efficacy, safety, and clinical potential. Fang et al. (2023) presented DMPK datasets gathered over 20 months, covering six in vitro ADME endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. With 885 to 3087 measurements per endpoint, the dataset is chemically diverse across key properties: microsomal stability, plasma protein binding, permeability, and solubility.

Created on: July 10, 2024
Dataset size: 384 KB
Number of datapoints: 3,521
Public

Status

Certified

This artifact has been reviewed in line with our Dataset 101 guidelines and was found to meet all criteria.

Tags

adme

Modalities

MOLECULE

Details

README

ADME

Background

The goal of assessing ADME properties is to understand how a potential drug candidate interacts with the human body in terms of absorption, distribution, metabolism, and excretion. This knowledge is crucial for evaluating efficacy, safety, and clinical potential, and it guides drug development toward optimal therapeutic outcomes. Fang et al. (2023) disclosed DMPK datasets collected over 20 months across six in vitro ADME endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. The dataset contains between 885 and 3087 measurements per endpoint. The compounds are chemically diverse and span the full range of each endpoint category: microsomal stability, plasma protein binding, permeability, and solubility.

Description of readout

  • Microsomal stability (human and rat): LOG HLM_CLint (mL/min/kg), LOG RLM_CLint (mL/min/kg)
  • Plasma protein binding (human and rat): LOG PLASMA PROTEIN BINDING (HUMAN) (% unbound), LOG PLASMA PROTEIN BINDING (RAT) (% unbound)
  • Permeability: LOG MDR1-MDCK ER (B-A/A-B)
  • Solubility: LOG SOLUBILITY PH 6.8 (ug/mL)
  • Number of molecules after curation: 3516
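As a minimal sketch of working with these readouts, the snippet below counts non-missing values per endpoint column and converts a log value back to linear units. It assumes that the CSV column names match the labels listed above, that missing measurements appear as empty cells, and that the logs are base 10. None of these is confirmed here, so check the CSV header before relying on it. The `SMILES` column name and the two-row inline sample are hypothetical and made up for illustration.

```python
import csv
import io

# Assumed column names, copied from the readout list above;
# verify them against the actual CSV header.
ENDPOINTS = [
    "LOG HLM_CLint (mL/min/kg)",
    "LOG RLM_CLint (mL/min/kg)",
    "LOG PLASMA PROTEIN BINDING (HUMAN) (% unbound)",
    "LOG PLASMA PROTEIN BINDING (RAT) (% unbound)",
    "LOG MDR1-MDCK ER (B-A/A-B)",
    "LOG SOLUBILITY PH 6.8 (ug/mL)",
]


def count_measurements(rows, endpoints=ENDPOINTS):
    """Count non-empty cells per endpoint column."""
    counts = {name: 0 for name in endpoints}
    for row in rows:
        for name in endpoints:
            # A column absent from the file is treated like an empty cell.
            value = (row.get(name) or "").strip()
            if value:
                counts[name] += 1
    return counts


def from_log10(value):
    """Undo a base-10 log transform (assumption: the LOG columns are log10)."""
    return 10.0 ** value


# Hypothetical two-molecule sample, not real data.
SAMPLE = io.StringIO(
    "SMILES,LOG HLM_CLint (mL/min/kg),LOG SOLUBILITY PH 6.8 (ug/mL)\n"
    "CCO,1.5,\n"
    "c1ccccc1,,2.0\n"
)
sample_counts = count_measurements(csv.DictReader(SAMPLE))
```

To run the same count on the raw file, pass `csv.DictReader(open(path, newline=""))` for a locally downloaded copy of the CSV instead of the inline sample.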

Data resource

Reference: Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective

Github: https://github.com/molecularinformatics/Computational-ADME

Raw data: https://github.com/molecularinformatics/Computational-ADME/blob/main/ADME_public_set_3521.csv

Data curation

To maintain consistency with other benchmarks in the Polaris Hub, a thorough data curation process is carried out to ensure the accuracy of the molecular representations.

The full curation and creation process is documented in 01_polaris_adme-fang-1_data_curation.ipynb.

Disclaimer

Here are some additional details that may be of use when deciding whether or not to use these datasets.

Some advantages include:

  • The assays were carried out by one group under a consistent set of conditions.
  • The dataset contains only a small number of molecules with unspecified stereocenters.
  • There are no duplicate structures in the dataset.
  • The data is based on well-defined ADME endpoints.

Some limitations to consider:

  • The plasma protein binding (PPB) datasets are small, which makes it challenging to determine a statistically significant difference between methods on these sets.
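
To illustrate why a small test set makes method comparison hard, here is a minimal sketch of a paired bootstrap on the difference in mean absolute error (MAE) between two methods. The function name, the number of resamples, and the toy values are illustrative assumptions. They are not part of the dataset, the original paper, or any Polaris tooling.

```python
import random


def bootstrap_mae_diff(y_true, pred_a, pred_b, n_resamples=2000, seed=0):
    """Percentile 95% interval for MAE(A) - MAE(B) via a paired bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    # Per-molecule difference in absolute error; pairing keeps molecules aligned.
    diffs = [abs(a - y) - abs(b - y) for y, a, b in zip(y_true, pred_a, pred_b)]
    means = []
    for _ in range(n_resamples):
        # Resample molecules with replacement and record the mean difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper


# Toy values, made up for illustration only.
y_true = [0.1, 0.5, 0.9, 1.3, 1.7, 2.1]
pred_a = [0.2, 0.4, 1.1, 1.2, 1.9, 2.0]
pred_b = [0.3, 0.7, 0.8, 1.6, 1.5, 2.4]
low, high = bootstrap_mae_diff(y_true, pred_a, pred_b)
```

If the resulting interval spans zero, the test set does not separate the two methods. With few molecules the interval tends to be wide, which is the practical consequence of the limitation above.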