


A DMPK dataset of six ADME in vitro endpoints from Fang et al. 2023.

Created on: December 08, 2023
Dataset size: 401 KB
Number of datapoints: 3,516

Explore the dataset columns


Modality: Molecule
Description: Molecule SMILES string after cleaning and standardization.
Type: object





The goal of assessing ADME properties is to understand how a potential drug candidate interacts with the human body, including absorption, distribution, metabolism, and excretion. This knowledge is crucial for evaluating efficacy, safety, and clinical potential, guiding drug development for optimal therapeutic outcomes. Fang et al. 2023 disclosed DMPK datasets collected over 20 months across six ADME in vitro endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. Depending on the endpoint, the dataset contains between 885 and 3087 measurements. The compounds are chemically diverse across the full range of each endpoint category: microsomal stability, plasma protein binding, permeability, and solubility.
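Because each endpoint has fewer measurements than the dataset has molecules, most rows are missing at least one readout. The sketch below shows one way to count the available measurements per endpoint before modeling. The toy rows and their values are invented for illustration and are not taken from the dataset; `count_measured` is a hypothetical helper, and representing a missing readout as `None` is an assumption about how the table is loaded.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]

# Hypothetical toy rows (values invented for illustration only).
# None marks an endpoint that was not measured for that molecule.
rows: List[Row] = [
    {"smiles": "CCO", "LOG HLM_CLint (mL/min/kg)": 1.2, "LOG SOLUBILITY PH 6.8 (ug/mL)": None},
    {"smiles": "c1ccccc1", "LOG HLM_CLint (mL/min/kg)": None, "LOG SOLUBILITY PH 6.8 (ug/mL)": 1.5},
    {"smiles": "CC(=O)O", "LOG HLM_CLint (mL/min/kg)": 0.7, "LOG SOLUBILITY PH 6.8 (ug/mL)": 1.9},
]


def count_measured(table: List[Row], column: str) -> int:
    """Count the rows in which `column` has a recorded (non-missing) value."""
    return sum(1 for row in table if row.get(column) is not None)


hlm_count = count_measured(rows, "LOG HLM_CLint (mL/min/kg)")
solubility_count = count_measured(rows, "LOG SOLUBILITY PH 6.8 (ug/mL)")
```

Counting per endpoint in this way makes it clear how many molecules are actually usable for each single-task model.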

Description of readout

  • Microsomal stability (human and rat): LOG HLM_CLint (mL/min/kg), LOG RLM_CLint (mL/min/kg)
  • Plasma protein binding (human and rat): LOG PLASMA PROTEIN BINDING (HUMAN) (% unbound), LOG PLASMA PROTEIN BINDING (RAT) (% unbound)
  • Permeability: LOG MDR1-MDCK ER (B-A/A-B)
  • Solubility: LOG SOLUBILITY PH 6.8 (ug/mL)
  • Number of molecules after curation: 3516
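The readouts above are stored on a log scale, so predictions usually need to be converted back to the original units for interpretation. The following is a minimal sketch that assumes "LOG" means base-10 logarithm; the page does not state the base, so check the curation notes before relying on it. The function names are hypothetical.

```python
def from_log10(log_value: float) -> float:
    """Invert a base-10 log transform (ASSUMPTION: readouts are log10 values)."""
    return 10.0 ** log_value


def percent_bound_from_log_unbound(log_percent_unbound: float) -> float:
    """Convert a log10 (% unbound) readout to % bound to plasma proteins."""
    return 100.0 - from_log10(log_percent_unbound)


# Example: a log10 (% unbound) value of 1.0 corresponds to 10% unbound.
percent_unbound = from_log10(1.0)
percent_bound = percent_bound_from_log_unbound(1.0)
```

Reporting errors on the log scale and on the back-transformed scale can give different rankings of models, so it helps to state which scale a metric was computed on.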

Data resource

Reference: Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective


Raw data:

Data curation

To maintain consistency with other benchmarks in the Polaris Hub, a thorough data curation process is carried out to ensure the accuracy of molecular representations.

The full curation and creation process is documented here.

Related benchmarks

Note: It's recommended to evaluate your methods against all the benchmarks related to this dataset.