Dataset

polaris/drewry2017-pkis2-subset-v2

Kinases are essential drug targets because of their roles in cellular signaling and their involvement in diseases such as cancer and inflammation. The PKIS2 dataset, with 645 kinase inhibitors from GSK, Takeda, and Pfizer, provides diverse chemotypes. The assays were carried out by one group under a consistent set of conditions, and the data is based on a well-defined biochemical endpoint. This dataset is a subset of PKIS2 that focuses on the kinases EGFR, KIT, RET, LOK, and SLK, with potency assessed by % inhibition.

Created on: July 10, 2024
Dataset size: 54 KB
Number of datapoints: 640
Public

Status

Certified

This artifact has been reviewed in line with our Dataset 101 guidelines and was found to meet all criteria.

Learn more here.

Tags

Kinase
HitDiscovery
Selectivity

Modalities

MOLECULE

Details

README

Background:

Kinases play a crucial role in cellular signalling, making them important targets for drug development. Dysregulation of kinases is frequently implicated in diseases like cancer, inflammation, and neurodegenerative disorders, so targeting kinases with specific drugs has become a central strategy in modern drug discovery. Kinase-related tasks include inhibition prediction, selectivity prediction, and kinase-ligand binding affinity prediction. In the early release version of Polaris, benchmarks were established for kinases such as EGFR, KIT, and RET, along with their respective mutations, as well as for LOK and SLK.

[Figure: an example of kinase screening]

Description of readout

  • Readouts: EGFR, KIT, RET, LOK, SLK
  • Bioassay readout: Percentage of inhibition (%).
  • Optimization objective: Higher potency (higher % inhibition).
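
As a rough illustration of how the five % inhibition readouts might be used together, the sketch below flags hits per kinase and lists compounds that hit exactly one kinase. The 80% cutoff, the compound names, the values, and the helper names `hits` and `selective_compounds` are all assumptions made up for this example; none of them come from the dataset or from the Polaris library.

```python
# Hypothetical % inhibition values for illustration only; NOT taken from PKIS2.
READOUTS = ["EGFR", "KIT", "RET", "LOK", "SLK"]

records = {
    "cpd_A": {"EGFR": 95.0, "KIT": 12.0, "RET": 8.0, "LOK": 3.0, "SLK": 5.0},
    "cpd_B": {"EGFR": 88.0, "KIT": 91.0, "RET": 85.0, "LOK": 40.0, "SLK": 35.0},
    "cpd_C": {"EGFR": 10.0, "KIT": 5.0, "RET": 2.0, "LOK": 1.0, "SLK": 0.0},
}


def hits(values, cutoff=80.0):
    """Return the readouts whose % inhibition is at or above the cutoff.

    The default cutoff of 80% is an arbitrary choice for this sketch.
    """
    return [name for name in READOUTS if values[name] >= cutoff]


def selective_compounds(data, cutoff=80.0):
    """Return {compound: kinase} for compounds that hit exactly one readout."""
    selective = {}
    for compound, values in data.items():
        compound_hits = hits(values, cutoff)
        if len(compound_hits) == 1:
            selective[compound] = compound_hits[0]
    return selective


hit_table = {compound: hits(values) for compound, values in records.items()}
selective = selective_compounds(records)

print(hit_table)
print(selective)
```

Because the endpoint is a single-concentration % inhibition rather than a dose-response value, any such cutoff is a judgment call and should be chosen with the assay in mind.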

Data resource:

PKIS2: A second chemogenomics set of kinase inhibitors from GSK, Takeda, and Pfizer was assembled as PKIS2. This set contained 645 inhibitors and included many additional chemotypes that were not represented in the original set.

Reference: https://www.ncbi.nlm.nih.gov/pubmed/28767711

Data curation

To maintain consistency with other benchmarks in the Polaris Hub, a thorough data curation process is carried out to ensure the accuracy of the molecular representations.

The full curation and creation process is documented here.

Disclaimer

Here are some additional details that may be of use when deciding whether or not to use this dataset.

Some advantages include:

  • The assays were carried out by one group under a consistent set of conditions.
  • The dataset contains only a small number of molecules with unspecified stereocenters.
  • There are no duplicate structures in the dataset.
  • The data is based on a well-defined biochemical endpoint.

Some limitations to consider:

  • The assay endpoint is % inhibition, which is less desirable than a dose-response but similar to what is commonly encountered with HTS data.
  • The dataset is relatively small, containing only 640 compounds. This, combined with the fact that the data is highly clustered, will make it difficult to see statistically significant differences between methods. The problem will be especially acute when the splits are based on clusters or scaffolds.
  • The compounds are highly clustered, with the largest cluster containing 50 compounds.
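
To make the last two limitations concrete, here is a minimal sketch of a cluster-aware split in plain Python. Only the totals (640 compounds, a largest cluster of 50) come from the text above; the remaining cluster sizes, the 20% test target, and the function name `cluster_split` are assumptions for illustration, and this is not the splitting method used by Polaris.

```python
import random
from collections import defaultdict


def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Assign whole clusters to the test set until it reaches the target size.

    Returns (train_indices, test_indices). No cluster is ever split, so the
    test set may overshoot the target by up to one cluster.
    """
    groups = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        groups[cluster_id].append(index)

    order = sorted(groups)  # deterministic base order before shuffling
    random.Random(seed).shuffle(order)

    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cluster_id in order:
        # Fill the test set first; all remaining clusters go to train.
        if len(test) < target:
            test.extend(groups[cluster_id])
        else:
            train.extend(groups[cluster_id])
    return train, test


# Synthetic labels: one cluster of 50 plus 59 clusters of 10 (640 compounds).
cluster_ids = [0] * 50 + [1 + i // 10 for i in range(590)]

train_idx, test_idx = cluster_split(cluster_ids)
test_clusters = {cluster_ids[i] for i in test_idx}
train_clusters = {cluster_ids[i] for i in train_idx}

print(len(train_idx), len(test_idx), len(test_clusters))
```

In this toy setup the test set is made up of only a small number of whole clusters, so the number of effectively independent test observations is much lower than the compound count suggests. That is one way to see why differences between methods can be hard to establish on a dataset of this size.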

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

Attribute    Value
year         2017