Background
KIT (Proto-oncogene c-KIT) receptor plays a crucial role in regulating cell growth, differentiation, and survival. It's particularly important in the development of blood cells, melanocytes (the cells that produce melanin, the pigment responsible for skin, hair, and eye color), and certain cells in the gut. Mutations in the KIT gene can lead to uncontrolled cell growth and contribute to the development of certain types of cancer, including gastrointestinal stromal tumors (GISTs) and some types of leukemia. KIT proto-oncogene, receptor tyrosine kinase, is a transmembrane receptor tyrosine kinase (PMID: 32214210) that binds the stem cell factor (SCF) ligand to activate PI3K, JAK/STAT, and MAPK pathways to promote cell survival and proliferation (PMID: 23181448, PMID: 29704617). Activating Kit mutations are driver mutations in a variety of cancers, particularly in gastrointestinal stromal tumors (PMID: 23127174, PMID: 29704617, PMID: 32091431), acute myeloid leukemia (PMID: 32008291), melanomas (PMID: 30707374, PMID: 32608199), and seminomas (PMID: 29704617).
Benchmarking
KIT wild type: In certain cancers, KIT signaling can be activated by other receptors or mutations upstream in the signaling pathway. Targeting these upstream factors can indirectly impact KIT signaling and downstream effects. An example of this is seen in some cases of acute myeloid leukemia (AML) where KIT is expressed without mutations, but other upstream mutations can lead to aberrant KIT activation.
The goal of this benchmark is to select the best predictive model for
- Optimization of the bioactivity % inhibition for KIT wile type.
- Discovery of potential hits in new chemical space.
Description of readout
- Readouts:
CLASS_KIT
- Bioassay readout: percentage of inhibition.
- Optimization objective: Higher inhibition
- Number of data points: train: 524 test: 116
- Thresholds: > 80
- Optimization objective: postive label (1)
Data resource:
Train/test split
Given the benchmarking goal, a scaffold-based splitting approach was applied to ensure training and test sets contain distinct chemical structures while maintaining the diversity of scaffolds.
Distribution of the train/test in the chemical space
Data resource:
PKIS2: A second chemogenomics set of kinase inhibitors from GSK, Takeda, and Pfizer was assembled as PKIS2. This set contained 645 inhibitors and included many additional chemotypes that were not represented in the PKIS1 set.
Related links
The full curation and creation process is documented -> notebook
Related benchmarks
- polaris/drewry_kit_wildtype_singletask_clf_v1
- polaris/drewry_kit_wt_t670i_v560g_multitask_clf_v1
- polaris/drewry_kit_wt_t670i_v560g_multitask_reg_v1
Note: It's recommanded to evaluate your methods agaisnt all the benchmarks related to this dataset.