


The ProteinGym protein fitness prediction benchmark. It includes ~2.8 million mutants and covers both pathogenicity prediction for clinical variants and experimental property prediction for deep mutational scans.

Created on: March 13, 2024
Dataset size: 126 MB
Number of datapoints: ~2.8M

Mutation Effect Prediction
Protein Fitness Prediction



Explore the dataset columns


Description: The triplet mutant describing the mutation applied to the wild-type sequence ('N/A' for indel mutations)
Type: object




ProteinGym is an extensive set of Deep Mutational Scanning (DMS) assays and annotated human clinical variants, curated to enable thorough comparisons of mutation effect predictors in different regimes. Both the DMS assays and the clinical variants are divided into 1) a substitution benchmark, which currently consists of the experimental characterisation of ~2.7M missense variants across 217 DMS assays and 2,525 clinical proteins, and 2) an indel benchmark, which includes ~300k mutants across 74 DMS assays and 1,555 clinical proteins.

This dataset contains all four components of the benchmark in one file. To separate the dataset into its individual benchmarks, split it on the "benchmark" column. Each benchmark also contains multiple experiments/wild-type proteins, whose variants we score and compute statistics for independently. These are denoted by the "experiment_id" column. For example, to score just one assay of the DMS substitutions benchmark, such as the A4_HUMAN assay by Seuma et al. (2022), take the rows where the "benchmark" column is "DMS_substitutions" and the "experiment_id" column is "A4_HUMAN_Seuma_2022".
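The selection described above can be sketched in plain Python. This is a minimal illustration on toy rows, not the benchmark's own tooling: only the column names "benchmark" and "experiment_id" and the values "DMS_substitutions" and "A4_HUMAN_Seuma_2022" come from the description; the "variant" field, the other identifiers, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Toy rows standing in for the real table. Only "benchmark", "experiment_id",
# "DMS_substitutions" and "A4_HUMAN_Seuma_2022" are taken from the dataset
# description; the "variant" field and the HYPOTHETICAL_* values are made up.
rows = [
    {"benchmark": "DMS_substitutions", "experiment_id": "A4_HUMAN_Seuma_2022", "variant": "A2C"},
    {"benchmark": "DMS_substitutions", "experiment_id": "A4_HUMAN_Seuma_2022", "variant": "D7N"},
    {"benchmark": "DMS_substitutions", "experiment_id": "HYPOTHETICAL_ASSAY", "variant": "K3R"},
    {"benchmark": "HYPOTHETICAL_OTHER_BENCHMARK", "experiment_id": "HYPOTHETICAL_PROTEIN", "variant": "N/A"},
]


def select_experiment(rows, benchmark, experiment_id):
    """Return the rows that belong to one experiment of one benchmark."""
    return [
        r for r in rows
        if r["benchmark"] == benchmark and r["experiment_id"] == experiment_id
    ]


def group_by_experiment(rows):
    """Group rows by (benchmark, experiment_id) so each group can be scored on its own."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["benchmark"], r["experiment_id"])].append(r)
    return dict(groups)


a4 = select_experiment(rows, "DMS_substitutions", "A4_HUMAN_Seuma_2022")
groups = group_by_experiment(rows)
```

Grouping on the pair of columns rather than on "experiment_id" alone is a cautious choice here: it keeps experiments separate even if an identifier were to appear in more than one benchmark.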

Note that the "random_cv_split_index" and "continuous_label" columns only have values for deep mutational scans, not for clinical variants: clinical variants are labeled as binary benign/pathogenic, and we did not train supervised models on them in the original benchmark.
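As a rough sketch of what this means in practice, rows without these two values can be filtered out before any supervised evaluation. The example below uses toy rows and rests on two assumptions that the description does not confirm: missing values appear as None once loaded, and "random_cv_split_index" is a fold identifier that can be held out one value at a time. The helper names and the HYPOTHETICAL_* benchmark value are illustrative only.

```python
# Toy rows; None stands in for a missing value (an assumption about how
# empty entries look after loading). The label values are made up.
rows = [
    {"benchmark": "DMS_substitutions", "random_cv_split_index": 0, "continuous_label": 0.8},
    {"benchmark": "DMS_substitutions", "random_cv_split_index": 1, "continuous_label": -1.2},
    {"benchmark": "DMS_substitutions", "random_cv_split_index": 0, "continuous_label": 0.1},
    {"benchmark": "HYPOTHETICAL_CLINICAL_BENCHMARK", "random_cv_split_index": None, "continuous_label": None},
]


def has_supervised_fields(row):
    """True when the row carries both a split index and a continuous label."""
    return row["random_cv_split_index"] is not None and row["continuous_label"] is not None


def cv_folds(rows):
    """Yield (fold, train_rows, test_rows), holding out one split index at a time.

    Treating the split index as a held-out fold id is an assumption.
    """
    fold_ids = sorted({r["random_cv_split_index"] for r in rows})
    for fold in fold_ids:
        test = [r for r in rows if r["random_cv_split_index"] == fold]
        train = [r for r in rows if r["random_cv_split_index"] != fold]
        yield fold, train, test


supervised = [r for r in rows if has_supervised_fields(r)]
folds = list(cv_folds(supervised))
```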

Additional information about all the assays and clinical variants in the benchmark, along with details of how we compute statistics for them, is available in the publication and in the GitHub repository.
