This dataset has not yet been certified by approved reviewers. It may contain issues related to data completeness and quality.

Dataset

graphium/pm6-subset-v1

Subset of a quantum chemistry dataset whose quantum properties are computed with the PM6 semi-empirical method.

Created on: July 21, 2024
Dataset size: 2 GB
Number of datapoints: ~8.5M
Public

Tags

Graph
Quantum chemistry
LargeMix

Details

README

Background

The PM6_83M dataset is similar to PCQM4M and comes from the same PubChemQC project. However, its quantum properties are computed with the PM6 semi-empirical method, which is orders of magnitude faster than DFT at the cost of lower accuracy.

This dataset covers 83M unique molecules, 62 graph-level tasks, and 7 node-level tasks. To our knowledge, it is the largest dataset available for training 2D-GNNs in terms of the number of unique molecules. The tasks come from four molecular states: S0 (the ground state), T0 (the lowest-energy triplet excited state), cation (the positively charged state), and anion (the negatively charged state). In total, there are 221M PM6 computations.
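The figures above can be cross-checked with a little arithmetic. The sketch below assumes that each PM6 computation corresponds to one (molecule, state) pair, which the README does not state explicitly; the variable names are illustrative only. Under that assumption, 221M computations over 83M molecules works out to roughly 2.7 states per molecule, or about two thirds of the 332M computations that full coverage of all four states would require.

```python
# Figures quoted in the README (approximate, in millions).
UNIQUE_MOLECULES_M = 83
TOTAL_COMPUTATIONS_M = 221
STATES = ("S0", "T0", "cation", "anion")
GRAPH_TASKS = 62
NODE_TASKS = 7

# Total number of prediction tasks across graph and node level.
total_tasks = GRAPH_TASKS + NODE_TASKS

# Upper bound if every molecule had one computation per state.
# Assumption: one computation == one (molecule, state) pair.
max_computations_m = UNIQUE_MOLECULES_M * len(STATES)

# Average number of states computed per molecule, and the share of
# the upper bound that the reported total represents.
avg_states_per_molecule = TOTAL_COMPUTATIONS_M / UNIQUE_MOLECULES_M
coverage = TOTAL_COMPUTATIONS_M / max_computations_m

print(f"total tasks: {total_tasks}")
print(f"average states per molecule: {avg_states_per_molecule:.2f}")
print(f"state coverage: {coverage:.0%}")
```

If the assumption holds, not every molecule has results for all four states, so per-state tasks would likely contain missing labels for some molecules.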

pm6-subset-v1 is a subset of the PM6_83M dataset that includes ~8.5M molecules and their quantum properties.
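As a minimal sketch of how this subset might be fetched, the snippet below uses the Polaris Python client and its `load_dataset` function with the `owner/name` identifier shown at the top of this page. It assumes the `polaris-lib` package is installed and that you have already authenticated with the Hub; the helper name `load_pm6_subset` is hypothetical and not part of any library. The import is deferred so that the module can be imported without the client present.

```python
# Dataset identifier as shown on this page: "<owner>/<name>".
DATASET_ID = "graphium/pm6-subset-v1"


def load_pm6_subset(dataset_id: str = DATASET_ID):
    """Fetch the dataset from the Polaris Hub (hypothetical helper).

    Assumes `polaris-lib` is installed and the user is logged in.
    The download is about 2 GB according to the dataset card.
    """
    # Imported lazily so this module loads without the client installed.
    import polaris as po

    return po.load_dataset(dataset_id)


# Split the identifier into its owner and name parts for display.
owner, name = DATASET_ID.split("/")
print(f"owner={owner} name={name}")
```

Calling `load_pm6_subset()` triggers the actual download, so it is left out of the snippet itself.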

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

Attribute    Value
year         2019