🧬 BEND Zero-shot prediction of disease variants from ClinVar
Variant effect prediction is a binary classification problem: each single-bp mutation is classified as either having an effect or not. Each variant is a genomic position with a mutated nucleotide x ∈ {A, C, G, T} and a label y ∈ {0, 1} indicating whether it is pathogenic or benign. The adjacent 512 bp serve as context.
As this is a zero-shot task, we used the cosine distance in embedding space between a variant nucleotide and its reference nucleotide as the prediction score in BEND.
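
The scoring rule described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than BEND's actual implementation: the function names `cosine_distance` and `variant_score`, the `(sequence_length, hidden_dim)` embedding layout, the placement of the variant at index 256, and the toy random embeddings are all assumptions made for the example. In practice the two embedding matrices would come from running a language model on the reference and mutated sequences.

```python
import numpy as np


def cosine_distance(u, v, eps=1e-12):
    """Return 1 - cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Guard against division by zero for all-zero vectors.
    denom = max(np.linalg.norm(u) * np.linalg.norm(v), eps)
    return 1.0 - float(np.dot(u, v) / denom)


def variant_score(ref_embeddings, alt_embeddings, variant_index):
    """Zero-shot score for one variant (hypothetical helper).

    ref_embeddings, alt_embeddings: arrays of shape (sequence_length, hidden_dim)
    holding per-nucleotide embeddings of the reference and mutated sequences.
    variant_index: position of the mutated nucleotide within the window.
    """
    return cosine_distance(ref_embeddings[variant_index],
                           alt_embeddings[variant_index])


# Toy data standing in for model embeddings (assumption: random vectors).
rng = np.random.default_rng(0)
ref = rng.normal(size=(512, 8))
alt = ref.copy()
alt[256] = -ref[256]  # flip the embedding at the assumed variant position

score = variant_score(ref, alt, 256)
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so the toy variant above, whose embedding was negated, scores at the top of that range. Because no classifier is fitted, the score is used directly; how it is thresholded or ranked against the labels is left to the evaluation.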