🧬 BEND Zero-shot prediction of expression variants (eQTL)
Predicting variant effects is a binary problem, where single-bp mutations are classified as either having an effect or not. Each variant is a genomic position with a mutation x ∈ {A, C, G, T} and a label y ∈ {0, 1} indicating whether it has an effect on gene expression (eQTL) or not (background variation). The adjacent 512 bp serve as context.
The data was adapted from DeepSEA (Zhou & Troyanskaya, 2015).
As this is a zero-shot task, we used the cosine distance in embedding space between a variant nucleotide and its reference nucleotide as the prediction score in BEND.
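The scoring rule described above can be sketched in a few lines. This is an illustrative sketch only, not the BEND implementation: it assumes each sequence has already been embedded into one vector per nucleotide, and the names `cosine_distance`, `variant_score`, and `variant_index` are hypothetical. The toy 2-dimensional embeddings are made up for illustration.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def variant_score(
    ref_embeddings: Sequence[Sequence[float]],
    alt_embeddings: Sequence[Sequence[float]],
    variant_index: int,
) -> float:
    """Zero-shot score: cosine distance between the embedding of the
    reference nucleotide and the embedding of the variant nucleotide,
    both taken at the variant position (assumed layout: one vector
    per nucleotide)."""
    return cosine_distance(
        ref_embeddings[variant_index], alt_embeddings[variant_index]
    )


# Toy example: 3 positions, 2-dimensional embeddings (made-up values).
# Only position 1 differs between the reference and the variant sequence.
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alt = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
score = variant_score(ref, alt, variant_index=1)
```

A larger distance means the model represents the variant nucleotide more differently from the reference, so under this scheme higher scores are read as more likely to affect expression. Because no classifier is trained, the score can be compared against the labels y directly, for example with a ranking metric.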