Estimating protein flexibility via uncertainty quantification of structure prediction models

Published in Machine Learning for Structural Biology @ NeurIPS 2024.

Deep learning architectures such as AlphaFold2 have effectively solved the protein structure prediction problem. However, they do not rigorously account for conformational variance in structures, even though many proteins exhibit flexible regions in which a single amino acid sequence may occupy a variety of conformations. In particular, confidence metrics such as the pLDDT score do not readily distinguish between regions where the prediction model is uncertain because the region is out-of-distribution and regions where it is uncertain because the region is intrinsically flexible. Here, we use a novel approach to estimate protein flexibility via uncertainty quantification. Specifically, we reformulate the protein structure prediction problem as sampling a backbone function from a Gaussian process, which enables us to cast flexibility estimation as aleatoric uncertainty quantification. We adapt the AlphaFold2 Structure Module architecture to produce such estimates of aleatoric uncertainty and compare them to existing proxies for conformational variance. We demonstrate the utility of our formalisation for approximating protein flexibility in a prediction framework; our experiments show the promise of the method whilst emphasising the relationship between epistemic and aleatoric uncertainty in protein structure prediction.
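To make the aleatoric/epistemic distinction concrete, the sketch below uses a generic, widely used technique: heteroscedastic regression trained with a Gaussian negative log-likelihood, combined with an ensemble and the law of total variance. This is an illustrative assumption, not the paper's Gaussian-process formulation or its adapted Structure Module. The function names `isotropic_gaussian_nll` and `decompose` are hypothetical, and each residue is assumed to have a 3D position with an isotropic predicted variance.

```python
import math
from statistics import fmean

# Hypothetical representation: a residue position (e.g. a C-alpha atom) in 3D.
Vec3 = tuple[float, float, float]


def isotropic_gaussian_nll(target: Vec3, mean: Vec3, var: float) -> float:
    """Negative log-likelihood of a 3D position under N(mean, var * I).

    Training a prediction head on this loss lets it output a per-residue
    variance, which plays the role of aleatoric uncertainty (assumption:
    isotropic noise, the same variance on x, y and z).
    """
    sq_err = sum((t - m) ** 2 for t, m in zip(target, mean))
    return 1.5 * math.log(2 * math.pi * var) + sq_err / (2 * var)


def decompose(means: list[Vec3], variances: list[float]) -> tuple[float, float]:
    """Split one residue's predictive variance across an ensemble of models.

    Law of total variance, reported per coordinate:
      aleatoric = average of the variances each model predicts
      epistemic = spread of the models' predicted means around their average
    """
    centre = tuple(fmean(m[d] for m in means) for d in range(3))
    aleatoric = fmean(variances)
    epistemic = fmean(
        sum((m[d] - centre[d]) ** 2 for d in range(3)) / 3 for m in means
    )
    return aleatoric, epistemic


if __name__ == "__main__":
    # Two hypothetical ensemble members predicting the same residue.
    alea, epi = decompose([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 3.0])
    print(f"aleatoric={alea:.3f} epistemic={epi:.3f}")
```

Under this framing, a residue with high aleatoric but low epistemic variance would be read as intrinsically flexible, whereas high epistemic variance (members disagreeing on the mean) points towards an out-of-distribution region; a single confidence score such as pLDDT conflates the two.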

Recommended citation: Quast & Sweeney et al. (2024). "Estimating protein flexibility via uncertainty quantification of structure prediction models." Machine Learning for Structural Biology @ NeurIPS 2024. 1(1).