The amino acid sequence of a protein determines its three-dimensional structure, or 'fold'. Conversely, a given three-dimensional structure is compatible with a large but limited set of amino acid sequences. Enumerating the allowed sequences for a given fold is known as the 'inverse protein folding problem'. We are working to solve this problem for a representative subset of known protein folds (about 1500 folds). The most expensive step is to build a database of energy functions that describe all these structures. For each structure, we consider all possible sequences of amino acids. Surprisingly, this is computationally tractable, because our energy functions are sums over pairwise interactions, so only the interaction terms need to be computed in advance, not the energy of every full sequence. Once this is done, we can explore the space of amino acid sequences quickly and retain the most favorable sequences. This large-scale mapping of protein sequence space will have applications for predicting protein structure and function, for understanding protein evolution, and for designing new proteins. By joining the project, you will help to build the database of energy functions and advance an important area of science with potential biomedical applications.
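To illustrate why a pairwise energy function keeps the problem tractable, here is a minimal Python sketch. It is not the project's actual code: the function names (`build_tables`, `sequence_energy`, `table_size`) and the toy energy terms (`toy_single`, `toy_pair`) are hypothetical and chosen only for illustration. The idea shown is that the interaction terms are tabulated once, after which the energy of any sequence is a sum of table lookups.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_single(i, a):
    """Hypothetical one-body term: energy of amino acid `a` at position `i`."""
    return 0.0


def toy_pair(i, a, j, b):
    """Hypothetical two-body term: favors identical neighbors (illustration only)."""
    return -1.0 if (j == i + 1 and a == b) else 0.0


def build_tables(n_positions, single_term, pair_term):
    """Precompute all one-body and two-body terms (the expensive step)."""
    single = [
        {a: single_term(i, a) for a in AMINO_ACIDS}
        for i in range(n_positions)
    ]
    pair = {
        (i, j): {
            (a, b): pair_term(i, a, j, b)
            for a in AMINO_ACIDS
            for b in AMINO_ACIDS
        }
        for i in range(n_positions)
        for j in range(i + 1, n_positions)
    }
    return single, pair


def sequence_energy(seq, single, pair):
    """Energy of one sequence as a sum of precomputed table lookups."""
    n = len(seq)
    total = sum(single[i][seq[i]] for i in range(n))
    total += sum(
        pair[(i, j)][(seq[i], seq[j])]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total


def table_size(n_positions):
    """Number of stored terms: 20 per position plus 400 per pair of positions."""
    n_pairs = n_positions * (n_positions - 1) // 2
    return 20 * n_positions + 400 * n_pairs
```

The number of stored terms grows quadratically with the number of positions, whereas the number of possible sequences grows as 20 to the power of the number of positions. For a 100-position protein, `table_size(100)` is about two million entries, while the number of sequences is 20**100.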
For more information, go to this link: Project Overview