Rosetta@home is a distributed computing project for protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker laboratory at the University of Washington. Rosetta@home also aims to predict protein-protein docking and design new proteins with the help of over 86,000 volunteered computers processing over 68 teraFLOPS on average as of September 7, 2008. Though much of the project is oriented towards basic research on improving the accuracy and robustness of the proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease and other pathologies.
Like all BOINC projects, Rosetta@home uses idle computer processing resources from volunteers' computers to perform calculations on individual workunits. Completed results are sent to a central project server where they are validated and assimilated into project databases. The project is cross-platform, and runs on a wide variety of hardware configurations. Users can view the progress of their individual protein structure predictions on the Rosetta@home screensaver.
In addition to disease-related research, the Rosetta@home network serves as a testing framework for new methods in structural bioinformatics. These new methods are then used in other Rosetta-based applications, like RosettaDock and the Human Proteome Folding Project, after being sufficiently developed and proven stable on Rosetta@home's large and diverse collection of volunteer computers. Two particularly important tests for the new methods developed in Rosetta@home are the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and Critical Assessment of Prediction of Interactions (CAPRI) experiments, which evaluate the state of the art in protein structure prediction and protein-protein docking prediction, respectively. Rosetta@home consistently ranks among the foremost docking predictors, and is one of the best tertiary structure predictors available.
A primary feature of the Rosetta@home graphical user interface (GUI) is a screensaver which shows a current workunit's progress during the simulated protein folding process. In the upper left of the current screensaver, the target protein is shown adopting different shapes (conformations) in its search for the lowest energy structure. Depicted immediately to the right is the structure of the most recently accepted model. In the upper right, the lowest energy conformation of the current decoy is shown; below that is the true, or native, structure of the protein if it has already been determined. Three graphs are included in the screensaver. Toward the middle, a graph of the accepted model's free energy is displayed, which fluctuates as the accepted model changes. A graph of the accepted model's root mean square deviation (RMSD), which measures how structurally similar the accepted model is to the native model, is shown at far right. To the right of the accepted energy graph and below the RMSD graph, the results from these two functions are used to produce an energy vs. RMSD plot as the model is progressively refined.
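The RMSD shown in the screensaver compares corresponding atoms of two structures. As a minimal sketch of the standard formula, RMSD = sqrt((1/N) Σ |x_i − y_i|²) over N paired atoms, the Python below assumes the model and native coordinates are already paired atom by atom and already superimposed. The function name, the toy coordinates, and the decision to skip superposition are illustrative assumptions, not details of the Rosetta@home screensaver code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD between paired atom coordinates of two structures, in the input's
    units (typically angstroms). Assumes the structures are already superimposed."""
    if len(model) != len(native) or len(model) == 0:
        raise ValueError("structures must have the same, non-zero number of atoms")
    # Sum of squared coordinate differences over all paired atoms.
    squared = sum((a - b) ** 2
                  for atom_m, atom_n in zip(model, native)
                  for a, b in zip(atom_m, atom_n))
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Toy three-atom structures, made up for illustration.
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0), (3.0, 0.0, 1.0)]
    print(f"RMSD = {rmsd(model, native):.3f}")
```

A real comparison would normally first find the optimal rigid-body superposition (for example with the Kabsch algorithm), because RMSD computed on unaligned coordinates also counts overall translation and rotation, not just differences in shape.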
Like all BOINC projects, Rosetta@home runs in the background of the user's computer using idle computer power, either at or before logging in to an account on the host operating system. Rosetta@home frees resources from the CPU as they are required by other applications so that normal computer usage is unaffected. To minimize power consumption or heat production from a computer running at sustained capacity, the maximum percentage of CPU resources that Rosetta@home is allowed to use can be specified through a user's account preferences. The times of day during which Rosetta@home is allowed to do work can also be adjusted, along with many other preferences, through a user's account settings.
Rosetta, the software run on the Rosetta@home network, was rewritten in C++ to allow easier development than that offered by its original version, which was written in Fortran. This new version is object-oriented, and was released on February 8, 2008. Development of the Rosetta code is done by Rosetta Commons. The software is freely licensed to the academic community and available to pharmaceutical companies for a fee.
With the proliferation of genome sequencing projects, scientists can infer the amino acid sequence, or primary structure, of many proteins that carry out functions within the cell. To better understand a protein's function and aid in rational drug design, scientists need to know the protein's 3-dimensional, tertiary structure.
Protein 3D structures are currently determined experimentally through X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The process is slow (it can take weeks or even months to figure out how to crystallize a protein for the first time) and comes at high cost (around $100,000 USD per protein). Unfortunately, the rate at which new sequences are discovered far exceeds the rate of structure determination: out of more than 6,600,000 protein sequences available in the NCBI non-redundant (nr) protein database, fewer than 48,000 proteins' 3D structures have been solved and deposited in the Protein Data Bank, the main repository for structural information on proteins. One of the main goals of Rosetta@home is to predict protein structures with the same accuracy as existing methods, but in a way that requires significantly less time and money. Rosetta@home also develops methods to determine the structure and docking of membrane proteins (e.g., GPCRs), which are exceptionally difficult to analyze with traditional techniques like X-ray crystallography and NMR spectroscopy, yet represent the majority of targets for modern drugs.
Progress in protein structure prediction is evaluated in the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, in which researchers from around the world attempt to derive a protein's structure from the protein's amino acid sequence. High-scoring groups in this sometimes competitive experiment are considered the de facto standard-bearers for the state of the art in protein structure prediction. Rosetta, the program on which Rosetta@home is based, has been used since CASP 5 in 2002. In the 2004 CASP 6 experiment, Rosetta made history by being the first to produce a close to atomic-level resolution, ab initio protein structure prediction in its submitted model for CASP target T0281. Ab initio modeling is considered an especially difficult category of protein structure prediction, as it does not use information from structural homology and must rely on information from sequence homology and modeling physical interactions within the protein. Rosetta@home has been used in CASP since 2006, and in that year's CASP 7 it was among the top predictors in every category of structure prediction. These high quality predictions were enabled by the computing power made available by Rosetta@home volunteers. Increasing computational power allows Rosetta@home to sample more regions of conformation space (the possible shapes a protein can assume), whose size, according to Levinthal's paradox, is predicted to increase exponentially with protein length.
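The exponential growth behind Levinthal's paradox can be made concrete with a toy count. The sketch below assumes, purely for illustration, that each residue independently adopts one of 3 discrete backbone states and that a computer could check a hypothetical 10^12 conformations per second; neither number is a Rosetta parameter, and the function names are made up for this example.

```python
# Toy Levinthal-style estimate. The 3-states-per-residue figure and the
# sampling rate below are illustrative assumptions, not Rosetta parameters.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(length: int, states_per_residue: int = 3) -> int:
    """Number of chain conformations if each residue independently adopts
    one of `states_per_residue` discrete states."""
    if length < 0 or states_per_residue < 1:
        raise ValueError("length must be >= 0 and states_per_residue >= 1")
    return states_per_residue ** length


def years_to_enumerate(length: int, samples_per_second: float,
                       states_per_residue: int = 3) -> float:
    """Time needed to visit every conformation once at a fixed sampling rate."""
    count = conformation_count(length, states_per_residue)
    return count / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (50, 100, 150):
        count = conformation_count(n)
        years = years_to_enumerate(n, samples_per_second=1e12)  # hypothetical rate
        print(f"{n} residues: {count:.2e} conformations, ~{years:.1e} years at 1e12 samples/s")
```

Each additional residue multiplies the count by the same factor, which is why structure prediction methods sample this space rather than enumerate it, and why more volunteer computing power translates directly into more sampled conformations.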
Rosetta@home is also used in protein docking prediction, which determines the structure of multiple complexed proteins, or quaternary structure. This type of protein interaction affects many cellular functions, including antigen-antibody and enzyme-inhibitor binding and cellular import and export. Determining these interactions is critical for drug design. Rosetta is used in the Critical Assessment of Prediction of Interactions (CAPRI) experiment, which evaluates the state of the protein docking field similar to how CASP gauges progress in protein structure prediction. The computing power made available by Rosetta@home's project volunteers has been cited as a major factor in Rosetta's performance in CAPRI, where its docking predictions have been among the most accurate and complete.
In early 2008, Rosetta was used to computationally design a protein with a function never before observed in nature. This was inspired in part by the retraction of a high-profile paper from 2004 which originally described the computational design of a protein with improved enzymatic activity compared to its natural form. The 2008 research paper from David Baker's group describing how the protein was made, which cited Rosetta@home for the computational resources it made available, represented an important proof of concept for this protein design method. This type of protein design could have future applications in drug discovery, green chemistry, and bioremediation.
In addition to basic research in predicting protein structure, docking and design, Rosetta@home is also used in immediate disease-related research. Numerous minor research projects are described in David Baker's Rosetta@home journal.
RosettaDock was also used to model docking between an antibody (immunoglobulin G) and a surface protein expressed by herpes simplex virus 1 (HSV-1) which serves to degrade the antiviral antibody. The protein complex predicted by RosettaDock closely agreed with the particularly difficult-to-obtain experimental models, leading researchers to conclude that the docking method has potential in addressing some of the problems that X-ray crystallography has with modeling protein-protein interfaces.
RosettaDesign, a computational approach to protein design based on Rosetta, began in 2000 with a study in redesigning the folding pathway of protein G. In 2002 RosettaDesign was used to design TOP7, a 93-residue α/β protein with an overall fold never before recorded in nature. This new conformation was predicted by Rosetta to within 1.2 Å RMSD of the structure determined by X-ray crystallography, representing an unusually accurate structure prediction. Rosetta and RosettaDesign earned widespread recognition by being the first to design and accurately predict the structure of a novel protein of such length, as reflected by the 2002 paper describing the dual approach prompting two positive letters in the journal Science and being cited by more than 240 other scientific articles. The visible product of that research, TOP7, was featured as the Protein Data Bank's 'Molecule of the Month' in October 2006; a superposition of the respective cores (residues 60-79) of its predicted and X-ray crystal structures is also featured in the Rosetta@home logo.
Brian Kuhlman, who obtained his PhD under David Baker and now researches protein design with Rosetta in his own laboratory at the University of North Carolina, Chapel Hill, offers RosettaDesign as an online service.
Development of RosettaDock diverged into two branches for subsequent CAPRI rounds as Jeffrey Gray, who laid the groundwork for RosettaDock while at the University of Washington, continued working on the method in his new position at Johns Hopkins University. Members of the Baker laboratory further developed RosettaDock in Gray's absence. The two versions differed slightly in side-chain modeling, decoy selection and other areas. Despite these differences, both the Baker and Gray methods performed well in the second CAPRI assessment, placing fifth and seventh respectively out of 30 predictor groups. Jeffrey Gray's RosettaDock server is available as a free docking prediction service for non-commercial use.
In October 2006, RosettaDock was integrated into Rosetta@home. The method used a fast, crude docking phase that modeled only the protein backbone. This was followed by a slow full-atom refinement phase in which the orientation of the two interacting proteins relative to each other, and the side-chain interactions at the protein-protein interface, were simultaneously optimized to find the lowest energy conformation. The vastly increased computational power afforded by the Rosetta@home network, in combination with revised "fold-tree" representations for backbone flexibility and loop modeling, helped RosettaDock place sixth out of 63 prediction groups in the third CAPRI assessment.
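The two-phase idea (a cheap coarse search, then a slower detailed refinement starting from its result) can be illustrated with a toy Monte Carlo sketch. Everything here is a hypothetical stand-in: the "pose" is a 2D offset rather than a full rigid-body orientation, and coarse_energy and full_atom_energy are made-up functions, not Rosetta score terms. Metropolis Monte Carlo is used because it is a common choice for this kind of search; it should not be read as RosettaDock's actual move set or schedule.

```python
import math
import random
from typing import Callable, Tuple

# Hypothetical toy model: a "pose" is a 2D offset of one protein relative to
# its partner, standing in for a full rigid-body orientation.
Pose = Tuple[float, float]
EnergyFn = Callable[[Pose], float]


def coarse_energy(pose: Pose) -> float:
    """Made-up smooth score standing in for a cheap backbone-only model:
    a single funnel centred on (3, -2)."""
    x, y = pose
    return (x - 3.0) ** 2 + (y + 2.0) ** 2


def full_atom_energy(pose: Pose) -> float:
    """Made-up detailed score: the same funnel plus short-range ruggedness,
    standing in for side-chain packing at the interface."""
    x, y = pose
    return coarse_energy(pose) + 0.3 * math.sin(5.0 * x) * math.sin(5.0 * y)


def monte_carlo(energy: EnergyFn, start: Pose, steps: int, step_size: float,
                temperature: float, rng: random.Random) -> Tuple[Pose, float]:
    """Metropolis Monte Carlo search; returns the lowest-energy pose visited."""
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        trial = (current[0] + rng.gauss(0.0, step_size),
                 current[1] + rng.gauss(0.0, step_size))
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def dock(seed: int = 0) -> Tuple[Pose, float]:
    rng = random.Random(seed)
    # Phase 1: fast, coarse search with large moves on the cheap score.
    coarse_pose, _ = monte_carlo(coarse_energy, (0.0, 0.0), steps=2000,
                                 step_size=0.5, temperature=1.0, rng=rng)
    # Phase 2: slower refinement from the coarse result with small moves.
    return monte_carlo(full_atom_energy, coarse_pose, steps=5000,
                       step_size=0.05, temperature=0.1, rng=rng)


if __name__ == "__main__":
    pose, energy = dock()
    print(f"refined pose ({pose[0]:.2f}, {pose[1]:.2f}), energy {energy:.3f}")
```

The cheap score with large moves locates the binding region quickly, and the refinement phase starts from that result with small moves and a lower temperature, so it only explores locally, where the expensive detailed score matters.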
In modeling protein structure as of CASP 6, Robetta first searches for structural homologs using BLAST, PSI-BLAST, and 3D-Jury, then parses the target sequence into its individual domains, or independently folding units of proteins, by matching the sequence to structural families in the Pfam database. Domains with structural homologs then follow a "template-based model" (i.e., homology modeling) protocol. Here, the Baker laboratory's in-house alignment program, K*sync, produces a group of sequence homologs, and each of these is modeled by the Rosetta de novo method to produce a decoy (possible structure). The final structure prediction is selected by taking the lowest energy model as determined by a low-resolution Rosetta energy function. For domains that have no detected structural homologs, a de novo protocol is followed in which the lowest energy model from a set of generated decoys is selected as the final prediction. These domain predictions are then connected together to investigate inter-domain, tertiary-level interactions within the protein. Finally, side-chain contributions are modeled using a protocol for Monte Carlo conformational search.
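The per-domain decision described above (template-based modeling when a structural homolog was found, de novo modeling otherwise, then keeping the lowest-energy decoy) can be sketched as a small routing function. This is a minimal illustration, not Robetta's code: the Decoy and Domain types, the function names, and the stub generators with their placeholder energies are all hypothetical, and domain assembly and side-chain modeling are left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decoy:
    label: str
    energy: float  # lower is better


@dataclass
class Domain:
    name: str
    has_structural_homolog: bool  # outcome of the homolog search step


# A decoy generator takes a domain and returns candidate structures.
DecoyGenerator = Callable[[Domain], List[Decoy]]


def lowest_energy(decoys: List[Decoy]) -> Decoy:
    """Select the final prediction as the decoy with the lowest energy."""
    if not decoys:
        raise ValueError("no decoys were generated")
    return min(decoys, key=lambda d: d.energy)


def predict_domains(domains: List[Domain], template_based: DecoyGenerator,
                    de_novo: DecoyGenerator) -> List[Decoy]:
    """Route each domain to the template-based or de novo protocol and keep the
    lowest-energy decoy. Domain assembly and side-chain modeling are not shown."""
    predictions = []
    for domain in domains:
        generate = template_based if domain.has_structural_homolog else de_novo
        predictions.append(lowest_energy(generate(domain)))
    return predictions


if __name__ == "__main__":
    # Stub generators with arbitrary placeholder energies, for illustration only.
    def stub_template_based(domain: Domain) -> List[Decoy]:
        return [Decoy(f"{domain.name}-template-{i}", e)
                for i, e in enumerate([-120.0, -135.5, -128.0])]

    def stub_de_novo(domain: Domain) -> List[Decoy]:
        return [Decoy(f"{domain.name}-denovo-{i}", e)
                for i, e in enumerate([-80.0, -95.0])]

    for decoy in predict_domains([Domain("D1", True), Domain("D2", False)],
                                 stub_template_based, stub_de_novo):
        print(decoy.label, decoy.energy)
```

Passing the two protocols in as functions keeps the routing and selection logic separate from how decoys are actually generated, which is the part that differs between the template-based and de novo paths.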
In CASP 8, Robetta was augmented to use Rosetta's high resolution all-atom refinement method, the absence of which was cited as the main cause for Robetta being less accurate than the Rosetta@home network in CASP 7.
Other protein-related distributed computing projects on BOINC include QMC@home, Docking@home, POEM@home, SIMAP, and TANPAKU. RALPH@home, the Rosetta@home alpha project which tests new application versions, work units, and updates before they move on to Rosetta@home, runs on BOINC as well.
Users are granted BOINC credits as a measure of their contribution. The credit granted for each workunit is the number of decoys produced for that workunit multiplied by the average claimed credit for the decoys submitted by all computer hosts for that workunit. This custom system was designed to address significant differences between credit granted to users with the standard BOINC client and an optimized BOINC client, and credit differences between users running Rosetta@home on Windows and Linux operating systems. The amount of credit granted per second of CPU work is lower for Rosetta@home than most other BOINC projects. Despite this disadvantage to BOINC users competing for rank, Rosetta@home is fifth out of over 40 BOINC projects in terms of total credit.
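The credit rule can be written as a short calculation: granted credit = decoys produced × average claimed credit per decoy for the workunit. The text does not say exactly how that average is taken, so the sketch below assumes a pooled average (total claimed credit divided by total decoys across all hosts that returned the workunit). The HostResult type, the function names, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostResult:
    host: str
    claimed_credit: float  # credit the host's client claimed for its result
    decoys: int            # decoys the host produced for this workunit


def credit_per_decoy(results: List[HostResult]) -> float:
    """Pooled average claimed credit per decoy (an assumption; see lead-in)."""
    total_decoys = sum(r.decoys for r in results)
    if total_decoys == 0:
        raise ValueError("no decoys reported for this workunit")
    return sum(r.claimed_credit for r in results) / total_decoys


def granted_credit(result: HostResult, results: List[HostResult]) -> float:
    """Credit granted to one host: its decoy count times the per-decoy average."""
    return result.decoys * credit_per_decoy(results)


if __name__ == "__main__":
    # Hypothetical workunit: host B's client claims more per decoy than host A's.
    results = [HostResult("A", claimed_credit=30.0, decoys=10),
               HostResult("B", claimed_credit=90.0, decoys=20)]
    for r in results:
        print(r.host, "claimed", r.claimed_credit, "granted", granted_credit(r, results))
```

Because each host's grant depends on how many decoys it produced rather than on what its own client claimed, an inflated claim (for example from an optimized client) is diluted across all hosts on the workunit instead of being paid out in full to the host that made it.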
Rosetta@home users who predict protein structures submitted for the CASP experiment are acknowledged in scientific publications regarding their results. Users who predict the lowest energy structure for a given workunit are featured on the Rosetta@home homepage as 'Predictor of the Day', along with any team of which they are a member. A 'User of the Day' is also chosen at random each day from users who have made a Rosetta@home profile and is featured on the homepage.
Online Rosetta services