Frequently Asked Questions


  • What is Robetta?
  • What are Fragment Libraries?
  • What is Interface Alanine Scanning?
  • What is Ginzu?
  • How do I submit a sequence to the structure prediction server?
  • How do I submit a sequence to the fragment server?
  • How do I submit a sequence to the interface alanine scanning server?
  • How do I remove my job from the structure prediction server?
  • What is a mirror?
  • How long does a job take to process?
  • Why am I only allowed one job at a time?
  • How much time do I have before my job is removed and my data is lost?
  • Can I run Rosetta myself?
  • Why can't commercial users submit jobs to Robetta?
  • Why are all my results visible to everyone?
  • How do I configure my browser to view structures with RasMol?
  • Why do I get more than one model?
  • What does the domain prediction confidence mean?
  • How do I interpret the K*Sync detailed alignment?
  • Are there any issues with long sequences?
  • Are there any issues with short sequences?
  • What is the difference between Ab Initio and De Novo Modeling?
  • What is the difference between Comparative and Homology Modeling?


    Q. What is Robetta?

      Robetta is a full-chain protein structure prediction server. It parses protein chains into putative domains with the Ginzu protocol, and models those domains either by homology modeling or by ab initio modeling. Other services of the server, in addition to Domain Parsing and 3-D Modeling, include Fragment Library generation and Interface Alanine Scanning.


    Q. What are Fragment Libraries?  

      Fragment Libraries are the pieces of experimentally determined structures that Rosetta uses to guide the search of conformational space when predicting structures with the ab initio protocol, and when modeling longer loops in homology models.


    Q. What is Interface Alanine Scanning?  

      Interface Alanine Scanning attempts to estimate the energetic contribution to the binding free energy provided by each residue at a protein-protein interface. Briefly, interface alanine scanning uses a simple physical model to score a series of protein-protein interfaces in which contact residues are individually replaced with alanine. After each computational alanine mutation, the resulting binding energy is calculated.

      The input consists of a three-dimensional structure of a protein-protein complex; output is a list of "hot spots," or amino acid side chains that are predicted to significantly destabilize the interface when mutated to alanine, analogous to the results of experimental alanine-scanning mutagenesis. 79% of hot spots and 68% of neutral residues were correctly predicted in a test of 233 mutations in 19 protein-protein complexes. A single interface can be analyzed in minutes. The computational methodology has been validated by the successful design of protein interfaces with new specificity and activity and has yielded new insights into the mechanisms of receptor specificity and promiscuity in biological systems.
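
      The bookkeeping behind a computational alanine scan can be sketched as follows. This is a minimal illustration, not the server's implementation: the function names, the example energies, and the 1.0 kcal/mol hot-spot cutoff are all assumptions, since the text above does not state the threshold the server uses.

```python
from typing import Dict, List, Tuple

# Assumed cutoff in kcal/mol; the FAQ does not state the server's threshold.
HOT_SPOT_CUTOFF = 1.0


def ddg_binding(wild_type_dg: float, mutant_dg: float) -> float:
    """Change in binding energy caused by one alanine mutation."""
    return mutant_dg - wild_type_dg


def find_hot_spots(
    wild_type_dg: float,
    mutant_dgs: Dict[str, float],
    cutoff: float = HOT_SPOT_CUTOFF,
) -> List[Tuple[str, float]]:
    """Return (residue, ddG) pairs at or above the cutoff, largest first."""
    hot = []
    for residue, mutant_dg in mutant_dgs.items():
        ddg = round(ddg_binding(wild_type_dg, mutant_dg), 2)
        if ddg >= cutoff:
            hot.append((residue, ddg))
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Made-up binding energies (kcal/mol) for a wild-type complex and three mutants.
example = {"A:TRP43": -9.5, "A:SER45": -11.9, "B:ARG120": -10.4}
print(find_hot_spots(-12.0, example))
```

      A positive ddG means the alanine mutant binds less tightly than the wild type, which is what marks a residue as a candidate hot spot.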


    Q. What is Ginzu?  

      Ginzu is a protocol that attempts to determine the regions of a protein chain that will fold into globular units, called "domains". It scans the protein chain sequence with successively less confident methods of detection to determine any homologs with experimentally determined structures, starting with PDB-BLAST, and followed by the more remote fold-detection method HHSEARCH. After any homologs are identified, a search of remaining regions is done with HMMER against the Pfam-A protein family database. Lastly, the PSI-BLAST multiple sequence alignment is used to assign regions of increased likelihood of possessing a contiguous domain based on sequence clusters. The final step consists of selecting cut-points between the domains (and possibly defining new domains based on the strongest cutpoints for any remaining long stretches of the sequence that have not already matched a homolog with a structure or Pfam-A) using the PSI-BLAST MSA.
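
      The "successively less confident methods" idea can be sketched as a cascade in which each method only searches the regions that earlier methods left unassigned. This is a simplified sketch, not Ginzu itself: the detectors are hypothetical stand-ins that return fixed hits rather than running PDB-BLAST, HHSEARCH, or HMMER, and the MSA and cut-point steps are left out.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int]  # half-open [start, end) residue indices


def uncovered(length: int, covered: List[Region]) -> List[Region]:
    """Return the stretches of the chain not yet assigned to any method."""
    gaps, pos = [], 0
    for start, end in sorted(covered):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < length:
        gaps.append((pos, length))
    return gaps


def fixed_hits(hits: List[Region]) -> Callable[[Region], List[Region]]:
    """Hypothetical detector: reports preset hits, clipped to the searched gap."""
    def detect(gap: Region) -> List[Region]:
        clipped = [(max(s, gap[0]), min(e, gap[1])) for s, e in hits]
        return [(s, e) for s, e in clipped if s < e]
    return detect


def cascade(length, detectors):
    """Run detectors in order of confidence; later ones see only the gaps."""
    assigned = []  # (start, end, method)
    for name, detect in detectors:
        for gap in uncovered(length, [(s, e) for s, e, _ in assigned]):
            assigned.extend((s, e, name) for s, e in detect(gap))
    return sorted(assigned)


detectors = [
    ("PDB-BLAST", fixed_hits([(0, 120)])),
    ("HHSEARCH", fixed_hits([(100, 200)])),
    ("Pfam-A", fixed_hits([(250, 300)])),
]
print(cascade(300, detectors))
```

      In this toy run, residues 200 to 249 match nothing, which corresponds to the remaining stretches that the final steps described above would handle.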


    Q. How do I submit a sequence to the structure prediction server?  

      To submit a sequence to the structure server, you must be a registered user. To register, click here. If you are a registered user, go to the Structure Server submission form and do the following:

      1. Select a prediction type.
      2. Enter your username or registered email address.
      3. Enter a target name for your sequence.
      4. Paste your fasta sequence in the text area or upload your fasta file.
      5. Fill out optional fields if desired.
      6. Click the Submit button.
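
      Before pasting or uploading a sequence, it can help to check it locally. The sketch below is one way to do that, under stated assumptions: the function name is hypothetical, it expects a single sequence, and it accepts only the 20 standard one-letter amino acid codes, which may be stricter or looser than what the server actually accepts.

```python
from typing import Tuple

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed accepted alphabet


def parse_fasta(text: str) -> Tuple[str, str]:
    """Parse a single-sequence FASTA string into (header, sequence)."""
    header = None
    chunks = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None or chunks:
                raise ValueError("expected exactly one header line, at the top")
            header = line[1:].strip()
        else:
            chunks.append(line.upper())
    sequence = "".join(chunks)
    if not sequence:
        raise ValueError("no sequence found")
    bad = sorted(set(sequence) - STANDARD_RESIDUES)
    if bad:
        raise ValueError("unexpected characters in sequence: " + "".join(bad))
    return header or "", sequence


header, sequence = parse_fasta(">t001 example\nMKTAYIAK\nQRQISFVK\n")
print(header, len(sequence))
```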


    Q. How do I submit a sequence to the fragment server?  

      To submit a sequence to the fragment server, you must be a registered user. To register, click here. If you are a registered user, go to the Fragment Server submission form and do the following:

      1. Enter your username or registered email address.
      2. Enter a target name for your sequence.
      3. Paste your fasta sequence in the text area or upload your fasta file.
      4. Complete optional fields if desired. If you are uploading constraints data, be sure the formats are correct to prevent errors and wasted processor time.
      5. Click the Submit button.


    Q. How do I submit a sequence to the interface alanine scanning server?  

      To submit a sequence to the interface alanine scanning server, you must be a registered user. To register, click here. If you are a registered user, go to the Alanine Scanning Server submission form and do the following:

      1. Enter your username or registered email address.
      2. Enter a job name for your sequence.
      3. Upload your protein complex file (must be PDB format).
      4. Define the interface for alanine scanning by entering the chain IDs (as in the complex; case sensitive) involved in the interface and the interface partners to which they belong.
      5. Optional: If you want results for specific interface side-chains, you can upload a Mutations List. Click here for file format. If there is a format error, all interface side-chains will be considered.
      6. Click the Submit button.
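
      Because chain IDs are case sensitive, it can be worth listing the IDs that actually appear in your PDB file before filling in the form. A minimal sketch, assuming standard fixed-column PDB records in which the chain ID is column 22 of ATOM and HETATM lines; the function names and the example coordinates are made up for illustration.

```python
from typing import List

EXAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "ATOM    500  N   GLY B   1      10.000  12.000  14.000  1.00 20.00           N",
])


def chain_ids(pdb_text: str) -> List[str]:
    """Return chain IDs in order of first appearance (column 22, index 21)."""
    seen = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


def missing_chains(pdb_text: str, requested: List[str]) -> List[str]:
    """Return requested chain IDs that do not occur in the file (case sensitive)."""
    present = chain_ids(pdb_text)
    return [chain for chain in requested if chain not in present]


print(chain_ids(EXAMPLE_PDB))
print(missing_chains(EXAMPLE_PDB, ["A", "b"]))
```

      Here the lowercase "b" is reported as missing even though chain "B" exists, which is the kind of mismatch the case-sensitivity note warns about.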


    Q. How do I remove my job from the structure prediction server?  

      To remove one or more of your jobs from the structure prediction server, follow these instructions:

      1. Log in with your username and password.
      2. Go to the structure prediction queue table.
      3. Select one or more checkboxes (far left column under "x") for jobs that you own and would like to remove.
      4. Click the "Update Job(s)" button on the bottom of the page.

      All data will be lost, so be sure you retrieve your data before doing this.

      Jobs may be removed one week after they complete to conserve disk space.


    Q. What is a mirror?  

      A Robetta mirror is a cluster of computers that have been hooked into the Robetta system to perform the actual work of processing the targets. There are currently two mirrors, generously provided by Charlie Strauss of the Los Alamos National Laboratory and Richard Bonneau of the Institute for Systems Biology.


    Q. How long does a job take to process?  

      Both Rosetta ab initio and comparative modeling must generate ensembles of decoys before selecting models from amongst the ensemble. This process takes a considerable amount of computational resources, and therefore one can expect that, once your job has finished the Ginzu step and entered the ensemble generation, each domain will take a few hours (e.g. a 150 residue domain will take about 4 hours). Other long waits of up to 12 hours may occur during the initial Ginzu step.


    Q. Why am I only allowed one job at a time?  

      Regrettably, due to the length of time it takes (on average) for jobs to complete, public users of the Robetta server are limited to one job at a time. We hope that this will prevent exceptionally long queues from occurring.


    Q. How much time do I have before my job is removed and my data is lost?  

      We may remove domain and structure prediction jobs and all their data one week after the date of completion to free disk space. This will be done at our own discretion.


    Q. Can I run Rosetta myself?  

      Because our hardware resources are limited, we suggest that if you wish to run a large number of predictions and have ample hardware of your own, you obtain the Rosetta suite of programs itself (which is freely available to academic users by license, and may be commercially licensed as well).

      For more information click here.


    Q. Why can't commercial users submit jobs to Robetta?  

      In an effort to minimize overuse, we only permit non-commercial use of the Robetta server. Additionally, Robetta uses methods that can only be provided free-of-charge to academic users. If you are a commercial user, and would like to use Rosetta, we suggest you pursue licensing Rosetta for use on your own hardware (see above).


    Q. Why are all my results visible to everyone?  

      As a public resource, we feel that the free flow of information is important to maintain. Additionally, we cannot take responsibility for providing the security necessary to protect users' results. Lastly, there may be more than one person who is interested in a particular target, and we want to avoid duplicate jobs.


    Q. How do I configure my browser to view structures with RasMol?  

      Rasmol is a popular molecular graphics software package. If you do not have Rasmol, click here.

      In theory, you should be able to configure a web browser to use Rasmol as a helper application to view PDB structures through the internet by setting the appropriate MIME type. However, it is difficult, if not impossible, to set up Internet Explorer to do this. Therefore, we recommend using a Netscape browser and the following instructions.

      To configure your browser to view PDB structures, do the following:

      1. Go to your browser's Applications Preferences.
      2. Add a new file type configured as:
        • Description: PDB files
        • MIME Type: chemical/x-pdb
        • Suffixes: .pdb,.ent
        • Application: xterm -e rasmol -pdb %s

      To configure your browser to view Rasmol scripts, do the following:

      1. Go to your browser's Applications Preferences.
      2. Add a new file type configured as:
        • Description: RasMol scripts
        • MIME Type: application/x-rasmol
        • Suffixes: (leave blank)
        • Application: xterm -e rasmol -script %s

      In the examples above, we use 'xterm -e rasmol' as the helper application command that launches Rasmol in an xterm terminal window (Unix/Linux platforms). You may substitute this with 'rasmac' or 'raswin.exe' depending on what platform you are using.


    Q. Why do I get more than one model?  

      There are several models that the Robetta server produces when you submit a sequence. If Robetta determines that there is more than one domain (or if an ab initio portion of the sequence is too big to be modeled by the ab initio protocol), Robetta breaks up the query into putative domains and models each of them separately. After doing so, it assembles the models into one contiguous chain. This means you can examine your models either as a complete chain, or by clicking on the domain number in the Ginzu domain info box, you can examine the results for each individual domain.

      Within each domain, there are several models. In the case of ab initio predictions, the models are the cluster centers of the most populated clusters, with the exception of the last model, which is the lowest energy decoy that was not a member of the previously represented clusters. In the case of homology modeling predictions, the second model is the model produced by the default K*Sync alignment, with the first, third, fourth, and fifth selected from the decoy ensemble by various energy discrimination methods. In the case of twilight-zone reliability parent detections, the first five models are homology modeled, and models six through ten are modeled using the de novo protocol.


    Q. What does the domain prediction confidence mean?  

      The domain prediction confidence has different meanings depending on the method used to detect the region. Those regions with a detected parent PDB structure are meant to have a confidence function that follows a similar trend. The confidence value is derived based on the detection method in the following way:

      Method      Modeling   Confidence                      Example / note
      PDB-BLAST   homology   conf = -log(e-val)              e.g. e=.001 -> conf=3.0 (strong detection threshold)
      HHSEARCH    homology   conf = hhsearch_prob/42.5       e.g. prob=85.0 -> conf=2.0 (strong detection threshold)
      Pfam        de novo    conf = -log(e-val)              e.g. e=.001 -> conf=3.0 (strong detection threshold)
      msa         de novo    conf = block_depth +            note: dominated by nr50 block depth
                                    .001*block_occ +
                                    .000001*e_val_pref +
                                    .000000001*block_len
      cutpref     de novo    conf = 0                        note: domain boundaries solely determined by sequence transitions, strongly predicted loop, occupancy, and distance from nearest block or terminus

      The general trend with the homology modeling detections allows one to discriminate likely correct parents from improbable ones. If the confidence for a parent PDB is >= 3.0, then it's almost certainly the right fold, and the model itself probably does a good job capturing the features of the structure. Between 2.0 and 3.0, it's usually the right fold, but the model quality is likely to be reduced. Between 1.0 and 2.0, the fold is still right more than half the time, but even so the models produced are often not as good as they could be in cases where the fold is correct, due to the difficulty that homology modeling faces at greater distance. Therefore, in this extreme twilight-zone regime, Robetta also provides de novo models for such domains.
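
      The confidence formulas and the ranges described above can be written out as a short sketch. The function names are hypothetical, and the logarithm is taken as base 10 because that is what the worked example (e=.001 giving conf=3.0) implies. The interpretation text applies to homology detections only.

```python
import math


def conf_from_evalue(e_value: float) -> float:
    """PDB-BLAST and Pfam: conf = -log10(e-value)."""
    return -math.log10(e_value)


def conf_from_hhsearch(probability: float) -> float:
    """HHSEARCH: conf = hhsearch_prob / 42.5."""
    return probability / 42.5


def conf_from_msa(block_depth, block_occ, e_val_pref, block_len) -> float:
    """msa: weighted sum dominated by block depth, as in the table above."""
    return block_depth + 1e-3 * block_occ + 1e-6 * e_val_pref + 1e-9 * block_len


def interpret_homology_conf(conf: float) -> str:
    """Map a parent-PDB confidence to the ranges described in the text."""
    if conf >= 3.0:
        return "almost certainly the right fold"
    if conf >= 2.0:
        return "usually the right fold, reduced model quality"
    if conf >= 1.0:
        return "right fold more than half the time; de novo models also provided"
    return "below the ranges described"


print(conf_from_evalue(0.001), interpret_homology_conf(conf_from_evalue(0.001)))
print(conf_from_hhsearch(85.0), interpret_homology_conf(conf_from_hhsearch(85.0)))
```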


    Q. How do I interpret the K*Sync detailed alignment?  

      The K*Sync detailed color alignment view indicates more than just which residues of the query align with which residues of the parent. The structural information that goes into the K*Sync alignments is also shown. Information for the query is shown on top, and that for the parent on the bottom. Aligned stretches are shown by blue bars between the query and the parent sequence. Identical residues are optionally shown in black, or alternatively given the color for aligned residues that are similar. The color scheme for residue classes is:

      Color          Residue class              Residues
      light orange   hydrophobic                (A,V,L,I,M,F,W,P)
      dark orange    ring containing            (F,Y,W,H)
      pink           hydrophilic                (K,R,D,E,H,Q,N,S,T)
      blue           basic                      (K,R)
      red            acidic                     (D,E)
      cyan           turn and small             (P,G,A,S)
      yellow         sulfur and small oxygen    (C,M,S,T)
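
      Several residues belong to more than one class in this scheme. A small lookup, sketched below with hypothetical names, lists every class a residue falls into. Which color takes precedence when classes overlap is not stated here, so the sketch simply returns all matches in table order.

```python
from typing import List, Tuple

# (color, class name, member residues), in the order given in the table above.
RESIDUE_CLASSES = [
    ("light orange", "hydrophobic", "AVLIMFWP"),
    ("dark orange", "ring containing", "FYWH"),
    ("pink", "hydrophilic", "KRDEHQNST"),
    ("blue", "basic", "KR"),
    ("red", "acidic", "DE"),
    ("cyan", "turn and small", "PGAS"),
    ("yellow", "sulfur and small oxygen", "CMST"),
]


def classes_for(residue: str) -> List[Tuple[str, str]]:
    """Return every (color, class) pair that contains the one-letter residue."""
    code = residue.upper()
    if len(code) != 1:
        raise ValueError("expected a one-letter residue code")
    return [(color, name) for color, name, members in RESIDUE_CLASSES
            if code in members]


print(classes_for("K"))
print(classes_for("S"))
```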

      Additionally shown are the structural information terms used by K*Sync. They include:

      • ss_pred and ss_conf: the 3-class secondary structure predicted by the PSI-PRED program. The confidence of the prediction for that position is indicated by the height of the green bar. The secondary structure classes are helix=H, strand/sheet/extended=E, coil/loop=L.
      • ss_dssp: the secondary structure assigned to the parent structure by DSSP, collapsed to 3 classes (where {H,G} -> H and {E,B} -> E, and everything else is L). Aligned positions that share the same secondary structure classification are colored with magenta for H, yellow for E, and cyan for L.
      • obl_msa and obl_str: highly occupied positions in a multiple sequence or multiple structural alignment are more likely obligate to the fold. The height of the orange bar in the obl_msa indicates the degree of this occupancy for the query and the parent PSI-BLAST multiple sequence alignments, and the obl_str the occupancy of the parent in a StrAD-Stack multiple structural alignment of the parent structure with other non-redundant experimental structures possessing the same fold.
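
      The 3-class collapse described for ss_dssp is simple enough to sketch directly. The function names are hypothetical; the mapping is the one given above ({H,G} -> H, {E,B} -> E, everything else -> L), and the second function reports the aligned positions where the predicted and observed classes agree.

```python
from typing import List

DSSP_TO_3CLASS = {"H": "H", "G": "H", "E": "E", "B": "E"}


def collapse_dssp(dssp: str) -> str:
    """Collapse a DSSP string to helix (H), strand (E), and loop (L)."""
    return "".join(DSSP_TO_3CLASS.get(code, "L") for code in dssp)


def shared_positions(ss_pred: str, ss_dssp: str) -> List[int]:
    """Indices where two equal-length 3-class strings agree."""
    if len(ss_pred) != len(ss_dssp):
        raise ValueError("strings must be the same length")
    return [i for i, (p, d) in enumerate(zip(ss_pred, ss_dssp)) if p == d]


observed = collapse_dssp("HHHGGTT EEBS")
print(observed)
print(shared_positions("HHHHLLLLEEEE", observed))
```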


    Q. Are there any issues with long sequences?  

      The Rosetta folding program itself is a FORTRAN program that has to be careful with its memory usage. Extremely long sequences cannot be fit into memory, so any domain level models are limited to no more than 250 residues for the de novo protocol, and 600 residues for the comparative modeling protocol.

      The de novo protocol suffers, as do all such methods, from a limitation in the ability to sample conformations available to the protein. Larger targets that are high contact-order are more difficult to sample in a reasonable amount of computer time, and probably require much larger decoy ensembles than the Robetta server can afford to generate. Therefore, we impose a de novo domain size limit of about 200 residues, which is clearly often incorrect, but necessary. It is hoped that in such cases, features of the target are still captured by the models.

      There is additionally a limit on the length of the full chain of about 1000 residues, so that the independently modeled domains may be assembled into a contiguous chain.
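
      The length limits quoted above can be collected into a quick pre-submission check. This is only a sketch with hypothetical names: it uses the 250, 600, and roughly 1000 residue figures from the text, and it does not model the stricter de novo limit of about 200 residues that the server imposes for sampling reasons.

```python
from typing import List, Tuple

DE_NOVO_MAX = 250       # per-domain memory limit for the de novo protocol
COMPARATIVE_MAX = 600   # per-domain limit for comparative modeling
FULL_CHAIN_MAX = 1000   # "about 1000" residues for the assembled chain

LIMITS = {"de novo": DE_NOVO_MAX, "comparative": COMPARATIVE_MAX}


def check_lengths(chain_length: int, domains: List[Tuple[int, str]]) -> List[str]:
    """Return a message for every limit exceeded; an empty list means none."""
    problems = []
    if chain_length > FULL_CHAIN_MAX:
        problems.append(
            f"chain of {chain_length} residues exceeds about {FULL_CHAIN_MAX}")
    for index, (length, protocol) in enumerate(domains, start=1):
        limit = LIMITS[protocol]
        if length > limit:
            problems.append(
                f"domain {index} ({protocol}, {length} residues) exceeds {limit}")
    return problems


print(check_lengths(900, [(300, "de novo"), (600, "comparative")]))
```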


    Q. Are there any issues with short sequences?  

      The Fragment Library used by the Rosetta folding program is generated by using residue substitution profiles with PSI-BLAST to find similar fragment profiles. For very short sequences (less than about 40 residues), PSI-BLAST has difficulty detecting sequence homologs with reliable confidence, and therefore the fragment library may suffer.

      Additionally, the Rosetta de novo protocol folds two divergent homologous sequences in addition to the query sequence. If PSI-BLAST cannot find any sequence homologs with confidence due to a query being exceptionally short, Rosetta cannot take advantage of multiple homologs, and the modeling may suffer.

      Perhaps most importantly in de novo modeling, the assumption is made that the target protein forms a soluble domain with a hydrophobic core. Short sequences often do not fold up in this fashion, so the energy function used in Rosetta may incorrectly bias the structures to be more compact than they should be for short targets.


    Q. What is the difference between Ab Initio and De Novo Modeling?  

      Ab initio structure prediction classically refers to structure prediction using nothing more than first-principles (i.e. physics). De Novo is a more general term that refers to the greater category of methods that do not use templates from homologous PDB structures. Since Rosetta uses fragments from existing PDB structures in order to guide the search in conjunction with energy functions, there is a semantic argument as to whether it is truly "ab initio" (although the same could be said for any statistically derived energy function). Long story short: call it what you want, but be prepared for a debate!


    Q. What is the difference between Comparative and Homology Modeling?  

      Comparative and Homology Modeling are the same thing: modeling a significant fraction of your target using coordinates from a homologous parent PDB structure.
















    Robetta is available for NON-COMMERCIAL USE ONLY at this time
    [ Terms of Service ]
    Copyright © 2004-2007 University of Washington