ORION help

Input sequence


Paste the input protein sequence for prediction. The input must be a protein sequence encoded with regular amino acids. The ORION web server accepts two sequence formats: FASTA and raw plain text.
For FASTA format, the sequence must be preceded by a single header line '>sequence_name [sequence description (optional)]'.
FASTA format example:
>sequenceA protein sequence of human hemoglobin
LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNAL
SALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
	
For raw plain text format, the input must not contain any description or other information; only the raw sequence must be provided.
Raw plain text format example:
LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNAL
SALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
	
A protein sequence in FASTA or raw plain text format can also be submitted as a text file by using the upload button.

The sequence must contain more than 15 and no more than 1000 residues, counting regular amino acids.
If your sequence contains multiple domains, it is advisable to submit each protein domain to the ORION web server as a separate job.
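The input rules above can be checked locally before submitting a job. The following is a minimal sketch, not the server's own validation code: it assumes that "regular amino acids" means the 20 standard one-letter codes, and the function names `parse_sequence` and `check_sequence` are hypothetical.

```python
# Hypothetical pre-submission check; not part of ORION itself.
# Assumption: "regular amino acids" = the 20 standard one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_EXCLUSIVE = 15    # the sequence must have more than 15 residues
MAX_INCLUSIVE = 1000  # and no more than 1000 residues


def parse_sequence(text):
    """Return (name, sequence) from FASTA or raw plain-text input.

    name is None when the input is raw plain text.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    name = None
    if lines[0].startswith(">"):
        # FASTA header: '>sequence_name [optional description]'
        header = lines[0][1:].split(None, 1)
        name = header[0] if header else ""
        lines = lines[1:]
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("only one sequence is expected per job")
    return name, "".join(lines).upper()


def check_sequence(sequence):
    """Raise ValueError if the sequence breaks the stated rules; return its length."""
    invalid = sorted(set(sequence) - STANDARD_AA)
    if invalid:
        raise ValueError("non-regular residues: " + ", ".join(invalid))
    if not MIN_EXCLUSIVE < len(sequence) <= MAX_INCLUSIVE:
        raise ValueError("length %d is outside the accepted range" % len(sequence))
    return len(sequence)
```

For example, `check_sequence(parse_sequence(text)[1])` returns the residue count for an acceptable input and raises `ValueError` otherwise.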

Databases

Select the database(s) of template profiles against which you want to compare the query profile. These databases are collections of template profiles stored in ORION format.
ORION template profiles are composed of a sequence profile obtained from the template sequence (AA profile) and a structure profile (PB profile) obtained from the protein structure.

ORION is optimized for globular protein search. For transmembrane proteins, please use the PDB95 or PDB70 databases, which contain more transmembrane proteins.

  • PDB95 (Last update: 10 March 2016)

  • This database is a collection of 54540 protein templates built from the Protein Data Bank (PDB), which contains all publicly available 3D structures of proteins, filtered to a maximum sequence identity of 95%.
    For each protein chain in the PDB we build a multiple alignment with iterated PSI-BLAST searches and transform these alignments into a sequence profile (AA profile).
    For each protein chain we build a PB profile using the 3D structure of the chain. Pseudocount corrections described by [Henikoff S, Henikoff JG 1994; Henikoff S, Henikoff JG 1996] are added to the AA and PB profiles to improve detection sensitivity.
    ORION compares the query profile to the database of template profiles based on PDB chains and generates query-template alignments.
    H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne (2000) The Protein Data Bank Nucleic Acids Research, 28: 235-242.
    Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.


  • PDB70 (Last update: 05 April 2016)

  • This database is a collection of 38131 protein templates built from the Protein Data Bank (PDB), filtered to a maximum sequence identity of 70%.
    For each protein chain in the PDB we build a multiple alignment with iterated PSI-BLAST searches and transform these alignments into a sequence profile (AA profile).
    For each protein chain we build a PB profile using the 3D structure of the chain. Pseudocount corrections described by [Henikoff S, Henikoff JG 1994; Henikoff S, Henikoff JG 1996] are added to the AA and PB profiles to improve detection sensitivity.
    ORION compares the query profile to the database of template profiles based on PDB chains and generates query-template alignments.
    Bourne PE et al. The distribution and query systems of the RCSB Protein Data Bank. NAR 2004, 32:D223.


  • scop95 (Last update: SCOPe v 2.06)
  • The scop95 template database is built from sequences in SCOP in the same way as PDB95/PDB70.
    SCOP is a manually curated database of protein domains. The SCOP annotation is hierarchical: the family and superfamily levels describe near evolutionary relationships, while the fold level describes distant relationships and/or structural similarities.
    Each template of the scop database is annotated with a SCOP code. For example, b.1.18.2 stands for domain class b (all beta), fold 1 (Immunoglobulin-like beta-sandwich), superfamily 18 (E set domains), family 2 (E-set domains of sugar-utilizing enzymes).
    The ORION scop95 template database has been built from a sequence set filtered with the ASTRAL server, e.g. scop95_1.75B is version 1.75B of SCOP filtered to 95% maximum sequence identity and contains 17970 template domains.
    Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247:536–540.
    Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C. SCOP: a structural classification of proteins database. NAR 2000, 28:257.
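The four-part SCOP code described above can be split into its levels programmatically. This is a small illustrative sketch; the names `ScopCode`, `parse_scop_code` and `deepest_shared_level` are hypothetical helpers and are not provided by ORION.

```python
from typing import NamedTuple, Optional

# Order of the levels in a SCOP code such as "b.1.18.2".
LEVELS = ("class", "fold", "superfamily", "family")


class ScopCode(NamedTuple):
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_scop_code(code: str) -> ScopCode:
    """Split a code like 'b.1.18.2' into class, fold, superfamily and family."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields: %r" % code)
    scop_class, fold, superfamily, family = parts
    if len(scop_class) != 1 or not scop_class.isalpha():
        raise ValueError("the class must be a single letter: %r" % code)
    return ScopCode(scop_class, int(fold), int(superfamily), int(family))


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the most specific level two codes have in common, or None."""
    shared = None
    for level, a, b in zip(LEVELS, parse_scop_code(code_a), parse_scop_code(code_b)):
        if a != b:
            break
        shared = level
    return shared
```

Comparing the codes of two hits in this way gives a quick indication of whether they belong to the same family, superfamily or fold.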


  • scop70 (Last update: SCOPe v 2.06)
  • The scop70 template database is built from sequences in SCOP in the same way as PDB95/PDB70.
    SCOP is a manually curated database of protein domains. The SCOP annotation is hierarchical: the family and superfamily levels describe near evolutionary relationships, while the fold level describes distant relationships and/or structural similarities.
    Each template of the scop database is annotated with a SCOP code. For example, b.1.18.2 stands for domain class b (all beta), fold 1 (Immunoglobulin-like beta-sandwich), superfamily 18 (E set domains), family 2 (E-set domains of sugar-utilizing enzymes).
    The ORION scop70 template database has been built from a sequence set filtered with the ASTRAL server, e.g. scop70_1.75 is version 1.75 of SCOP filtered to 70% maximum sequence identity and contains 11368 template domains.
    Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247:536–540.
    Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C. SCOP: a structural classification of proteins database. NAR 2000, 28:257.

  • HOMSTRAD (Last update: 14 February 2015)
  • The HOMSTRAD template database is constructed from HOMSTRAD structural alignments. HOMSTRAD groups protein families through structural alignments, with 14677 protein structures grouped into 14329 manually checked families.
    AA profiles were generated from the multiple sequence alignment of family members for each family.
    The PB profiles were obtained by translating the HOMSTRAD structural alignments into PB alignments using the atomic coordinates of the protein structures.
    Mizuguchi K, Deane CM, Blundell TL, Overington JP: HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci Publ Protein Soc 1998, 7:2469–2471.


    Alignment mode

    Three alignment modes are supported (gloloc, local and global).
    ORION is optimized for gloloc mode. This hybrid of the Smith and Waterman (1981) local/local and Needleman and Wunsch (1970) global/global algorithms locally aligns the query against the whole template.
    In local mode, ORION aligns the query profile with the database profiles locally, with no begin/end gap penalties.
    In global mode, ORION aligns the query profile against the database profiles over their full lengths, i.e. the query and template profiles are globally aligned.
    Gloloc and local modes are the most suitable for a sensitive search. For a large query protein or a multi-domain query, use local mode.
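The difference between the three modes can be illustrated with a toy dynamic-programming scorer. The sketch below is not ORION's implementation: it aligns plain sequences rather than AA/PB profiles, uses an assumed match/mismatch score with a linear gap penalty, and the function name `alignment_score` is hypothetical. It only shows which end gaps are free in each mode.

```python
def alignment_score(query, template, mode, match=2, mismatch=-1, gap=-2):
    """Best alignment score under a simplified global, local or gloloc scheme.

    Assumptions (not taken from ORION): sequence-to-sequence scoring,
    linear gap penalty, and gloloc = query local, template global.
    """
    if mode not in ("global", "local", "gloloc"):
        raise ValueError("unknown mode: %r" % mode)
    n, m = len(query), len(template)
    # H[i][j]: best score of an alignment ending at query[:i] and template[:j]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        # Leaving template residues unaligned is free only in local mode.
        H[0][j] = 0 if mode == "local" else j * gap
    for i in range(1, n + 1):
        # Skipping leading query residues is free in local and gloloc modes.
        H[i][0] = i * gap if mode == "global" else 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            best = max(H[i - 1][j - 1] + s,  # align the two residues
                       H[i - 1][j] + gap,    # query residue against a gap
                       H[i][j - 1] + gap)    # template residue against a gap
            if mode == "local":
                best = max(best, 0)          # a local alignment may restart
            H[i][j] = best
    if mode == "global":
        return H[n][m]                       # both sequences fully aligned
    if mode == "gloloc":
        # Whole template consumed; trailing query residues are free.
        return max(H[i][m] for i in range(n + 1))
    return max(max(row) for row in H)        # best local segment anywhere
```

With these toy parameters, a long query containing a short template scores the same in gloloc and local modes but is penalized in global mode, whereas a short query against a long template is penalized in gloloc mode because the whole template must be covered.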


    Maximum number of hits displayed

    This parameter controls the number of hits that will be displayed on the results page.
    The number of hits displayed is limited to 500 to avoid overly heavy HTML pages.


    MODELLER

    The MODELLER license requires you to enter a MODELLER-key to use MODELLER.
    To obtain the MODELLER license key, please fill out the License Agreement at: http://salilab.org/modeller/registration.shtml.
    This key is freely and easily available for academic users and non-commercial users.
    A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.


    ORION results

  • Hits results
  • ORION displays results in eight sortable columns.
    - First column is the number of the hit.
    - Second column contains the hit name and description.
    - Third column contains the raw score produced by ORION (ranked).
    - Fourth column contains the hit length.
    - Fifth column contains the position of the start and the position of the end of the aligned part of the query.
    - Sixth column contains the position of the start and the position of the end of the aligned part of the template.
    - Seventh column contains the percentage of the query aligned.
    - Eighth column contains the percentage of the identity between the query sequence and the template sequence.

    Text results contain five more columns:
    - An Ungaped_score, which corresponds to the ORION score without the gap penalties.
    - A Pvalue_Q, which is the P-value considering the query and the raw score (experimental).
    - A Pvalue_T, which is the P-value considering the template and the raw score (experimental).
    - The Pscore, which is a score combining the raw score and the Pvalue_Q.
    - The PQTscore, which is the mean of the Pvalue_Q and the Pvalue_T.


  • Alignment results
  • Alignment results are displayed as blocks of query-template alignments.
    Each block begins with the number of the hit, followed by the hit description.
    Then five alignment features are described:
    - The raw score of ORION (see Hits results).
    - The normalized score which is the raw score divided by the alignment length.
    - The Query coverage (see Hits results).
    - The percentage of identity (see Hits results).
    - The percentage of gaps in the alignment.
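The five features above can be approximated from an aligned pair of sequences. The sketch below is illustrative only: the function name `alignment_metrics` is hypothetical, and the denominators are assumptions (identity and gaps are taken over all alignment columns, and query coverage over the aligned query span), since the exact definitions used by ORION are not stated here.

```python
def alignment_metrics(raw_score, query_aln, template_aln,
                      query_start, query_end, query_length):
    """Summarize one query-template alignment.

    query_aln and template_aln are aligned strings of equal length that
    use '-' for gaps. query_start and query_end are 1-based positions of
    the aligned part of the query. Denominators are assumptions.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have the same length")
    aln_len = len(query_aln)
    pairs = list(zip(query_aln, template_aln))
    gap_columns = sum(1 for q, t in pairs if q == "-" or t == "-")
    identical = sum(1 for q, t in pairs if q == t and q != "-")
    return {
        # raw score divided by the alignment length
        "normalized_score": raw_score / aln_len,
        # percentage of the query covered by the aligned span
        "query_coverage": 100.0 * (query_end - query_start + 1) / query_length,
        # percentage of columns with identical residues
        "identity": 100.0 * identical / aln_len,
        # percentage of columns containing a gap
        "gaps": 100.0 * gap_columns / aln_len,
    }
```

For instance, `alignment_metrics(10.0, "ACDEF", "AC-EG", 1, 5, 10)` yields a normalized score of 2.0, a query coverage of 50%, an identity of 60% and 20% gaps under these assumptions.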

    The alignment of the query and the template is displayed in blocks of seven lines, 60 characters wide:
    - pbpred is the predicted PB sequence of the query.
    - psipred is the secondary structure of the query predicted by PSIPRED.
    - Query is the aligned query amino-acid sequence.
    - Score is the score for each position between the query and the template.
    - Template is the aligned template amino-acid sequence.
    - DSSP is the template secondary structure assigned by DSSP.
    - PB is the PB sequence assigned to the template.
    Jones DT. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195-202.

    Protein Blocks (PB) form a structural alphabet of 16 folding patterns that describe the local conformation of the 3D protein structure.
    A PB is assigned to a residue based on the dihedral angles computed from 5 residues: the residue itself, the two previous residues and the two next residues.
    For example, for residue i, the assigned PB will depend on the dihedral angles of residues i-2, i-1 and residues i+1, i+2.
    Thus, the first two and the last two residues are not assigned.
    Likewise, residues i-2, i-1, i+1 and i+2 around a residue i that is missing in the 3D structure are not assigned.
    PB explanation:
    m = central alpha-helix
    f,k,l = helix N-cap
    n,o,p = helix C-cap
    d = central beta-sheet
    b,c = sheet N-cap
    a,e,g,h,i,j = turns and coil
    Z = Unassigned position
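The PB legend and the five-residue window rule can be expressed in a few lines. This is a minimal sketch built only from the description above; `PB_CLASSES`, `pb_class_counts` and `assignable_positions` are hypothetical names, and residue positions are assumed to be numbered from 1.

```python
# Legend from the PB explanation: 16 PB letters (a to p) plus Z.
PB_CLASSES = {
    "m": "central alpha-helix",
    **dict.fromkeys("fkl", "helix N-cap"),
    **dict.fromkeys("nop", "helix C-cap"),
    "d": "central beta-sheet",
    **dict.fromkeys("bc", "sheet N-cap"),
    **dict.fromkeys("aeghij", "turns and coil"),
    "Z": "unassigned position",
}


def pb_class_counts(pb_sequence):
    """Count how many positions of a PB string fall into each class.

    Gap characters ('-') from an alignment line are ignored.
    """
    counts = {}
    for letter in pb_sequence:
        if letter == "-":
            continue
        label = PB_CLASSES[letter]
        counts[label] = counts.get(label, 0) + 1
    return counts


def assignable_positions(length, missing=()):
    """Return the 1-based positions that can receive a PB.

    A position i is assignable only if residues i-2 to i+2 all exist,
    so chain ends and the neighbours of missing residues are excluded.
    """
    present = set(range(1, length + 1)) - set(missing)
    return [i for i in range(1, length + 1)
            if all(k in present for k in range(i - 2, i + 3))]
```

For a 12-residue chain in which residue 6 is missing, `assignable_positions(12, {6})` returns `[3, 9, 10]`; every other position would be shown as Z.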


    Example :

    Score scale : Low High

    No 1
    d1r1ha_ d.92.1.4 (A:) Neutral endopeptidase (neprilysin) {Human (Homo sapiens) [TaxId: 9606]}
    Score :  242.921  |  Normalized score :    0.299  |  Query coverage : 98.16%  |  Identity :     8.11%  |  Gaps :     4.51% 
    pbpred   :          mmmmmmmmmmcmmmmmcmmmmcbddddfklmmcmklmmmmmmopmbdcddddfklccmmm         
    psipred  :          HHHHHCCCCCCCCCCCCEECCCCEEEECCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCC         
    Query    :    14    NSWRRRGFAAFTPHTAARVTNGRRVVIDEAPSLPPHLLLLHMQRASSVHLLGDPNQIPAI       73
    Score    :                                                                                
    Template :     1    GICKSSDCIKSAARLIQNMDATTE----PCTDFFK-YACGGWLKRNV----IPETSSRYG       51
    DSSP     :          CBCCCHHHHHHHHHHHHHCCTTSC----TTTCHHH-HHHHHHHHHCC----CCTTCSEEE         
    PB       :          ZZfbfklmmmmmmmmmmpcfkbcf----klommlm-mmmmmmmmpcc----dfkbcbdcf         
    


    Modelling results

    The model is obtained by running MODELLER with the selected ORION query-template alignment.
    A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.

  • View your model
  • You can view and interact with your model using your mouse or your trackpad.
    The viewer has been implemented with PV viewer. You can choose the representation style. By default, the model is colored with a rainbow gradient following the succession of secondary structure elements.

  • Alignment
  • The query-template alignment is displayed. Positions shown in blue are the positions that were modelled.

  • Model quality estimation
  • DOPE (Discrete Optimized Protein Energy) is an atomic distance-dependent statistical potential derived from a sample of native structures; it does not depend on any adjustable parameters.
    DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures.
    Shen M, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Science : A Publication of the Protein Society. 2006;15(11):2507-2524. doi:10.1110/ps.062416606.

    Model quality estimation is performed using the DOPE score calculation.
    The DOPE energy is computed for all alpha carbons of the protein.
    A Zscore is computed from the scores of 50 permutations of the model (decoys).
    The lower the DOPE energy and the Zscore, the better the quality of the model.

    For Zscore > -1, the model quality is bad and colored in red.
    For -1 >= Zscore > -2, the model quality is medium and colored in orange.
    For -2 >= Zscore > -4, the model quality is good and colored in green.
    For Zscore <= -4, the model quality is very good (near-native) and colored in blue.
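The quality classes above can be reproduced with a short helper. The sketch below rests on two assumptions that are not stated in this help page: the Zscore is taken to be the usual (model score minus decoy mean) divided by the decoy standard deviation, and a Zscore of exactly -2 is assigned to the "good" class. The function names are hypothetical.

```python
from statistics import mean, pstdev


def dope_zscore(model_score, decoy_scores):
    """Zscore of the model's DOPE score relative to its decoys.

    Assumption: standard Z-score with the population standard deviation;
    the exact formula used by ORION is not documented here.
    """
    sigma = pstdev(decoy_scores)
    if sigma == 0:
        raise ValueError("decoy scores have zero variance")
    return (model_score - mean(decoy_scores)) / sigma


def quality_from_zscore(zscore):
    """Map a Zscore to the (quality, color) pair used on the results page."""
    if zscore > -1:
        return ("bad", "red")
    if zscore > -2:
        return ("medium", "orange")
    if zscore > -4:  # assumption: exactly -2 counts as "good"
        return ("good", "green")
    return ("very good", "blue")
```

In ORION the decoy list would hold the scores of the 50 permuted models; here any list of decoy scores can be passed to `dope_zscore`.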

    A DOPE score per residue is plotted in red for each position of the alignment. This score is the mean value of the DOPE score per residue over a sliding window of 15 residues.
    The gray line indicates the mean score per residue of -0.03 obtained from a large set of models of different qualities.
    Residues with a DOPE score below this threshold are considered good.
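The smoothing behind the per-residue plot can be sketched as follows. This is an assumption-laden illustration rather than the server's code: the window is assumed to be centered on each residue and truncated at the chain ends, and the names `smooth_dope_profile` and `good_positions` are hypothetical.

```python
THRESHOLD = -0.03  # mean score per residue shown as the gray line


def smooth_dope_profile(per_residue, window=15):
    """Average per-residue DOPE scores over a sliding window.

    Assumption: the window is centered on each residue and shortened
    near the chain ends, where fewer neighbours are available.
    """
    half = window // 2
    smoothed = []
    for i in range(len(per_residue)):
        lo = max(0, i - half)
        hi = min(len(per_residue), i + half + 1)
        chunk = per_residue[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def good_positions(smoothed, threshold=THRESHOLD):
    """Return the 1-based positions whose smoothed score is below the threshold."""
    return [i + 1 for i, value in enumerate(smoothed) if value < threshold]
```

Applying `good_positions(smooth_dope_profile(scores))` to a list of per-residue DOPE scores gives the positions that would lie below the gray line on the plot.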


    Other tools for protein structure prediction

    - I-TASSER server: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
    - Robetta server: http://robetta.bakerlab.org/
    - HHpred server: http://toolkit.tuebingen.mpg.de/hhpred
    - Ev-fold server: http://evfold.org/evfold-web/newprediction.do
    - RaptorX server: http://raptorx.uchicago.edu/StructurePrediction/predict/
    - SPARKS-X server: http://sparks-lab.org/yueyang/server/SPARKS-X/


    References

    • Ghouzam Y, Postic G, de Brevern AG, Gelly JC. Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics 2015, 31:3782-9.
    • Ghouzam Y, Postic G, Guerin PE, de Brevern AG, Gelly JC. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Sci Rep. 2016 Jun 20;6:28268.