Introduction

The tertiary structure of a protein chain is largely characterized by regularities involving repetitive local backbone conformations. A limited set of local conformations can therefore be defined to approximate a complete protein backbone. Such a collection of local structural prototypes is called a structural alphabet (SA). Protein Blocks (PBs) (de Brevern et al. 2000; Joseph et al. 2010) are one such SA, comprising 16 pentapeptide conformations (labelled with the letters a to p), each characterized by its backbone dihedral angles. PBs have been used to address a wide range of biological questions.

iPBA Method

PBs enable the three-dimensional structure of a protein to be represented in one dimension, as a sequence. This reduces the problem of protein structure comparison to classical sequence alignment. The Needleman-Wunsch (Needleman and Wunsch 1970) and Smith-Waterman (Smith and Waterman 1981) algorithms were used earlier for PB alignment, and a PB substitution matrix was generated for this purpose (Tyagi et al. 2006; Tyagi et al. 2008).

This server provides an improved version of PB alignment using (i) specialized substitution matrices for pairwise alignment and database search and (ii) an anchor-based dynamic programming algorithm. The two protein chains are first translated into PB sequences. A set of local alignments (anchors) between these two sequences is obtained using the SIM algorithm (Huang and Miller 1991). The segments between anchors (linkers) are then aligned using the Needleman-Wunsch algorithm. Specific sets of affine gap penalties are used for the anchor and linker alignments. The structural integrity of the anchors is checked using distance constraints. Anchor regions are highlighted in the alignment.

The quality of pairwise PB alignments is quantified using two kinds of score:

1) The dynamic programming alignment score, normalized by the alignment length: Aln_Score = Alignment score / Alignment length
2) A score similar to GDT_TS for PB sequence alignment (GDT_PB), derived using seven decreasing cut-offs of PB substitution scores (analogous to the distance cut-offs used for GDT_TS (Zemla 2003)).
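As an illustration of the linker alignment step and of the normalized alignment score, the following Python sketch performs a global Needleman-Wunsch alignment of two PB strings. It is a simplified sketch, not the server's implementation: it uses a single linear gap penalty instead of affine penalties, and `toy_pb_score` is a hypothetical match/mismatch function standing in for the real PB substitution matrix. The function names, the gap value and the example PB strings are all assumptions made for illustration.

```python
def toy_pb_score(x, y):
    """Hypothetical stand-in for a PB substitution matrix lookup."""
    return 2.0 if x == y else -1.0


def needleman_wunsch(seq_a, seq_b, score, gap=-2.0):
    """Globally align two PB strings with a linear gap penalty.

    Returns (total_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # substitution
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def aln_score(total_score, aligned):
    """Aln_Score = alignment score / alignment length (gapped columns included)."""
    return total_score / len(aligned) if aligned else 0.0


# Example with two made-up PB strings.
total, aln_a, aln_b = needleman_wunsch("ddfklm", "ddklm", toy_pb_score)
print(aln_a)
print(aln_b)
print(aln_score(total, aln_a))
```

Dividing by the alignment length makes scores comparable between protein pairs of different sizes, which a raw dynamic programming score is not.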

In the GDT_PB score, k is the total number of thresholds used (here 7) and Pj is the percentage of PB substitutions that fall within cut-off level j. The percentage is calculated either with respect to the alignment length (GDT_PB1) or with respect to the sum S = Nsubs + Nlen + Ngap (GDT_PB2), where Nsubs is the number of substituted PBs in the alignment, Nlen is the difference in length between the two sequences, and Ngap is the number of gapped regions (series of gaps) in the alignment. GDT_PB2 is sensitive to the distribution of gaps in the alignment: an alignment with a few gapped stretches gets a better score than one with many short stretches.

The PB alignment in 1D is translated back to 3D, and the alignment is refined in 3D using ProFit. The ProFit alignment reports the number of aligned residues within a distance cut-off (5 Å) and the RMSD calculated on these residues. A GDT_TS score is also calculated for the alignment.

To find structural homologues of a target protein, the SCOP dataset, filtered at different sequence identity cut-offs, is used. The user can choose the cut-off, which determines the level of redundancy in the databank. The top 100 hits are reported based on the dynamic programming alignment score. This score is scaled to values between -13 and 17; values greater than 1.5 are generally associated with high confidence. The user also has the option to generate the list of hits based on the GDT_PB1 score. Like GDT_TS, this score typically varies between 0 and 100, and scores above 35 indicate significant structural similarity.
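The two GDT_PB variants can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the server's code: the score is taken to be the mean of the Pj values over the k cut-offs (by analogy with GDT_TS), a substitution is counted as within a cut-off when its substitution score is greater than or equal to that cut-off, and the seven cut-off values and `toy_pb_score` are hypothetical placeholders rather than the server's actual thresholds or substitution matrix.

```python
from itertools import groupby


def toy_pb_score(x, y):
    """Hypothetical stand-in for a PB substitution matrix lookup."""
    return 2.0 if x == y else -1.0


# Seven decreasing cut-offs; the values are placeholders, not the server's.
TOY_CUTOFFS = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]


def count_gap_stretches(aligned_a, aligned_b):
    """Ngap: number of uninterrupted runs of gapped columns in the alignment."""
    gapped = [x == "-" or y == "-" for x, y in zip(aligned_a, aligned_b)]
    return sum(1 for is_gap, _run in groupby(gapped) if is_gap)


def gdt_pb(aligned_a, aligned_b, score, cutoffs, variant=1):
    """Mean over the cut-offs of the percentage of substitutions passing each one.

    variant=1 divides by the alignment length (GDT_PB1);
    variant=2 divides by S = Nsubs + Nlen + Ngap (GDT_PB2).
    """
    # Substituted positions are the columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    n_subs = len(pairs)
    if variant == 1:
        denom = len(aligned_a)
    else:
        len_a = len(aligned_a.replace("-", ""))
        len_b = len(aligned_b.replace("-", ""))
        n_len = abs(len_a - len_b)
        denom = n_subs + n_len + count_gap_stretches(aligned_a, aligned_b)
    if denom == 0 or not cutoffs:
        return 0.0
    percentages = [
        100.0 * sum(1 for x, y in pairs if score(x, y) >= cutoff) / denom
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(cutoffs)


# Same number of gap characters, arranged as one stretch or as two stretches.
one_stretch = gdt_pb("aa--aa", "aaaaaa", toy_pb_score, TOY_CUTOFFS, variant=2)
two_stretches = gdt_pb("a-a-aa", "aaaaaa", toy_pb_score, TOY_CUTOFFS, variant=2)
print(one_stretch > two_stretches)
```

In this toy comparison both alignments contain the same number of gap characters, so GDT_PB1 cannot tell them apart, whereas the Ngap term in S penalizes the alignment whose gaps are split into more stretches.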

References

  • de Brevern, A.G., Etchebest, C., and Hazout, S. 2000. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins 41: 271-287.
  • Huang, X., and Miller, W. 1991. A time-efficient linear-space local similarity algorithm. Advances in Applied Mathematics 12: 337-357.
  • Joseph, A.P., Agarwal, G., Mahajan, S., Gelly, J.C., Swapna, L.S., Offmann, B., and Cadet, F. 2010. A short survey on protein blocks. Biophys Rev 2: 137-145.
  • Needleman, S.B., and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-453.
  • Smith, T.F., and Waterman, M.S. 1981. Identification of common molecular subsequences. J Mol Biol 147: 195-197.
  • Tyagi, M., de Brevern, A.G., Srinivasan, N., and Offmann, B. 2008. Protein structure mining using a structural alphabet. Proteins 71: 920-937.
  • Tyagi, M., Gowri, V.S., Srinivasan, N., de Brevern, A.G., and Offmann, B. 2006. A substitution matrix for structural alphabet based on structural alignment of homologous proteins and its applications. Proteins 65: 32-39.
  • Zemla, A. 2003. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 31: 3370-3374.