Introduction

iPBA is an efficient method that can be used to mine a database of structures for similar (or partially similar) protein folds, and to superimpose two protein structures. Hence two modes are available: (A) superimposing two protein structures and (B) mining the SCOP databank. The hits obtained from mining can also be used to obtain pairwise 3D superpositions. The main interest of iPBA is its better performance, in terms of mining efficiency and alignment quality, compared with other available methods.

A-Superimposition


1- Front page


iPBA takes two sets of coordinates of protein chains (in PDB format) and superimposes them.
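The superposition step can be illustrated with the classical least-squares fit of matched atoms (Kabsch algorithm). The sketch below is not iPBA's code: it assumes the residue correspondence is already known (for instance from the alignment reported by iPBA), that one C-alpha coordinate is used per aligned residue, and that NumPy is installed. The function name `kabsch_superimpose` is hypothetical.

```python
import numpy as np


def kabsch_superimpose(mobile, target):
    """Least-squares superposition of `mobile` onto `target` (Kabsch algorithm).

    Both inputs are N x 3 arrays of matched coordinates, e.g. the C-alpha atoms
    of aligned residue pairs (assumption). Returns the moved coordinates, the
    rotation matrix and the RMSD after fitting.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    # Remove the translation by centring both sets on their centroids.
    p_centroid = P.mean(axis=0)
    q_centroid = Q.mean(axis=0)
    P_c = P - p_centroid
    Q_c = Q - q_centroid
    # Covariance matrix of the two centred sets and its SVD.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate the centred mobile set and place it on the target centroid.
    moved = P_c @ R.T + q_centroid
    rmsd = float(np.sqrt(((moved - Q) ** 2).sum(axis=1).mean()))
    return moved, R, rmsd
```

Applied to a rotated and translated copy of a coordinate set, the function recovers the inverse rotation and an RMSD close to zero. The determinant check (`d`) prevents the fit from returning a mirror image, which would not be a valid superposition of a protein.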





Three options are given:

- using PDB files directly from the local PDB databank, based on a user-supplied PDB ID.

- uploading a PDB file from the user's computer.

- combining both: (i) one PDB chain taken from the local PDB databank and (ii) another one uploaded by the user.



One important point: do not forget to give the PDB chain identifier.

__________________________________________


2- Errors / computation




If a wrong PDB chain identifier is given, the iPBA computation fails.



If the PDB ID is not found in the local PDB databank, the iPBA computation stops.



If the uploaded coordinate file is not in PDB format, the iPBA computation stops with the message: Please check your PDB file.




If the input files (or IDs) are valid, a message asking you to wait until processing is complete is displayed.
A fixed URL is also given; you can copy it to come back later (results are kept on the server for only one week).
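The input errors listed above can be anticipated locally before submitting a file. The following sketch is hypothetical (iPBA's own validation code is not described here): it only looks at `ATOM` records and reads the chain identifier from column 22, as defined by the fixed-column PDB format. The function name and error messages are illustrative.

```python
def check_pdb_chain(pdb_text, chain_id):
    """Return the number of ATOM records of `chain_id` in a PDB-format text.

    Raises ValueError if the text contains no ATOM record (not a PDB file)
    or if the requested chain is absent (wrong chain identifier).
    """
    # The record name occupies columns 1-6 of each PDB line.
    atoms = [line for line in pdb_text.splitlines() if line[:6].strip() == "ATOM"]
    if not atoms:
        raise ValueError("Please check your PDB file.")
    # The chain identifier is column 22, i.e. index 21 in Python.
    chains = {line[21] for line in atoms if len(line) > 21}
    if chain_id not in chains:
        raise ValueError(
            f"Chain {chain_id!r} not found; available chains: {sorted(chains)}"
        )
    return sum(1 for line in atoms if len(line) > 21 and line[21] == chain_id)
```

Passing the text of a PDB file and a chain letter either returns the number of atoms in that chain or raises an error comparable to the situations described above.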

__________________________________________


3- Results and Outputs





The complete output of the superimposition performed by iPBA can be divided into four main parts:

1. The Alignment Summary.

2. The downloadable files.

3. The PyMol images for visualization.

4. The JMol applet for a 3D view of the superposition.

The different parts of the output are described below.


__________________________________________


1. The Alignment Summary.





The main results are given in the upper part. (1) A color gradient (red to green) highlights the significance/quality of the alignment: red stands for very poor similarity, green indicates a very close relationship in terms of 3D structure (the sequences themselves can be quite divergent).

(2) The alignment scores are given: the root mean square deviation (RMSD), the alignment length, the number of aligned residues, the fraction of residues aligned and the GDT_TS (see About for more details).
In this example, the RMSD is low (1.22 Å) with 92% of the residues aligned: the two proteins have related folds. As expected, the GDT_TS is also high.

(3) The amino acid sequence alignment is given with the corresponding PB (Protein Block) assignments. The first protein chain is in red and the second in green, and PDB residue positions are also given. A scrollbar lets you scroll through the complete alignment. The anchors (local regions of high similarity) are highlighted in bold.
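To make the RMSD and GDT_TS scores concrete, the sketch below computes both from coordinates that are already superimposed and matched residue by residue. GDT_TS averages the percentage of residues lying within 1, 2, 4 and 8 Å of their counterparts. Two assumptions are labelled in the code: the percentages are taken over `reference_length` residues (the normalization used by iPBA is given in the About page, not reproduced here), and only one fixed superposition is evaluated, whereas LGA (Zemla 2003) optimizes the superposition for each cutoff, so this simplified value can only be lower than or equal to the LGA score. The function names are hypothetical.

```python
import math

# GDT_TS distance cutoffs, in angstroms.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def rmsd(coords_a, coords_b):
    """RMSD between matched atoms of two already superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("expected two non-empty lists of matched coordinates")
    squared = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


def gdt_ts_fixed(coords_a, coords_b, reference_length):
    """Simplified GDT_TS computed for ONE fixed superposition.

    Assumption: percentages are taken over `reference_length` residues.
    LGA optimizes the superposition separately for each cutoff, so this
    single-superposition value is a lower bound of the LGA GDT_TS.
    """
    distances = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    percents = [
        100.0 * sum(d <= cutoff for d in distances) / reference_length
        for cutoff in CUTOFFS
    ]
    return sum(percents) / len(CUTOFFS)
```

With four residues at distances of 0.5, 1.5, 3 and 10 Å, the percentages under the four cutoffs are 25, 50, 75 and 75, giving a GDT_TS of 56.25, while the single outlier at 10 Å inflates the RMSD to about 5.3 Å. This illustrates why GDT_TS is reported alongside the RMSD.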


__________________________________________


2. The downloadable files.





The superimposed PDB files can be downloaded [click here: 1, 2].



A PyMol script is also provided [click here] (see next section for more details).



A complete summary is also given in a flat file that is simple to read and parse [click here]. It contains:

- Amino acid sequences in Fasta format

- PB sequences in Fasta format

- The amino acid sequence alignment

- The different alignment scores.
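Because the sequences in the summary file are in Fasta format, they can be extracted with a few lines of Python. This is a sketch under assumptions: the exact layout of the iPBA flat file is not specified here, so the parser simply collects every record that starts with a `>` header and assumes that a record ends at a blank line or at the next header; all other lines (alignment, scores) are skipped. The function name is hypothetical.

```python
def read_fasta_records(lines):
    """Collect Fasta records from an iterable of text lines.

    Assumption: a record starts with a '>' header and ends at a blank line
    or at the next header. Lines outside a record are ignored.
    """
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif not line:
            # A blank line closes the current record (assumed file layout).
            header = None
        elif header is not None:
            # Sequence lines of a record are concatenated.
            records[header] += line
    return records
```

Typical use is `with open(path) as handle: records = read_fasta_records(handle)`, where `path` points to the downloaded summary file. Records sharing the same header overwrite each other, so headers are assumed to be unique.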


__________________________________________


3. The PyMol images.



Four different views of the superimposed structures are given with rotations of 0°, 90°, 180° and 270°:



90°

180°

270°


Using the downloadable PyMol script, it is also possible to (i) visualize the superimposed structures in a local PyMol installation and (ii) highlight some residues, etc., or use different renderings:

With the PyMol script, it is simple to look at the superimposed structures.



PyMol on your personal computer.



One ray tracing example.



Another ray tracing example.


__________________________________________


4. The JMol applet for 3D view of the superposition.



PyMol gives very nice rendering, but you can also look at the superimposition directly through the JMol applet. This applet requires Java to be installed (see http://www.java.com/fr/download/index.jsp).



Default rendering (cartoon).



Trace rendering.



Backbone rendering.

______________________________________________________________________________________________________________________________
______________________________________________________________________________________________________________________________


B-Mining the SCOP databank


1- Front page


iPBA takes one protein chain (in PDB format) as input and mines the local SCOP databank to find similar or related folds. The default databank is SCOP 70%; you can also choose SCOP 40% (faster), or 95% and 100% (slower). The expected run time is under 10 minutes.





Two options are given for input:

- using PDB files directly from the PDB databank.

- uploading a PDB file from the user's computer.

One important point: do not forget to give the PDB chain ID. The possible errors are similar to those of pairwise superimposition.




If the input file is valid, a waiting message is displayed.
A fixed URL is also given; you can copy it to view the results later (results are kept on the server for one week).

__________________________________________


2- Results and Outputs





iPBA returns the 100 best related folds, ranked by decreasing compatibility score (a GDT_PB score is also provided). A more precise alignment can be obtained in a second step (see section "A-Superimposition").
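The ranking step alone can be sketched as follows, assuming the compatibility and GDT_PB scores of every database entry have already been computed (how iPBA computes them is not reproduced here). The `Hit` record, its field names and `best_hits` are hypothetical; only the ordering by decreasing compatibility score and the cut-off at 100 hits come from the description above.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    domain: str           # identifier of the SCOP entry (hypothetical field)
    compatibility: float  # score used for ranking
    gdt_pb: float         # reported alongside, not used for ranking


def best_hits(hits, n=100):
    """Return the n hits with the highest compatibility score, best first."""
    # heapq.nlargest avoids sorting the whole list when the databank is large.
    return heapq.nlargest(n, hits, key=lambda hit: hit.compatibility)
```

`heapq.nlargest` gives the same result as sorting by decreasing score and keeping the first `n` items, but it only keeps `n` candidates in memory while scanning the databank.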



__________________________________________


3- Examples of results


Mining results are worth examining hit by hit, since they cover different levels of similarity. Here, we have aligned the 1st (best), 23rd and 90th hits with the query:



1st: The alignment has a very low RMSD of 0.61 Å (normalized score of 559.57) and 99.4% of the residues are aligned. It is clearly the same fold (as seen below). GDT_TS equals 92.54.







23rd: It has a low RMSD of 1.48 Å (normalized score of 146.20), and 71.2% of the residues are aligned. It is a related fold (as seen below). GDT_TS equals 69.42, but some local variations can also be seen.





90th: The RMSD of 2.68 Å is not significant, as only 8.11% of the residues (27/333) are aligned; the alignment is thus highlighted in red.
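The percentages quoted in these examples are simple counts over the alignment. The sketch below derives them from two gapped aligned sequences, assuming `-` as the gap character and assuming that the fraction aligned is taken over `reference_length` residues (for the 90th hit, 27 aligned residues out of 333 give 8.11%). The same counting applies to PB strings, written in the 16-letter Protein Block alphabet a to p. The function and key names are hypothetical.

```python
GAP = "-"  # assumed gap character


def alignment_stats(aligned_1, aligned_2, reference_length):
    """Count aligned positions in a pairwise alignment of two gapped strings.

    Assumption: the fraction aligned is expressed as a percentage of
    `reference_length` residues (e.g. the length of one of the two chains).
    """
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have the same length")
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    # A position counts as aligned when neither sequence has a gap there.
    pairs = [(x, y) for x, y in zip(aligned_1, aligned_2) if x != GAP and y != GAP]
    n_aligned = len(pairs)
    identical = sum(x == y for x, y in pairs)
    return {
        "alignment_length": len(aligned_1),
        "aligned_residues": n_aligned,
        "fraction_aligned": 100.0 * n_aligned / reference_length,
        "identity": 100.0 * identical / n_aligned if n_aligned else 0.0,
    }
```

For instance, `MKV-LA` aligned with `MRVGL-` has six columns, four of which are gap-free, and three of those four are identical (75% identity).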



______________________________________________________________________________________________________________________________
______________________________________________________________________________________________________________________________


References

  • de Brevern, A.G., Etchebest, C., and Hazout, S. 2000. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins 41: 271-287.
  • Huang, X., and Miller, W. 1991. A time-efficient linear-space local similarity algorithm. Advances in Applied Mathematics 12: 337-357.
  • Joseph, A.P., Agarwal, G., Mahajan, S., Gelly, J.C., Swapna, L.S., Offmann, B., and Cadet, F. 2010. A short survey on protein blocks. Biophys Rev 2: 137-145.
  • Needleman, S.B., and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-453.
  • Smith, T.F., and Waterman, M.S. 1981. Identification of common molecular subsequences. J Mol Biol 147: 195-197.
  • Tyagi, M., de Brevern, A.G., Srinivasan, N., and Offmann, B. 2008. Protein structure mining using a structural alphabet. Proteins 71: 920-937.
  • Tyagi, M., Gowri, V.S., Srinivasan, N., de Brevern, A.G., and Offmann, B. 2006. A substitution matrix for structural alphabet based on structural alignment of homologous proteins and its applications. Proteins 65: 32-39.
  • Zemla, A. 2003. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 31: 3370-3374.