PTM Structural Database

General remarks
Post-translational modifications (PTMs) are responsible for many forms of biological regulation and can therefore be implicated in various pathologies. They can induce significant changes in protein dynamics (Xin F. and Radivojac P., 2012).

Many databases (see LINKS) identify residues associated with PTMs, but none of them provides structural information on the protein.

PTM-SD curates this information and provides important structural details on all PTMs found in protein structures. PTMs are harder to analyze in protein structures because their annotations are quite heterogeneous.
PTM-SD provides a unified view of them: it gives the precise position of each PTM, its correspondence with the true protein sequence, and the different problematic cases observed.
It is therefore useful both for scientists interested in one PTM of a given protein and for scientists analyzing a particular kind of PTM.

Moreover, PTM-SD describes PTMs in terms of localisation, secondary structure and Protein Blocks, and can also be used to define a non-redundant dataset.

Publication
Please cite:

Pierrick Craveur, Joseph Rebehmed, and Alexandre G. de Brevern
PTM-SD: a database of structurally resolved and annotated posttranslational modifications in proteins
Database 2014: bau041 doi:10.1093/database/bau041 published online May 24, 2014

Data source
We used several databases to characterize the PTMs resolved in protein structures (see the FLOWCHART for details):

Protein structures are taken from the Protein Data Bank (http://www.rcsb.org/pdb/home/home.do) [ref].
Note: currently, only the first model of NMR structures is kept.

Sequences contained in the PDB were compared with sequences of the UniProt database (http://www.uniprot.org/) [ref].

PTMs found in the protein structures were compared with PTMs annotated in dbPTM (http://dbptm.mbc.nctu.edu.tw/) [ref] and PTMCuration (http://selene.princeton.edu/PTMCuration/) [ref].

Flowchart

For the last step, it was necessary
to verify whether the PTM annotations correspond to the chemical structure of the modified residue found at the same position.

We use automatic and manual verification processes, based on the atom information extracted from PDB files
and on an annotation correspondence table.
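The automatic part of this verification can be pictured as a lookup in the correspondence table. The following is a minimal Python sketch, assuming a table keyed by PDB chemical component code; the codes SEP, TPO, PTR, ALY, M3L and HYP and their annotation strings are illustrative choices, not the actual PTM-SD table. Positions where no annotation agrees would be left for manual verification.

```python
# Hypothetical excerpt of an annotation correspondence table: PDB chemical
# component codes mapped to the PTM annotations they are compatible with.
# This is NOT the actual PTM-SD table, only an illustration.
CORRESPONDENCE = {
    "SEP": {"Phosphoserine"},
    "TPO": {"Phosphothreonine"},
    "PTR": {"Phosphotyrosine"},
    "ALY": {"N6-acetyllysine"},
    "M3L": {"N6,N6,N6-trimethyllysine"},
    "HYP": {"4-hydroxyproline"},
}


def annotation_matches(residue_code: str, annotations: list[str]) -> bool:
    """True if at least one annotation at this position agrees with the
    modified residue found in the structure. Unknown residue codes return
    False so that they can be routed to manual verification."""
    compatible = CORRESPONDENCE.get(residue_code.upper(), set())
    return any(a in compatible for a in annotations)


# 3N9L chain B: an M3L residue annotated as trimethyllysine and acetyllysine
print(annotation_matches("M3L", ["N6,N6,N6-trimethyllysine", "N6-acetyllysine"]))
```

Matching on "at least one annotation" reflects the fact that a single position can carry several annotations (see the 3N9L example in the Browse Database section).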

PTM-SD Statistics

On ... there are ... entries in PTM-SD.


Update
There are two distinct update processes: structure updates and annotation updates.

Structure data are updated weekly, just after the weekly update of the PDB (updated data are shown in red).
Annotation data are updated monthly (updated data are shown in red).

Browse Database
Records in PTM-SD can be browsed with different filters in two search modes: Simple Mode and Advanced Mode.

The filters in Simple mode are:

PDB-ID You can specify one PDB ID code or a list of codes separated by semicolons. This filter is case-insensitive (e.g. 1fzc;1FZG;1Lt9).
UniProt-AC You can specify one UniProt AC or a list of ACs separated by semicolons. This filter is case-insensitive (e.g. P00396;P06008;P0C1V1).
ORGANISM Select from the list of organisms (UniProt species identification codes followed by the common name or, if missing, by the official scientific name).
PTM Select from the list of post-translational modification annotations.
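The PDB-ID and UniProt-AC filters accept semicolon-separated, case-insensitive lists. A minimal Python sketch of how such input could be normalised is shown below; trimming spaces and ignoring empty items are assumptions of this sketch, not documented PTM-SD behaviour, and the function names are hypothetical.

```python
def parse_id_filter(text: str) -> set[str]:
    """Split a semicolon-separated list of identifiers into a set of
    upper-case codes, so that matching is case-insensitive.
    Surrounding spaces and empty items (e.g. a trailing ';') are ignored."""
    return {item.strip().upper() for item in text.split(";") if item.strip()}


def keep_entry(entry_id: str, wanted: set[str]) -> bool:
    """An empty filter keeps everything; otherwise the ID must be listed."""
    return not wanted or entry_id.upper() in wanted


wanted = parse_id_filter("1fzc;1FZG;1Lt9")
print(sorted(wanted))               # ['1FZC', '1FZG', '1LT9']
print(keep_entry("1FzG", wanted))   # True
```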

Use the Ctrl and Shift keys for individual and multiple selections.

For more complex filters use the advanced mode (see below).

Search
Clicking the Search button queries the database as follows:

Values selected within the same filter category are combined with the "OR" operator.
Different filter categories are combined with the "AND" operator.

For example, selecting the ARAHY (Peanut) organism together with the PTMs N-linked Glycosylation and Hydroxylation

requests records corresponding to the ARAHY (Peanut) organism "AND" to the PTMs N-linked Glycosylation "OR" Hydroxylation.
In plain words: "All N-linked Glycosylation and Hydroxylation records in the Peanut organism."
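The OR-within-category, AND-across-categories rule can be expressed in a few lines. This Python sketch uses toy records; apart from ARAHY and the two PTM names taken from the example above, the field values are invented, and the record layout is an assumption rather than PTM-SD's internal schema.

```python
def matches(record: dict[str, str], filters: dict[str, set[str]]) -> bool:
    """OR inside a category: the record's value must be one of the selected
    values. AND across categories: every non-empty category must match."""
    return all(record.get(field) in values
               for field, values in filters.items() if values)


# Toy records; only ARAHY and the two PTM names come from the example above.
records = [
    {"organism": "ARAHY", "PTM": "N-linked Glycosylation"},
    {"organism": "ARAHY", "PTM": "Phosphorylation"},
    {"organism": "HUMAN", "PTM": "Hydroxylation"},
]
filters = {
    "organism": {"ARAHY"},
    "PTM": {"N-linked Glycosylation", "Hydroxylation"},
}
selected = [r for r in records if matches(r, filters)]
print(selected)  # [{'organism': 'ARAHY', 'PTM': 'N-linked Glycosylation'}]
```

An empty category is skipped, which mirrors leaving a filter unselected.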

Advanced search
To switch to Advanced Mode, click the Advanced search button.

Advanced Mode lets you add more detailed and specific filters to your request:

MODIFIED AMINO ACID Modified amino acid type (i.e. the residue type found at the PTM site position in the UniProt sequence).
SECONDARY STRUCTURE ON PTM LOCATION Secondary structure assignment at the PTM positions. The secondary structure is assigned by DSSP, and the corresponding output letters are:
B residue in isolated beta-bridge
C loop or irregular
E extended strand, participates in beta ladder
G 3-turn helix (3₁₀ helix)
H α-helix
I 5-turn helix (π helix)
T hydrogen bonded turn
S bend

SCOP CLASSIFICATION OF PROTEIN The Structural Classification Of Proteins (SCOP). This filter restricts your search to specific structural classes of proteins.
The different classes available in PTM-SD are:
a All alpha proteins
b All beta proteins
c Alpha and beta proteins (a/b)
d Alpha and beta proteins (a+b)
e Multi-domain proteins (alpha and beta)
f Membrane and cell surface proteins and peptides
g Small proteins
h Coiled coil proteins
i Low resolution protein structures
j Peptides
None Not yet classified by SCOP

Number of PTM by pdb_chain The number of PTM sites found on the same PDB chain.
Length of pdb_chain The length of the PDB chain in which the PTM sites are found. Missing residues on the chain can optionally be taken into account.
ANNOTATION FOUND IN MODRES PDB RECORDS The MODRES comment found in the PDB files selected to build PTM-SD.
The MODRES record describes modifications of macromolecules found in the structure; it is generated by the wwPDB and may relate to chemical, engineered, or post-translational modifications.
ANNOTATION FOUND IN dbPTM The dbPTM PTM annotations, corresponding to the detailed annotations in the downloadable dbPTM data for experimentally verified PTM sites.
Selecting a value in this filter requests specific PTM annotations.
However, these annotations may not correspond to the PTM found in the crystal structure. For clarification, see the examples below.
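The two chain-level filters can be illustrated with a small Python sketch. It assumes the chain length is derived from the PDB Information string described in the Alignment section (one character per position, 'M' for missing residues) and that entries are dicts keyed by the result-table columns pdb_id and pdb_chain; both are assumptions, not PTM-SD's documented computation.

```python
from collections import Counter


def chain_length(pdb_info: str, include_missing: bool = True) -> int:
    """Chain length from a PDB Information string (one character per
    position; 'M' marks residues listed in REMARK 465 but absent from the
    coordinates). With include_missing=False, 'M' positions are not counted."""
    if include_missing:
        return len(pdb_info)
    return sum(1 for code in pdb_info if code != "M")


def ptms_per_chain(entries: list[dict[str, str]]) -> Counter:
    """Number of PTM sites per (pdb_id, pdb_chain)."""
    return Counter((e["pdb_id"], e["pdb_chain"]) for e in entries)


print(chain_length("MM___H_S"), chain_length("MM___H_S", include_missing=False))  # 8 6
```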

To return to Simple Mode, click the Hide advanced search button.
Note: your Advanced Mode selections are cleared when you return to Simple Mode.

The following examples show how to use the ANNOTATION FOUND IN dbPTM advanced filter:

First, a simple search example.

By selecting Acetylation in the PTM filter, you can search for:

"All records corresponding to Acetylation".

As of January 2014, there were 93 corresponding modified residues both structurally resolved in PDB structures and experimentally annotated in dbPTM.

By selecting "N6-acetyllysine" in the advanced filter dbptm_annot, you can search for:

"All records for which there is 'N6-acetyllysine' in the dbPTM annotation."

As of January 2014, there were 181 corresponding modified residues both structurally resolved in PDB structures and experimentally annotated in dbPTM.

As you can see, there are more matching entries than in the previous example, in which the search was made for Acetylation entries.
This is because the position of a PTM site that is both structurally resolved and experimentally annotated may also carry other PTM annotations.

See, for example, the N-TRIMETHYLLYSINE found in chain B of PDB 3N9L: this PTM site is annotated in dbPTM as N6,N6,N6-trimethyllysine but also as N6-acetyllysine.

By combining simple and advanced filters, it is therefore possible to make very specific requests.

For example, by selecting Acetylation in the PTM filter and "N6-acetyllysine" in the advanced filter dbptm_annot, you can search for:

"All records corresponding to Acetylation for which there is 'N6-acetyllysine' in the dbPTM annotation."
In other words: "All N6-acetylations of lysine that are both structurally resolved and experimentally annotated".

As of January 2014, there were 83 corresponding entries in PTM-SD.

Result table
Clicking the Filter button displays the search results below the filter table.

The result table contains one line for each matching PTM site in PTM-SD.
It contains the following information:

Details a link to the PTM-SD details page of the PTM site,
organism the protein organism,
uniprot_ac the UniProt AC, with a link to the UniProt page of the protein,
pdb_id the PDB ID code, with a link to the PDB page of the structure,
pdb_chain the PDB chain identifier,
uniprot_pos the position of the PTM site in the UniProt sequence,
pdb_pos the position of the PTM site in the PDB structure,
pdb_modres the PDB MODRES comment related to the PTM site,
dbptm_annot all the dbPTM annotations at the PTM site position in the protein,
PTM the general annotation of the PTM site,
scop_class the SCOP structural classification of the protein,
aa_uniprot the amino acid type at the PTM site position in the UniProt sequence,
pb the local structure assignment (Protein Blocks) of the -10/+10 positions surrounding the PTM site (the site is highlighted in purple),
ss the secondary structure assignment (DSSP) of the -10/+10 positions surrounding the PTM site (the site is highlighted in purple).
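The pb and ss columns show a -10/+10 window around the site. A minimal Python sketch of extracting such a window from an assignment string is given below; padding with '-' at chain ends is an assumption of this sketch, and pos is taken as a 1-based index into the string (mapping PDB residue numbers with insertion codes to string indices is not handled).

```python
def window(seq: str, pos: int, flank: int = 10, pad: str = "-") -> str:
    """Return the 2*flank+1 characters centred on position `pos` (1-based
    index into `seq`), padding with `pad` where the window runs past an end.
    The PTM site is always at index `flank` of the result."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    i = pos - 1                                   # 0-based index of the site
    left = seq[max(0, i - flank):i]
    right = seq[i + 1:i + 1 + flank]
    return pad * (flank - len(left)) + left + seq[i] + right + pad * (flank - len(right))


print(window("HHHHCCEEEE", 2, flank=3))  # --HHHHC
```

Keeping the site at a fixed index makes windows from different entries directly comparable column by column.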

Tools
PTM-SD provides three tools to compute statistics and analyses on entries selected from the result table.

Select entries with the checkbox at the beginning of each result table line.
Use the top checkbox to select or deselect all entries in the result table.

Statistics
Click the Statistics button to compute pie charts of the organism distribution, the protein distribution and the general PTM annotation distribution.

Click on the Show details button to display the raw data of the related pie chart.

The raw data are sortable by name, frequency or percentage.
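The raw data behind each pie chart amount to counts and percentages of a column. This Python sketch computes such a table from a list of values (for example the organism column of the selected entries); the function name and the tie-breaking by name are choices of this sketch.

```python
from collections import Counter


def distribution(values, sort_by="frequency"):
    """Return (name, count, percentage) rows like the raw data of a pie chart.
    sort_by is "name", "frequency" or "percentage"."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(name, n, 100.0 * n / total) for name, n in counts.items()]
    if sort_by == "name":
        rows.sort(key=lambda r: r[0])
    else:
        # frequency and percentage give the same order; ties broken by name
        rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


print(distribution(["HUMAN", "HUMAN", "MOUSE", "ECOLI"]))
# [('HUMAN', 2, 50.0), ('ECOLI', 1, 25.0), ('MOUSE', 1, 25.0)]
```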

Neq
Click the Neq button to compute the Neq plot.

The equivalent number of PBs (Neq) is a statistical measurement similar to an entropy,
and represents the average number of PBs at a given position.

Neq is calculated as follows:

Neq = \exp\left(-\sum_{x=1}^{16} p_x \ln p_x\right)

where p_x is the probability of PB x.

Note: A Neq value of 1 indicates that only one type of PB is observed,
while a value of 16 is equivalent to a random distribution.
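The Neq formula can be checked numerically. This Python sketch computes Neq for one position and a per-position profile over aligned PB windows; treating each column of equal-length windows as one position, ignoring non-PB characters such as padding, and returning NaN for a column with no PB are assumptions of this sketch, not a description of how PTM-SD builds its plot.

```python
import math
from collections import Counter

PB_LETTERS = set("abcdefghijklmnop")  # the 16 Protein Blocks


def neq(pbs) -> float:
    """Neq = exp(-sum_x p_x ln p_x), p_x being the frequency of PB x."""
    counts = Counter(pb for pb in pbs if pb in PB_LETTERS)
    total = sum(counts.values())
    if total == 0:
        return float("nan")  # no PB observed at this position
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def neq_profile(windows: list[str]) -> list[float]:
    """One Neq value per aligned position of equal-length PB windows."""
    return [neq(column) for column in zip(*windows)]


print(neq("mmmm"), neq("abcdefghijklmnop"))  # 1.0 and approximately 16
```

The two printed values reproduce the bounds stated in the note: 1 for a single PB, 16 for a uniform distribution.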

Clustering
Users can compute a dendrogram and a histogram of pairwise sequence identities between the selected proteins.
They can also reduce redundancy in their selected entries, thereby creating a non-redundant selection.

Currently, clustering is performed automatically with the statistical software R (using the hclust and cutree functions).
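For readers who want to reproduce the redundancy reduction outside the web interface, here is a pure-Python sketch of agglomerative clustering cut at a threshold. It assumes complete linkage (the default method of R's hclust), distances defined as 100 minus percent identity, and a cut at height 100 minus the identity threshold; PTM-SD's exact settings are not documented here, and the choice of representative per cluster is hypothetical. The accessions P1 to P3 and their identities are invented.

```python
from itertools import combinations


def complete_linkage_clusters(names, identity, threshold=30.0):
    """Agglomerative complete-linkage clustering on distances 100 - identity,
    cut at height 100 - threshold: inside each resulting cluster, every pair
    of proteins shares at least `threshold` % sequence identity.
    `identity` maps frozenset({a, b}) to a percent identity for every pair."""
    clusters = [{n} for n in names]
    cutoff = 100.0 - threshold

    def distance(c1, c2):
        # complete linkage: the most distant pair defines the cluster distance
        return max(100.0 - identity[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min((distance(clusters[i], clusters[j]), i, j)
                      for i, j in combinations(range(len(clusters)), 2))
        if d > cutoff:
            break
        clusters[i] |= clusters[j]   # merge the closest pair (i < j)
        del clusters[j]
    return clusters


def representatives(clusters):
    """Keep one protein per cluster (here simply the alphabetically first)."""
    return sorted(min(c) for c in clusters)


ident = {frozenset(("P1", "P2")): 90.0, frozenset(("P1", "P3")): 10.0,
         frozenset(("P2", "P3")): 20.0}
groups = complete_linkage_clusters(["P1", "P2", "P3"], ident, threshold=30.0)
print(representatives(groups))  # ['P1', 'P3']
```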

Follow the example below to learn how to use the Clustering tools:

1) Make a request and select all the returned PTM-SD entries in the result table
(see the Browse Database section for help).

In our example, we searched for all Methylation entries.

In January 2014, there were 861 related entries.

2) Then use the Statistics tool to compute pie charts.
Some proteins are more represented than others,
and some of them may be homologs.

These 861 Methylation entries are observed in 163 PDB files,
which represent 40 different proteins (40 different UniProt ACs).

3) Click on "Clustering".
On the left, a dendrogram represents the homology
between these 40 proteins based on pairwise sequence identities
provided by Clustal W.
In the center, a histogram shows the distribution
of the pairwise sequence identities used to compute the dendrogram.

On the right, you can reduce redundancy in your selected
entries.

4) Choose a threshold value (default is 30%) and
5) Click on "Reduce redundancy"

In our case, 21 proteins were removed from our selection, which corresponds to
deselecting 485 lines (PTM-SD entries) in the result table.

The dendrogram and the histogram were automatically updated to reflect the entries
that remained selected in the table.

The threshold value is indicated by a red line in both figures.

Some entries that remain selected in the table may correspond to the same
position in the same protein, found in different PDB files.

6) Click on "Remove duplicate positions"

...

Using the Statistics tool again shows the reduction of redundancy in the selected entries.
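Removing duplicate positions amounts to keeping one entry per protein position. The Python sketch below keys entries on the result-table columns uniprot_ac and uniprot_pos and keeps the first occurrence; which duplicate PTM-SD actually keeps is not stated, so this choice is an assumption. The positions and the placeholder PDB identifiers are invented.

```python
def remove_duplicate_positions(entries):
    """Keep only the first entry for each (uniprot_ac, uniprot_pos) pair, so a
    site observed in several PDB files is counted once."""
    seen = set()
    kept = []
    for entry in entries:
        key = (entry["uniprot_ac"], entry["uniprot_pos"])
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


# Invented entries; "PDB_A" and "PDB_B" are placeholders, not real PDB IDs.
entries = [
    {"uniprot_ac": "P00396", "uniprot_pos": "10", "pdb_id": "PDB_A"},
    {"uniprot_ac": "P00396", "uniprot_pos": "10", "pdb_id": "PDB_B"},
    {"uniprot_ac": "P00396", "uniprot_pos": "25", "pdb_id": "PDB_B"},
]
print([e["pdb_id"] for e in remove_duplicate_positions(entries)])  # ['PDB_A', 'PDB_B']
```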

Create your dataset
At any time, you can download two kinds of data corresponding to the entries selected in the result table:

The list of PDB-IDs in which the selected PTMs are found.
A .csv-style file containing the PTM-SD data of the selected entries.
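The .csv export can be processed with standard tools. This Python sketch assumes a comma-separated file whose header uses the result-table column names (uniprot_ac, pdb_id, pdb_chain, ...); the actual header and delimiter of the PTM-SD export may differ, and the sample content is invented.

```python
import csv
import io


def load_selection(handle, delimiter=","):
    """Read a PTM-SD .csv export into one dict per line, keyed by the header."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def pdb_id_list(rows):
    """Distinct PDB IDs of the selected entries, in upper case."""
    return sorted({row["pdb_id"].upper() for row in rows})


# Invented sample with placeholder values; a real export would be opened with
# open(path, newline="") instead of io.StringIO.
sample = io.StringIO(
    "uniprot_ac,pdb_id,pdb_chain,uniprot_pos,PTM\n"
    "ACC_1,pdb1,A,10,Acetylation\n"
    "ACC_2,PDB2,B,25,Methylation\n"
)
rows = load_selection(sample)
print(pdb_id_list(rows))  # ['PDB1', 'PDB2']
```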

With the Tools, you can create a non-redundant selection and then create your own dataset.
To obtain it, follow these four steps:

1) Request the database by using filters in the Browse Database section.

2) Select entries in the result table.
If needed, use the Clustering tool to reduce redundancy in your selection.

3) Go to the CREATE YOUR DATASET section.
Click the HERE button.

4) Follow the instructions that appear.

Gallery
On each PTM site "Details page", two images illustrate the structure of the PTM site in the context of the protein conformation.
The first presents a view of the entire PDB chain containing the PTM site, and the second is a close-up of the PTM site structure.

The current PTM site is colored in purple, and the other PTM sites in the same PDB chain are highlighted in yellow.
The same color code is used in the aligned sequences further down the "Details page".

These two images were automatically generated with the PyMOL software.
The PyMOL script used to obtain these images is downloadable at the bottom of the "Details page".

Note: As image production is automatic, the view of the PTM site may not be optimal.
In these cases, we recommend using the PyMOL script to visualize the PTM site in 3D.

Alignment
The "Details" page of each PTM site provides an alignment table highlighting the sequence/structure relationships in the chain that contains the PTM site.
It includes an alignment of the protein amino acid sequence and of the secondary structure and local structure (Protein Blocks) assignments.

The PTM position corresponding to the current page is highlighted in purple, and the other PTM sites in the chain are colored in yellow.
The same color code is used in the image gallery.

PTM positions found in other PDB structures are highlighted in green.

UniProt Sequence
The UniProt sequence is the protein sequence in which the PTM site is annotated.

Clustal
The sequence alignments were performed using Clustal W (2.0.12).
The similarity/identity symbols provided by Clustal W for each position are also aligned. They make it easier to see differences between the UniProt protein sequence and the chain residues from the PDB.

PDB Sequence
This sequence corresponds to the chain residues found in the PDB file.
The amino acids are ordered according to the peptide bond order, from the N-terminus to the C-terminus.
The amino acids are given in one-letter code, and the 'X' label corresponds to unnatural residues.

Note: This sequence may not match the FASTA sequence found on the related PDB page. Indeed, the FASTA sequence may not include residue insertions, mutations, ...
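The one-letter conversion with 'X' for unnatural residues can be sketched as follows. This Python example maps only the 20 standard residue names and turns everything else (for example the modified residues SEP or M3L) into 'X'; whether PTM-SD treats any non-standard component differently is not stated, so this is an assumption.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_sequence(residue_names: list[str]) -> str:
    """One-letter sequence of a chain given its residue names in N- to
    C-terminal order; any non-standard residue becomes 'X'."""
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residue_names)


print(pdb_sequence(["MET", "SEP", "LYS", "M3L"]))  # MXKX
```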

PDB Information
The PDB Information sequence is a linear representation of the record types in the PDB file (format version 3.30).
A one-letter code is used for each PDB chain position, as follows:
H The residue is defined on the 'HETATM' PDB record.
"Non-polymer or other “non-standard” chemical coordinates, such as water molecules or atoms
presented in HET groups use the HETATM record type."
_ The residue is defined on the 'ATOM ' PDB record.
"The ATOM records present the atomic coordinates for standard amino acids and nucleotides."
I The residue is numbered as an inserted residue.
"Alphabet letters are commonly used for insertion code. The insertion code is used when two
residues have the same numbering. The combination of residue numbering and insertion code
defines the unique residue."
M The residue is defined in the 'REMARK 465 MISSING RESIDUES' PDB record.
"REMARK 465 lists the residues that are present in the SEQRES records but are completely absent
from the coordinates section."
S The residue is defined on the 'SEQADV' PDB record.
"The SEQADV record identifies differences between sequence information in the SEQRES records of
the PDB entry and the sequence database entry given in DBREF."

As some positions may carry several of these labels, the following priority order is used to choose the one-letter representation:
M > S > H > I > _
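The priority rule M > S > H > I > _ can be applied by scanning the labels in priority order. In this Python sketch, the set of labels attached to a position is a hypothetical input format; only the priority order comes from the text above.

```python
PRIORITY = "MSHI_"  # M > S > H > I > _ (highest priority first)


def info_code(labels: set[str]) -> str:
    """Pick the single character shown for a chain position carrying
    several PDB Information labels."""
    for code in PRIORITY:
        if code in labels:
            return code
    raise ValueError(f"no known label in {labels!r}")


print(info_code({"H", "I"}), info_code({"_", "S"}))  # H S
```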

DSSP Sequence
This sequence corresponds to the secondary structure assignment made by DSSP. The output letters are:
B residue in isolated beta-bridge
C loop or irregular
E extended strand, participates in beta ladder
G 3-turn helix (3₁₀ helix)
H α-helix
I 5-turn helix (π helix)
T hydrogen bonded turn
S bend

Protein Block Sequence
This sequence corresponds to the Protein Blocks (PBs [ref]) assignment made by a slightly modified version of the Python tool PBxplore.

This structural alphabet is composed of 16 local structure prototypes of 5 residues in length. They efficiently approximate every part of protein structures.

PBs m and d can be roughly described as prototypes for the central region of α-helix and β-strand, respectively;
PBs a through c primarily represent the N-cap of β-strand, while e and f correspond to C-caps;
PBs g through j are specific to coils;
PBs k and l correspond to the N-cap of α-helix, while PBs n through p correspond to C-caps.
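The grouping above can be written as a lookup table, which is handy for summarising a PB window. This Python sketch encodes only the rough correspondence described in the text; the region labels and the function name are choices of this sketch.

```python
from collections import Counter

# Rough correspondence between Protein Blocks and local structure, as
# described above (dict.fromkeys assigns the same label to several PBs).
PB_REGIONS = {
    "m": "alpha-helix core",
    "d": "beta-strand core",
    **dict.fromkeys("abc", "beta-strand N-cap"),
    **dict.fromkeys("ef", "beta-strand C-cap"),
    **dict.fromkeys("ghij", "coil"),
    **dict.fromkeys("kl", "alpha-helix N-cap"),
    **dict.fromkeys("nop", "alpha-helix C-cap"),
}


def region_counts(pb_sequence: str) -> Counter:
    """Count the rough structural region of each PB; other characters
    (e.g. padding) are skipped."""
    return Counter(PB_REGIONS[pb] for pb in pb_sequence if pb in PB_REGIONS)


print(region_counts("klmmmnop"))
```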

They have been used in various approaches, e.g. protein superimposition [ref, ref], protein binding site analysis [ref, ref] or prediction [ref, ref].
