As of September 2007, the KNOTTIN database is updated using mostly
the automatic procedures KNOTER3D and KNOTER1D, according to the flow
chart shown below.
KNOTER3D determines if a three-dimensional structure is a knottin:
KNOTER1D predicts if a protein sequence is a knottin
- New Protein Data Bank entries are filtered for
proteins with at least 3 disulfide bridges and a connectivity typical
of knottins (but not only), i.e. I-IV, II-V, III-VI..
- The selected protein structures are renumbered according to the
unique knottin numbering.
- The renumbered structures are
superimposed onto a reference knottin structure (the X-ray structure of
Cucurbita Pepo Trypsin Inhibitor II, CPTI-II, PDB ID 2btcI) for
residues 40, 60-61, 79-81 and 99-100. These residues approximately define
Beta-sheet (CSB) elementary motif in knottins that includes
the II-V and III-VI disulfide bridges.
Proteins that display both a CSB motif and a third I-IV disulfide bridge are indeed knottins.
- Root mean square deviations below 2.5 define knottins.
Ambiguous hits ( 2.5 < RMSD < 3.5 ) are manually checked.
If the RMSD is above 3.5, the structure is rejected.
- The SwissProt/TrEMBL (UniProt) release is filtered for
cysteine-rich proteins. Each protein seq1 in the resulting subset is
compared to all known knottin sequences knot_seq2 using a similarity
score S based on:
- the BLAST P-value for the seq1/knot_seq2 comparison.
- the number of conserved cysteines of the knot when aligning seq1 onto
the multiple alignment of the family of knot_seq2.
- the compatibility of intercysteine loop lengths with known knottin
- the taxonomic proximity between seq1 and knot_seq2.
- the similarity between the seq1 function and knottin functions
- the similarity between seq1 keywords and knottin keywords, and/or
dissimilarity between seq1 keywords and non-knottin keywords.
- New sequences are classified by decreasing S scores.
This list is then manually annotated for sequences to be included or
General data for all selected knottin structures are
obtained from the Protein Data bank or MMDB and are stored in Table 1.
Local structural information is computed for each residue and stored in
Non-local information is computed for all interactions and stored in
General data for all selected knottin sequences are obtained from the
SwissProt/TrEMBL database and stored in Table 4.
| Searching the database
| The button "Search
Database" points at a web form that allows to search either knottin
structures OR knottin sequences.
- Select the SEQUENCE or STRUCTURE database
- Make selections based on various criteria
Select one or all knottin protein families
Enter keyword (currently only one) to select entries that contain the
keyword in the Family, SwissProt ID, PDB ID, Descriptor, Source or
Function fields. The keyword is case insensitive.
exemples: 2eti, agatoxin, sodium, ITR2_momco, homo
allows selection of the method used to build the model (X-ray or NMR)
(i.e. loop lengths). Use Perl Regular expressions to select particular
loop lengths. For example, "^[2-4]$" will select loops with 2, 3, or 4
- Sort the results
- Family sort by family name
- SwissProt ID: sort by SwissProt ID
- PDB code: sort by PDB code
- Length: sort by sequence length
- RMS dev: sort by root mean square deviation from the reference structure 2btcI.
- Nomenclature: sort by the a-e loop lengths as used in the knottin
- Output tables
- The checkboxes allow selections for further sequence alignments or
structural superimpositions available via menus immediately above
and (possibly) below the output tables.
The green buttons point at individual knottin cards
- The "Jmol" link in structure tables forwards you to 3D viewing of the structure with the Jmol Java applet.
- Nomenclature: The knottin
nomenclature based on loop lengthes
- Family: The protein family
- SwissProt: The SwissProt ID ("-" means that no SwissProt ID is available yet)
- PDBsum: The PDB code points at the corresponding page in
- For 3D structures
R points at renumbered PDB files according to
the unique knottin numbering scheme
points at renumbered AND structurally fitted PDB files. The
renumbered structures are superimposed to a reference knottin structure
(the X-ray structure of Cucurbita Pepo Trypsin Inhibitor II, CPTI-II,
PDB ID 2btcI) for residues 40, 60-61, 79-81 and 99-100. These residues
define approximately the Cystine-Stabilized
Beta-sheet elementary motif in knottins that includes the
II-V and III-VI disulfide bridges
The chain ID in the PDB file
The sequence length
the RMS deviation from the reference structure for backbone atoms of
residues 40, 60-61, 79-81 and 99-100
The method for structure determination.
- Individual knottin cards
data similar to those in the Main tabular output are available
Supplementary information essentially includes:
"Collier de Perles"
two-dimensional representation. We gratefully thank Marie-Paule
Lefranc, for authorizing the use of the expression "Collier de Perles"
which originally refer to standardized 2D representations, in IMGT, the
international ImMunoGenetics information system® (http://imgt.cines.fr) [Lefranc,
M.-P.et al. Nucleic Acid. Res., 1999, 27, 209-212]
The original program for automatic drawing of Colliers de Perles was
rewritten and adapted to knottins by Quentin Kaas.
• For 3D
sequence-structure table showing:
The knottin unique numbering.
The amino acid at each position
The secondary structure computed using the Stride
The Phi angle computed using the Stride
The Psi angle computed using the Stride
The accessible surface area computed using the Stride
The percent of accessibility computed using the local PDBgeo program
columns indicate the
knotted disulfide bridges
Brown columns indicate
the additional disulfides
- Sequence alignments
- A "Get Alignment" button is available on top of the output tables.
- It is first necessary to select which knottins must be aligned using either
the "Select All" button or the checkboxes on the left side of each protein line.
- The renumbered sequences of the selected
proteins are passed to PAT
(Protein Analysis Toolkit written by Jerome Gracy firstname.lastname@example.org)
that manages to produce and display an html color-coded
- Each alignment displayed on the KNOTTIN website is preceeded by a drop-down
menu that allows to:
- retrieve a text-formated version of the alignment in various formats for download
- create a sequence logo
- send the alignment to PAT
and switch the browser page to the PAT webserver for
more powerful analyses.
| BLASTing the database
| The button
"Blast Database" points at a web form that allows users to upload their
own sequence and BLAST (or search a HMM model database) this sequence against the KNOTTIN
- Select search
BLAST means that a blastp is performed against all sequences in the
database (including sequences of the PDB files)
HMM: selecting HMM means that a HMMsearch is performed against HMM models pre-computed for each family in
the database. The output diplays first the HHMsearch result, then the alignment of the user
sequence with the sequences used to build the matching model (HMMalign
is used for this).
- Paste sequence in FASTA format
The given sequence must conform to the
(see below) and must only contain permitted characters.
A sequence in FASTA format begins with a single-line
description, followed by lines of one-letter code sequence data.
- The description line starts with a greater than
- The word following the greater than symbol
(">") immediately is the name of the sequence, the rest of the
line is the description.
- Name and description are optional.
- All lines of text should be shorter than 80
- The sequence ends if there is another greater
than symbol (">") at the beginning of a line.
>ITR2_ECBEL Trypsin Inhibitor II (EETI-II).
- Select MAX
number of hits
Restrict the number of hits shown in
- Select MAX
Display only hits with below the
maximal selected p-value
The user sequence is passed to PAT (Protein
Analysis Tools written by Jerome
Gracy) that manages to run BLASTA against the knottin
The BLAST output is then HTML color-coded by PAT for display.
| CONFORMATION SEARCH
button "CONFORMATION SEARCH" points at a web form that allows users to
search the KNOTTIN database for 20-residue long segments that satisfy
the selection explained below:
- Select display
Default display includes sequence and
position according to the unique
Users can choose to display the Phi and Psi angles, the secondary
structure and the percent of accessibility (PAC) by checking the
appropriate radio button.
All subsequent selections are optional,
although at least one selection must be performed. The maximum output
is currently limited to 1000 segments.
- Select protein
To select protein(s), type a list of
PDB IDs separated by blanks. (example "2btc 1ha9 1c6w").
The segment search will only be done on these proteins.
Note that if no other selection is performed, results will include all
sliding segments for each , i.e. as many segments as there is residues
in the protein. To avoid this, users can select a starting position for
the segment (see below)
- Starting position
To select only segments starting at a
given position, users can use the scrolling list. The position
correspond to the unique
knottin numbering. "*" means any position.
- Select sequence pattern
Users can select specific residue for
each position along the 20-residue segment using the scrolling lists.
"*" means any residue
- Select Ramachandran quadrant
Users can select the Ramachandran
quadrant for each residue along the 20-residue segment.
*: any quadrant
UL: Upper Left (Phi <0, Psi>0)
UR: Upper Right (Phi >0, Psi>0)
LL: Lower Left (Phi <0, Psi<0)
LR: Lower Right (Phi >0, Psi<0)
- Select secondary structure
Users can select the secondary
structure for each residue along the 20-residue segment. Secondary
structure is computed using the STRIDE program and uses the following
*: any secondary structure
G: 3-10 helix
B: Isolated bridge
C: Coil (none of the above)
- Output display
ouput consists of a succession of segments extracted from the KNOTTIN
The first line indicates the protein (PDB ID, chain) and the sequence
of the segment.
The second line indicates the numbering of each residue according to
the knottin unique
Subsequent lines indicates dihedral angles, percent of accessibility or
secondary structure, depending on the output request.
Output positions that match the requested selection are shown in red.
|The "Tools" menu points at a web form that allows users to submit a PDB file or a protein sequence.
KNOTER1D will determine if a 3D structure is a knottin
KNOTER3D will predict if a protein sequence is a knottin.
- Submit a protein structure or a protein sequence
The 3D structure must be in PDB format and the sequence in the FASTA format.
- Provide additional information if a sequence is submitted
KNOTER1D will search for homologs in the
SwissProt/TrEMBL database, and extract taxonomy, function or keywords
from the best homolog if any. The user has the possibility to
provide this information directly to KNOTER1D.
KNOTER3D or KNOTER1D will indicate if the protein is or is probably a knottin,
A detailed output of KNOTER1D is available via an html link.
If the protein is a knottin or putative knottin, standardized information is provided
• the original and standardized sequence numbers of the cysteines of the knot
• the knottin nomenclature
• a standardized Collier de Perles 2D representation
• a standardized renumbered alignment (knotted cysteines I, II, III, V and VI are renumbered 20, 40, 60, 80 and 100, respectively)
| Regular Expressions
expression can be used to select knottins in the KNOTTIN database
according to the knottin nomenclature,
i.e. according to loop lengths. To facilitate writing, separate
expressions are entered for each loop.
For example: ^1$ matches 1; 1 matches 1, 12, 31 etc; ^[2-4]$ matches 2,3 or 4; ^[1,4,7][0,1]*$ matches 1, 4, 7, 10, 40, 70, 11, 41, 71
Main metacharacters used in regular expressions are indicated below:
| beginning of string
|| the string begins with "6"
| end of string
|| the string ends with "6"
| any character (except newline)
|| an "a" followed by any single character followed
by a "6"
| match 0 or more times the preceding item
|| any number of (or no) "c" followed by a "3"
| match 1 or more times the preceding item
|| any number of "c" (at least one) followed by a
| match 0 or 1 times the preceding item
|| no or one "c" followed by a "3"
| set of characters
|| either "a" or "b" or "c" followed by a "3"
|| either the string begins with a "1" or ends with
| repetition modifier
|| two to five "c" followed by a "3"