Method
This page provides more explanation on the different informations given for each variant. This information can be found on the descriptive page of a variant. It's accessible via the "Search" or "List" page, by clicking on nucleotide notation.
Variant description
- ID nucleotide
- - Deletion (del):
- If the deletion involves several nucleotides, we have the first nucleotide position concerned, followed by the last nucleotide position, and ending with the inscription « del ». Example: c.1101_1152del, it means that the deletion encompass nucleotide 1101 to 1152. If deletion involves only one nucleotide there is only one position indicated. Example: c.1214del, it means that only the nucleotide 1214 was deleted. In publications, it's possible to find the inscription "del", followed from deleted nucleotide.
- - Insertion (ins) :
- The two nucleotides positions between which the insertion takes place are indicated and followed by the notation "ins", and ending with the inserted sequence. Example: c.1155_1156insTGTCG, its means that the nucleotide motif "TGTCG" was inserted between the nucelotide 1155 and 1156.
- - Deletion+Insertion (delins) :
- If the deletion involves several nucleotides, we have the first nucleotide position concerned and the last, followed by the notation "delins", and ending with the inserted sequence. Example: c.1118_1145delinsCGTTTA, it means that the nucleotides 1118 to 1145 was deleted and replaced by the nucleotid motif "CGTTTA". If the deletion implies only one nucleotide, we have its position, followed by the notation "delins", and ending with the inserted sequence. Example: c.1154delinsTCTGTC, it means that only the nucleorid 1154 was deleted and replaced by "TCTGTC". In publications these notations can be modified. For example, in the case of a deletion+substitution, the deletion and substitution can be separated by a "/", or "delins" can be replaced by ">".
- - Substitution (>):
- If the substitution involves several nucleotides, we have the first nucleotide position concerned , followed by the last, followed by the substituted and the replacement sequences separated by " > ". Example: c.1149_1152GGAC>TTGTCA, it means that the substitution encompass nucleotide 1149 to 1152, corresponding to "GGAC", replaced by "TTGTCA". If the substitution involves only one nucleotide, we have its position, followed by the last of the replaced and the replacement nucleotides separated by ">". Example: c.1113A>G, it means that only the nucleorid 1113, corresponding for nucleotide "A", was replaced by nucleotide "G".
- - Complex :
- Complex cases accumulate at least two previous mutations type and use the same notation. The notations for each mutation are separated by ";". Example: [c.1113A>C;1122_1140del], in this case we have firt one substitution of the nuclotide 1113, corresponding for nucleotide "A", by nucleotide"G". Secondly, we have a deletion of nucleotides 1122 to 1140. The notation can be in "[ ]" or not.
- Protein ID
- Mutation
- -Insertion: addition of one nucleotide or nucleotide segmen
- -Deletion+insertion: deletion of a nucleotide segment and replace it by a new one.
- -Substitution: replacement of one amino acid by another.
- -Complex: mutation of different types taking place at different positions.
- Type
- Classification
- - Class A: Losses of the last two acidic amino acid segments (ref typology) and have the common terminal sequence.
- - Class B: Loss of any acidic amino acid segment a (ref typology) and have the common terminal sequence.
- - Class C: Loss of the last acidic amino acid segment (ref typology) and have the common terminal sequence. The last two classes don’t have the common terminal sequence.
- - Class D: short sequence with variable termination.
- - Class E: Sequence retaining reticulim retention pattern (KDEL), like the wild type CALR sequence. Example of variant protein sequence of different classses can be see in Figure 2.
- Category
- COSMIC code
- Pathology
-
The nucleotide notation describes the cDNA mutation. All notation begins with « c. », followed by a specific notation to each mutation type. 4 mutations type are present in the database:
-
The protein notation describes the mutation in the protein. All notation begins with "p.". In general cases, "p." is followed by the first mutated amino acid letter, its position and the replacement amino acid letter, then the notation "fs*" (for frameshift), and ending with the mutated sequence size.
Example: p.K368Rfs*51, the amino acid "K", position 368, is the first amino acid muted, replaced by "R". The new sequence muted measures 51 amino acids
Specific case:
In the case where the variant protein sequence is identical to the wild-type CALR protein sequence, "fs*X" is replaced by "=".
In the case where the mutation induces an anticipated stop codon without nucleotide replacement, the annotation ending with "*". Example: p.E370*.
If the stop codon appears after an amino acid replacement, the notation stops after the replacement amino acid letter. Example: p.K368R.
If several mutations occur at distant locations in the nucleotide sequence result to distant mutations on the protein, the replaced amino acids, their position and the replacement amino acid letter are separated by a "+".
Example: p.E371D+K375Rfs*49, it means that amino acid "E", position 371, is the first amino acid muted by ponctuel substitution, and replaced by amino acid "D". The rest of the mutation beging to amino acid "K", position "375", replaced by "R", and produced a new muted sequence of 49 amino acids.
Two protein notations can have the same first amino acid muted, but the following sequence can have different size. It depends to the original mutation.
Also, a same muted protein can be caused to different genomic mutations.
-
Indicates the mutation type that has taken places on the nucleotide sequence. There are 4 mutation types in the database:
-
Thanks to the work lead by Kamplf and collaborator and Nangalia and collaborator, the different variants have been divided into 1-like type, 2-like type and other type. This typology was determined according to variant type 1 and type 2, which are the most frequently observed mutations. These variants each involve a different acidic amino acid segment. Three segments were identified: segment 1 from D362 to E364, segment 2 from E369 to D373 and segment 3 from E378 to D384. The 1-like variants lose the last 2 segments and the 2-like variants keep the 3 segments. (See Figure 1)
-
After observing a greater variety of mutated sequences in CALR-ET variants, we realized that the classical typology described above was limited. A new classification has been proposed for this database.
The majority of the variants have a common terminal sequence: "RMRRTRRKMRRKMSPARPRTSCREACLQGWTEA". This sequence is found in the of the first 3 variants classes:
-
57 variants have been identified in the literature and named from type 1 to 57 (Kamplf, T. et al., 2013; Pietra, D. et al., 2016). They represent the first CALR mutations observed in the chronic myeloproliferative neoplasms (MPNs) cases. In the Kamplf and collaborator publication we can also found two germinal variant, noted "Type G1" and "Type G2". If a variant is one of the 57 identified or germinal variants, its number appears in this section, if not it’s replaced by "#NA".
You can see all categorized variants on the "List" page, by clicking on the "List of categorized variants" button.
-
Genomic Mutation ID is a unique code to a genomic mutation and allows the variant to be found in the COSMIC database (https://cancer.sanger.ac.uk/cosmic; Tate et al., 2018), if it is exist there. It begins by "COSV" and followed by a unique number. If it doesn't exist, it’s replaced by "#NA" notation. It indicate the definitive position of the variant on the genome.
-
Indicates the different pathologies for which the variant has been observed. Each variant is involved at least in Essential thrombocythemia, exepted for the categorized variants.
Structural data
-
Secondary structure prediction and a 3D model are available for the majority of variants.
All sequences begin to amino acid A352.
- Secondary structure prediction
- The secondary structure prediction of the variants was generated with PSIPRED v.4.0 (http://bioinf.cs.ucl.ac.uk/psipred/; Buchan, D. W. and al., 2019). Three informations are provided (see Figure 3) :
- - Conf: provides the confidence level of the prediction associated with an amino acid position. The confidence level ranges from 0(low) to 9(high).
- - Pred: Prediction of the secondary structure, the coils (C) are indicated in blue and the helix (H) in red.
- - AA: The protein sequence of the variant. Amino acids are shown in red and basic amino acids are shown in blue.
- 3D modelling
-
3D model variants was generated by Modeller v.9.24 (https://salilab.org/modeller, Sali et Blundell, 1993).
A template was found with PSI-BLAST for modeling type 1 (class A) and type 2 (class B) variant. Constraints were added on the helix positions, according to the analysis of ther secondary structure prediction. The type 1 and type 2 model was used as template for the class A and class B variants respectively.
For class C variants, the 3D structure of the class A or B variant with the highest percentage of sequence identity was used as the template.
For class D and class E variant, WT CALR structural model (PDB code: 6ENY_G) was used as template. If the protein mutation is only a ponctual substitution or a early stop codon, the WT CALR structural model was directely modified on PyMOL (The PyMOL Molecular Graphics System, Version 0.99rc6 Schrödinger, LLC).
Remarks
-
Some categorized variants may be presented with different nucleotide notation in publications. Only one notation is listed for each type in the database.
For example, the variant type 1 can be found with the notation c.1099_1150del or c.1092_1143del.
In the database the variant type 1 has the notation c.1099_1150del.