Method



This page provides more explanation on the different informations given for each variant. This information can be found on the descriptive page of a variant. It's accessible via the "Search" or "List" page, by clicking on nucleotide notation.

Variant description

  1. ID nucleotide

  2. The nucleotide notation describes the cDNA mutation. All notation begins with « c. », followed by a specific notation to each mutation type. 4 mutations type are present in the database:

    - Deletion (del):

    If the deletion involves several nucleotides, we have the first nucleotide position concerned, followed by the last nucleotide position, and ending with the inscription « del ».
    Example: c.1101_1152del, it means that the deletion encompass nucleotide 1101 to 1152.

    If deletion involves only one nucleotide there is only one position indicated.
    Example: c.1214del, it means that only the nucleotide 1214 was deleted.

    In publications, it's possible to find the inscription "del", followed from deleted nucleotide.

    - Insertion (ins) :

    The two nucleotides positions between which the insertion takes place are indicated and followed by the notation "ins", and ending with the inserted sequence.
    Example: c.1155_1156insTGTCG, its means that the nucleotide motif "TGTCG" was inserted between the nucelotide 1155 and 1156.

    - Deletion+Insertion (delins) :

    If the deletion involves several nucleotides, we have the first nucleotide position concerned and the last, followed by the notation "delins", and ending with the inserted sequence.
    Example: c.1118_1145delinsCGTTTA, it means that the nucleotides 1118 to 1145 was deleted and replaced by the nucleotid motif "CGTTTA".

    If the deletion implies only one nucleotide, we have its position, followed by the notation "delins", and ending with the inserted sequence.
    Example: c.1154delinsTCTGTC, it means that only the nucleorid 1154 was deleted and replaced by "TCTGTC".

    In publications these notations can be modified. For example, in the case of a deletion+substitution, the deletion and substitution can be separated by a "/", or "delins" can be replaced by ">".

    - Substitution (>):

    If the substitution involves several nucleotides, we have the first nucleotide position concerned , followed by the last, followed by the substituted and the replacement sequences separated by " > ".
    Example: c.1149_1152GGAC>TTGTCA, it means that the substitution encompass nucleotide 1149 to 1152, corresponding to "GGAC", replaced by "TTGTCA".

    If the substitution involves only one nucleotide, we have its position, followed by the last of the replaced and the replacement nucleotides separated by ">".
    Example: c.1113A>G, it means that only the nucleorid 1113, corresponding for nucleotide "A", was replaced by nucleotide "G".


    - Complex :

    Complex cases accumulate at least two previous mutations type and use the same notation. The notations for each mutation are separated by ";".
    Example: [c.1113A>C;1122_1140del], in this case we have firt one substitution of the nuclotide 1113, corresponding for nucleotide "A", by nucleotide"G". Secondly, we have a deletion of nucleotides 1122 to 1140.

    The notation can be in "[ ]" or not.

  3. Protein ID

  4. The protein notation describes the mutation in the protein. All notation begins with "p.". In general cases, "p." is followed by the first mutated amino acid letter, its position and the replacement amino acid letter, then the notation "fs*" (for frameshift), and ending with the mutated sequence size.
    Example: p.K368Rfs*51, the amino acid "K", position 368, is the first amino acid muted, replaced by "R". The new sequence muted measures 51 amino acids

    Specific case:

    In the case where the variant protein sequence is identical to the wild-type CALR protein sequence, "fs*X" is replaced by "=".

    In the case where the mutation induces an anticipated stop codon without nucleotide replacement, the annotation ending with "*". Example: p.E370*.
    If the stop codon appears after an amino acid replacement, the notation stops after the replacement amino acid letter. Example: p.K368R.

    If several mutations occur at distant locations in the nucleotide sequence result to distant mutations on the protein, the replaced amino acids, their position and the replacement amino acid letter are separated by a "+".
    Example: p.E371D+K375Rfs*49, it means that amino acid "E", position 371, is the first amino acid muted by ponctuel substitution, and replaced by amino acid "D". The rest of the mutation beging to amino acid "K", position "375", replaced by "R", and produced a new muted sequence of 49 amino acids.

    Two protein notations can have the same first amino acid muted, but the following sequence can have different size. It depends to the original mutation.
    Also, a same muted protein can be caused to different genomic mutations.

  5. Mutation

  6. Indicates the mutation type that has taken places on the nucleotide sequence. There are 4 mutation types in the database:
    -Insertion: addition of one nucleotide or nucleotide segmen
    -Deletion+insertion: deletion of a nucleotide segment and replace it by a new one.
    -Substitution: replacement of one amino acid by another.
    -Complex: mutation of different types taking place at different positions.

  7. Type

  8. Thanks to the work lead by Kamplf and collaborator and Nangalia and collaborator, the different variants have been divided into 1-like type, 2-like type and other type. This typology was determined according to variant type 1 and type 2, which are the most frequently observed mutations. These variants each involve a different acidic amino acid segment. Three segments were identified: segment 1 from D362 to E364, segment 2 from E369 to D373 and segment 3 from E378 to D384. The 1-like variants lose the last 2 segments and the 2-like variants keep the 3 segments. (See Figure 1)


    Figure 1. Highlighting of the three acidic amino acid segments (in red) in CALR, wild type and mutated.


  9. Classification

  10. After observing a greater variety of mutated sequences in CALR-ET variants, we realized that the classical typology described above was limited. A new classification has been proposed for this database.

    The majority of the variants have a common terminal sequence: "RMRRTRRKMRRKMSPARPRTSCREACLQGWTEA". This sequence is found in the of the first 3 variants classes:

    - Class A: Losses of the last two acidic amino acid segments (ref typology) and have the common terminal sequence.

    - Class B: Loss of any acidic amino acid segment a (ref typology) and have the common terminal sequence.

    - Class C: Loss of the last acidic amino acid segment (ref typology) and have the common terminal sequence.

    The last two classes don’t have the common terminal sequence.

    - Class D: short sequence with variable termination.

    - Class E: Sequence retaining reticulim retention pattern (KDEL), like the wild type CALR sequence.

    Example of variant protein sequence of different classses can be see in Figure 2.


    Figure 2. Highlighting of the specific sequence of different classes.

  11. Category

  12. 57 variants have been identified in the literature and named from type 1 to 57 (Kamplf, T. et al., 2013; Pietra, D. et al., 2016). They represent the first CALR mutations observed in the chronic myeloproliferative neoplasms (MPNs) cases. In the Kamplf and collaborator publication we can also found two germinal variant, noted "Type G1" and "Type G2". If a variant is one of the 57 identified or germinal variants, its number appears in this section, if not it’s replaced by "#NA".

    You can see all categorized variants on the "List" page, by clicking on the "List of categorized variants" button.

  13. COSMIC code

  14. Genomic Mutation ID is a unique code to a genomic mutation and allows the variant to be found in the COSMIC database (https://cancer.sanger.ac.uk/cosmic; Tate et al., 2018), if it is exist there. It begins by "COSV" and followed by a unique number. If it doesn't exist, it’s replaced by "#NA" notation. It indicate the definitive position of the variant on the genome.

  15. Pathology

  16. Indicates the different pathologies for which the variant has been observed. Each variant is involved at least in Essential thrombocythemia, exepted for the categorized variants.

Structural data

    Secondary structure prediction and a 3D model are available for the majority of variants.
    All sequences begin to amino acid A352.


  1. Secondary structure prediction

  2. The secondary structure prediction of the variants was generated with PSIPRED v.4.0 (http://bioinf.cs.ucl.ac.uk/psipred/; Buchan, D. W. and al., 2019). Three informations are provided (see Figure 3) :

    - Conf: provides the confidence level of the prediction associated with an amino acid position. The confidence level ranges from 0(low) to 9(high).

    - Pred: Prediction of the secondary structure, the coils (C) are indicated in blue and the helix (H) in red.

    - AA: The protein sequence of the variant. Amino acids are shown in red and basic amino acids are shown in blue.



    Figure 3. Secondary structure prediction of type 1 variant, with PSIPRED


  3. 3D modelling

  4. 3D model variants was generated by Modeller v.9.24 (https://salilab.org/modeller, Sali et Blundell, 1993).

    A template was found with PSI-BLAST for modeling type 1 (class A) and type 2 (class B) variant. Constraints were added on the helix positions, according to the analysis of ther secondary structure prediction. The type 1 and type 2 model was used as template for the class A and class B variants respectively.

    For class C variants, the 3D structure of the class A or B variant with the highest percentage of sequence identity was used as the template.

    For class D and class E variant, WT CALR structural model (PDB code: 6ENY_G) was used as template. If the protein mutation is only a ponctual substitution or a early stop codon, the WT CALR structural model was directely modified on PyMOL (The PyMOL Molecular Graphics System, Version 0.99rc6 Schrödinger, LLC).

Remarks

    Some categorized variants may be presented with different nucleotide notation in publications. Only one notation is listed for each type in the database.
    For example, the variant type 1 can be found with the notation c.1099_1150del or c.1092_1143del.
    In the database the variant type 1 has the notation c.1099_1150del.


References

Buchan DWA, Jones DT (2019). The PSIPRED Protein Analysis Workbench: 20 years on. Nucleic Acids Research. https://doi.org/10.1093/nar/gkz297.

Jones, D.T. (1991) The application of fractal clustering to efficient molecular ray-tracing on low-cost computers. J. Mol. Graphics. 9, 249-253.

Kampfl, T., Gisslinger, H., Harutyunyan, A. S., Nivarthi, H., Rumi, E., Milosevic, J. D., ... & Chen, D. (2013). Somatic mutations of calreticulin in myeloproliferative neoplasms. New England Journal of Medicine, 369(25), 2379-2390.

Nangalia, J., Massie, C. E., Baxter, E. J., Nice, F. L., Gundem, G., Wedge, D. C., ... & Aziz, A. (2013). Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. New England Journal of Medicine, 369(25), 2391-2405.

Pietra, D., Rumi, E., Ferretti, V. V., Di Buduo, C. A., Milanesi, C., Cavalloni, C., ... & Bellini, M. (2016). Differential clinical effects of different mutation subtypes in CALR-mutant myeloproliferative neoplasms. Leukemia, 30(2), 431-438.

Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of molecular biology, 234(3), 779-815.

Tate, J. G., Bamford, S., Jubb, H. C., Sondka, Z., Beare, D. M., Bindal, N., ... & Fish, P. (2019). COSMIC: the catalogue of somatic mutations in cancer. Nucleic acids research, 47(D1), D941-D947.