SWORD (SWift and Optimized Recognition of structural Domains) is an automated method that identifies protein domains from the contacts between residues within a protein structure.
For a given protein structure, SWORD can provide multiple alternative decompositions into domains.

For more information on the methods implemented in SWORD2, please see:

Postic G, Ghouzam Y, Chebrek R, Gelly J-C. Science Advances (2017) 3:e1600552

Gelly J-C, Lin H-Y, de Brevern AG, Chuang T-J, Chen F-C. Genome Biol Evol (2012) 4:966-975

Gelly J-C, de Brevern AG. Bioinformatics (2011) 27(1):132-3

Gelly J-C, Etchebest C, Hazout S, de Brevern AG. Nucleic Acids Res (2006)

Gelly J-C, de Brevern AG, Hazout S. Bioinformatics (2006) 22:129-133

SWORD2 main algorithm


SWORD takes as input a protein structure in .pdb format (either a PDB ID from the PDB database, your own structure file, or an ID from the AlphaFold Protein Structure Database) and performs protein domain assignment using a partitioning algorithm based on hierarchical clustering of Protein Units. The web server implements the method published in (1,2,3,4).

Protein Units (PUs) are evolutionarily preserved protein substructures that describe the protein architecture at an intermediate level between secondary structures and domains (1,2,3,4). SWORD identifies PUs using the Protein Peeling algorithm (4), which translates the protein's Cα-Cα distances into a distance probability matrix and dissects the structure by optimising a 'partition index' that reflects the structural independence of the subunits. The details and evolution of the method are provided in (2,3,4) and in the corresponding web service: Peeling 3.
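As an illustration of the first step, the sketch below turns Cα coordinates into a matrix of contact probabilities. It is only a sketch under stated assumptions: the logistic form and the parameter values `d0` and `delta` are assumptions made for this example and are not taken from this page, and the function names are hypothetical. Please refer to (4) for the actual transformation used by Protein Peeling.

```python
import math


def contact_probability(distance, d0=8.0, delta=1.5):
    """Map a Calpha-Calpha distance (in angstroms) to a value in (0, 1).

    ASSUMPTION: a logistic curve centred on d0 with steepness delta.
    Both the functional form and the default values are illustrative.
    """
    return 1.0 / (1.0 + math.exp((distance - d0) / delta))


def probability_matrix(ca_coords, d0=8.0, delta=1.5):
    """Build a symmetric matrix of contact probabilities.

    ca_coords is a list of (x, y, z) tuples, one per residue.
    """
    n = len(ca_coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            p = contact_probability(d, d0, delta)
            matrix[i][j] = p
            matrix[j][i] = p
    return matrix


# Three residues on a line: the first two are close, the third is far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (30.0, 0.0, 0.0)]
pm = probability_matrix(coords)
```

With such a matrix, residue pairs that are close in space receive values near 1 and distant pairs receive values near 0, which is the kind of input a partitioning criterion can then work on.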


After PU identification, SWORD gradually merges PUs and finds the optimal domain delineation using two criteria: the separation (σ) and the compactness (κ). A high value of σi,j indicates a high number of contacts between PUs i and j, meaning that these PUs are good candidates to be merged. The compactness criterion κi,j measures the contact density of the protein domain that results from merging PUs i and j; a high value therefore indicates that merging these PUs is favourable (1).

The optimal domain assignment is the one with the highest domain compactness. We also assess the quality of each assignment with an estimate computed as a function of the Euclidean distance between decomposition i and the threshold of the acceptance region.

Then, for each structure, we also report a measure of structural ambiguity called the A-index, which is introduced in (1). This measure is similar to the Hirsch index (h-index) used in scientometrics, except that it is based on the decomposition quality. Thus, a protein structure with an A-index of 3 has at least three different decompositions, each with a quality of 3 or more.
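The h-index analogy can be made concrete with a short sketch. It assumes that each alternative decomposition has already been given an integer quality score; the function name `a_index` and the input format are hypothetical and chosen for this example only.

```python
def a_index(qualities):
    """Return the largest A such that at least A decompositions
    have a quality of A or more (same rule as the h-index)."""
    ranked = sorted(qualities, reverse=True)
    a = 0
    for rank, quality in enumerate(ranked, start=1):
        if quality >= rank:
            a = rank
        else:
            break
    return a


# Four alternative decompositions with qualities 3, 3, 3 and 1:
# three of them have a quality of at least 3, so the A-index is 3.
example = a_index([3, 3, 3, 1])
```

A structure with a single good decomposition gets a low A-index, while a structure with several decompositions of comparable quality gets a higher one, reflecting greater structural ambiguity.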

The full description of the method is provided in (1).

Finally, the structural units, i.e. structural domains and Protein Units, are assessed using the Total Information Gain score (TIG-score) (5), a scoring function that predicts the native-like character of a 3D structure. This score is designed to behave like the Gibbs free energy and is often referred to as a "pseudo-energy".

Because it is based on pairwise distances, the value of the TIG tends to increase with macromolecule size. To overcome this bias, a Z-score of the TIG is computed, following the atom shuffling method (7): for each domain, 2000 random sequence decoys are generated, whose pseudo-energies follow a non-normal distribution with mean μ and standard deviation σ (not to be confused with the separation criterion σ used during PU merging). The Z-score is calculated as (E − μ)/σ and therefore expresses the distance, in standard deviations, between the pseudo-energy E of the domain and those of the random decoys. To give users a direct interpretation of the Z-score value, SWORD2 outputs the probability estimated by Chebyshev's inequality, i.e. 1 − 1/Z², which we call the Autonomous Unit Likelihood (AUL). Thus, for a substructure with a Z-score of −2.0, the probability of not observing the same pseudo-energy for a randomly delineated domain reaches 75%. This probability of being native reaches 94% for a domain with a Z-score of −4.0. The higher the probability, the more likely it is that the region under consideration is capable of autonomous folding.
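The two figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch, not the SWORD2 implementation: the function names are hypothetical, the use of the population standard deviation is an assumption, and so is the choice to return 0 for Z-scores above −1 (where Chebyshev's bound carries no information, or where the domain scores no better than the decoys).

```python
import statistics


def z_score(energy, decoy_energies):
    """Z = (E - mu) / sigma, with mu and sigma taken from the decoys.

    ASSUMPTION: the population standard deviation is used here.
    """
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies must not all be identical")
    return (energy - mu) / sigma


def autonomous_unit_likelihood(z):
    """Chebyshev-based estimate 1 - 1/Z^2.

    ASSUMPTION: only Z-scores of -1 or lower (pseudo-energy well below
    the decoy mean) yield a non-zero value; anything else returns 0.
    """
    if z > -1.0:
        return 0.0
    return 1.0 - 1.0 / (z * z)


aul_two = autonomous_unit_likelihood(-2.0)   # 0.75
aul_four = autonomous_unit_likelihood(-4.0)  # 0.9375, i.e. about 94%
```

Chebyshev's inequality holds for any distribution with a finite variance, which is why it suits decoy pseudo-energies that are not normally distributed; the price is that the resulting estimate is conservative.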

SWORD2 Output


SWORD provides both a user-friendly interface for visualising the dissection results and raw output files, available for download as a .tar.gz archive.
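Once downloaded, the archive can be inspected without unpacking it. The sketch below uses only the Python standard library; the helper name `top_level_entries` and the archive file name in the usage comment are hypothetical, since the actual file name depends on your job.

```python
import tarfile
from pathlib import PurePosixPath


def top_level_entries(archive_path):
    """Return the sorted top-level names stored in a .tar.gz archive."""
    entries = set()
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            parts = PurePosixPath(name).parts
            if parts:
                entries.add(parts[0])
    return sorted(entries)


# Usage (hypothetical file name):
# print(top_level_entries("sword2_results.tar.gz"))
```

Listing the members first is a cautious habit before extracting any archive, because it shows where the files will be written.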

The output folder contains the following:

results/

logs/