LSDDB

A curated database for lysosomal storage disorders

Tools Used

PANTHER

PANTHER (Protein Analysis Through Evolutionary Relationships) is a database of protein-coding genes annotated with functional and evolutionary relationships. The program relies on a library of HMMs (Hidden Markov Models) built from MSAs (Multiple Sequence Alignments) of various protein families. Note that if the input mutation cannot be mapped to any protein family in the HMM library, PANTHER returns no output.

Reference : Mi H, Ebert D, Muruganujan A, et al (2021) PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Research 49:D394–D403. https://doi.org/10.1093/nar/gkaa1106

PhD-SNP

PhD-SNP uses an SVM (support vector machine) classifier to predict whether an SNP is disease related or neutral. Its output score ranges from 0 to 1, and a score above 0.5 is considered disease associated.
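The decision rule above can be written out as a minimal sketch. The function name `classify_phd_snp` is hypothetical and is not part of PhD-SNP itself; only the 0 to 1 range and the 0.5 cutoff come from the text.

```python
def classify_phd_snp(score: float, threshold: float = 0.5) -> str:
    """Label a PhD-SNP output score using the cutoff quoted in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PhD-SNP scores lie in [0, 1], got {score}")
    # Scores strictly above the threshold are treated as disease associated.
    return "Disease" if score > threshold else "Neutral"


print(classify_phd_snp(0.73))
print(classify_phd_snp(0.21))
```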

Reference : Capriotti E, Calabrese R, Casadio R (2006) Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22:2729–2734.

WS-SNPs&GO

SNPs&GO predicts whether a given variation affects the protein, using functional information encoded by GO (Gene Ontology) terms. GO annotation is organized under three roots: molecular function, biological process, and cellular component. The input is protein sequence information, and the program computes a sequence profile for the input from pairwise alignments produced by BLAST. For each mutation, the program builds a 51-element feature vector composed of the mutation (20 values), the sequence environment (20 values), the sequence profile (5 values), the output of PANTHER (4 values), and the functional annotation score (2 values). Depending on the available data, either the SNPs&GO or the SNPs&GO3d module can be used. The protein sequence can be supplied in raw or FASTA format, uploaded as a FASTA file from the local system, or specified by its Swiss-Prot code. The revised SNPs&GO program was shown to have 81% accuracy, an MCC (Matthews correlation coefficient) of 0.61, and an area under the ROC curve (AUC) of 0.89. The program reports its prediction as either Disease or Neutral, together with the probability that the input variation is disease related. If the probability is >0.5, the mutation is predicted to be deleterious or disease related.
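To make the layout of the 51-element vector concrete, here is a minimal sketch that concatenates the five blocks in the order listed and checks their sizes. The block names, the `build_feature_vector` helper, and the all-zero example values are illustrative assumptions; how SNPs&GO actually encodes each block is not shown here.

```python
from typing import Dict, List

# Block sizes as listed in the text: 20 + 20 + 5 + 4 + 2 = 51.
# The key names are hypothetical labels chosen for this sketch.
BLOCK_SIZES: Dict[str, int] = {
    "mutation": 20,
    "sequence_environment": 20,
    "sequence_profile": 5,
    "panther_output": 4,
    "functional_annotation": 2,
}


def build_feature_vector(blocks: Dict[str, List[float]]) -> List[float]:
    """Concatenate the five blocks in a fixed order, validating each size."""
    vector: List[float] = []
    for name, size in BLOCK_SIZES.items():
        values = blocks[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


# Placeholder input: every block filled with zeros, just to show the shape.
example = {name: [0.0] * size for name, size in BLOCK_SIZES.items()}
vector = build_feature_vector(example)
print(len(vector))  # 51
```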

Reference : Capriotti E, Altman RB, Bromberg Y (2013) Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 14 Suppl 3:S2. https://doi.org/10.1186/1471-2164-14-S3-S2

SIFT

SIFT (Sorting Intolerant from Tolerant) predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations. Its predictions about the functional effect of a mutant rely on evolutionary data. The SIFT output score ranges from 0 to 1; a score >0.05 is predicted to be neutral and a score <0.05 is considered deleterious.
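Unlike the probability-style scores of the other tools, a low SIFT score signals a damaging substitution. The sketch below makes that direction explicit. The function name `classify_sift` is hypothetical, and treating a score of exactly 0.05 as deleterious is an assumption, since the text only covers scores above and below the cutoff.

```python
def classify_sift(score: float, cutoff: float = 0.05) -> str:
    """Label a SIFT score; note that LOW scores mean deleterious."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie in [0, 1], got {score}")
    # Assumption: the boundary value itself is counted as deleterious.
    return "Deleterious" if score <= cutoff else "Neutral"


for s in (0.00, 0.05, 0.30):
    print(s, classify_sift(s))
```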

Reference : Sim N-L, Kumar P, Hu J, et al (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40:W452–W457. https://doi.org/10.1093/nar/gks539

SNAP

SNAP is primarily based on neural networks. It considers the biochemical aspects of the substitution and correlates them with the functional features of the given protein, thereby differentiating between neutral and disease-related inputs. SNAP scores range from -100 to 100, where positive scores denote disease-related variants and negative scores are considered neutral.

Reference : Johnson AD, Handsaker RE, Pulit SL, et al (2008) SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24:2938–2939. https://doi.org/10.1093/bioinformatics/btn564

Meta-SNP

Meta-SNP is a meta-predictor that was used to predict whether mutations are disease related or neutral. It does so by integrating four leading prediction programs: PhD-SNP, which predicts whether a mutation is disease associated, and PANTHER, SIFT, and SNAP, which annotate mutations as functionally disruptive relative to the wild type.

Meta-SNP bases its prediction on an 8-element vector made up of two groups of 4 elements each. The first group consists of the scores of PANTHER, PhD-SNP, SIFT, and SNAP; if any of these programs fails to give a prediction, Meta-SNP takes that program's default threshold for making the prediction. The second group consists of 4 elements from PhD-SNP: the first two are Fwt and Fmut, the frequencies of the wild-type and mutant residues at the mutation site; the third is the total number of aligned sequences (Nal) at the mutation site; and the fourth is the CI (conservation index), which measures the conservation of the mutated site.

Overall, Meta-SNP has 79% accuracy, with an MCC of 0.59 and an AUC of 0.87. Meta-SNP is also reported to be more accurate than any single one of the predictors it integrates.
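The 8-element input described for Meta-SNP can be sketched as follows. This only illustrates how the two groups are assembled and how a missing predictor score falls back to a default; it is not Meta-SNP's own code. The helper `build_meta_snp_input` and the values in `ASSUMED_DEFAULTS` are assumptions made for the example (in particular, the PANTHER and SNAP fallback values are placeholders).

```python
from typing import Dict, List, Optional

# Order of the first group, as listed in the text.
PREDICTORS = ("PANTHER", "PhD-SNP", "SIFT", "SNAP")

# Illustrative fallback values only; these are assumptions for this sketch.
ASSUMED_DEFAULTS: Dict[str, float] = {
    "PANTHER": 0.5,
    "PhD-SNP": 0.5,
    "SIFT": 0.05,
    "SNAP": 0.0,
}


def build_meta_snp_input(
    scores: Dict[str, Optional[float]],
    f_wt: float,
    f_mut: float,
    n_aligned: int,
    conservation_index: float,
    defaults: Dict[str, float] = ASSUMED_DEFAULTS,
) -> List[float]:
    """Assemble the two 4-element groups into one 8-element vector."""
    group1: List[float] = []
    for name in PREDICTORS:
        score = scores.get(name)
        # A predictor that returned nothing is replaced by its default.
        group1.append(defaults[name] if score is None else score)
    # Second group: Fwt, Fmut, Nal, and CI at the mutated site.
    group2 = [f_wt, f_mut, float(n_aligned), conservation_index]
    return group1 + group2


vector = build_meta_snp_input(
    {"PANTHER": None, "PhD-SNP": 0.82, "SIFT": 0.01, "SNAP": 35.0},
    f_wt=0.9,
    f_mut=0.02,
    n_aligned=120,
    conservation_index=0.75,
)
print(vector)
```

In this example PANTHER returned no output, so the first element of the vector is the assumed PANTHER default rather than a real score.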