Linear Motif Domain Interaction Prediction (LMDIPred) is a web server that scans user-provided amino acid sequence(s) for peptides conforming to linear motifs that mediate protein-protein interactions (PPIs) with SH3, WW and PDZ domains (Sarkar et al., PLoS One, 2018; doi: 10.1371/journal.pone.0200430).

A comparison of the total number of SwissProt proteins known to contain these three domains, across all organisms and in humans only (as of July 10, 2017), is shown in the following figure:

Datasets:

Positive dataset. Non-redundant datasets of 115 SH3-domain-binding 6-mer peptides, 140 WW-domain-binding 6-mer peptides and 165 PDZ-domain-binding 4-mer peptides were created from the LMPID database and used as positive training examples for the respective classes of peptides.

Download LMDIPred Positive datasets:

Negative dataset. A set of 3960 FASTA-formatted protein sequences [3192 from Oryza sativa subsp. japonica (short-grained Asian rice), 400 from Solanum tuberosum (potato), and 368 from Triticum aestivum (common wheat)] was downloaded from UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase. Perl scripts were used to extract 6-residue (for SH3 and WW) and 4-residue (for PDZ) peptides from random positions within these sequences, and a set of 120 such random peptides was used as negative training examples for each class of peptide ligands.
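The random-peptide extraction described above can be sketched as follows. This is a minimal Python illustration, not the Perl scripts used for LMDIPred; the function names `parse_fasta` and `random_peptides`, the fixed seed, and the decision to allow repeated peptides are all assumptions made for the example.

```python
import random

def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

def random_peptides(sequences, length, count, seed=0):
    """Draw `count` peptides of `length` residues from random positions.

    ASSUMPTION: duplicates are allowed and the seed is fixed for
    reproducibility; the source does not say how either was handled.
    """
    rng = random.Random(seed)
    eligible = [s for s in sequences if len(s) >= length]
    if not eligible:
        raise ValueError("no sequence is long enough for the requested length")
    peptides = []
    for _ in range(count):
        seq = rng.choice(eligible)
        start = rng.randrange(len(seq) - length + 1)
        peptides.append(seq[start:start + length])
    return peptides
```

With the sequences loaded, `random_peptides(seqs, 6, 120)` would give a 6-mer set for the SH3 or WW class and `random_peptides(seqs, 4, 120)` a 4-mer set for the PDZ class.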

Download Negative dataset:    Random peptide Instances

Independent dataset. The independent dataset was composed of 62 experimentally validated PDZ-binding 10-mer mouse peptides from Stiffler et al. [PubMed ID: 17641200], and 25 experimentally validated SH3-binding yeast peptides of variable length from Tonikian et al. [PubMed ID: 19841731].

Download Independent datasets:

Table 1: Overview of the datasets for each class of ligand motifs:

Domain    Positive training data    Negative training data    Approx. ratio (positive:negative)
SH3       115                       425                       ~1:4
WW        140                       400                       ~1:3
PDZ       165                       375                       ~1:2
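The ratios in Table 1 follow directly from the counts, and the short check below reproduces them. It also tests one possible reading of the negative counts, which are larger than the 120 random peptides described for the negative dataset: that each class's negatives are the 120 random peptides plus the positive peptides of the other two classes. That reading is an assumption inferred from the arithmetic only; this page does not state it.

```python
# Counts copied from Table 1.
POSITIVES = {"SH3": 115, "WW": 140, "PDZ": 165}
NEGATIVES = {"SH3": 425, "WW": 400, "PDZ": 375}
RANDOM_PEPTIDES = 120  # random peptides per class, from the dataset description

def negatives_per_positive(domain):
    """Number of negative examples per positive example for one domain."""
    return NEGATIVES[domain] / POSITIVES[domain]

def assumed_negative_count(domain):
    """ASSUMPTION (not stated in the source): negatives are the random
    peptides plus the positive peptides of the other two classes."""
    others = sum(n for d, n in POSITIVES.items() if d != domain)
    return others + RANDOM_PEPTIDES

for domain in POSITIVES:
    print(domain,
          f"1:{negatives_per_positive(domain):.2f}",
          assumed_negative_count(domain) == NEGATIVES[domain])
```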
 

Table 2: Performance analysis of SVM models for each class of ligand motifs using different input features:

[The area under the Receiver Operating Characteristic (ROC) curve, or AUC ("Area Under Curve"), summarizes how well the prediction method ranks examples. It can be interpreted as the probability that the classifier assigns a higher score to a randomly chosen positive example than to a randomly chosen negative example. An AUC of 90%-100% denotes excellent prediction; performance falls as the AUC falls, with 50% corresponding to random prediction and values below 50% to prediction worse than random.]
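The probabilistic reading of the AUC given in the note can be made concrete with a small sketch. The function below, whose name `auc_pairwise` is chosen for this example only, compares every positive score with every negative score and counts ties as half a win; it illustrates the definition and is not the code used to produce the values in Table 2.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win,
    which matches the usual Mann-Whitney formulation.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A classifier that scores every positive above every negative gives 1.0 (100%), while one that gives all examples the same score gives 0.5 (50%).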

Input features                  SH3 AUC (%)    WW AUC (%)    PDZ AUC (%)
Amino Acid Composition (AAC)    88.05          93.54         92.31
Dipeptide Composition (DPC)     86.79          96.33         93.65
Tripeptide Composition (TPC)    94.72          96.11         92.44
AAC + DPC                       94.63          97.77         93.98
AAC + TPC                       95.56          97.86         97.69
DPC + TPC                       95.34          97.58         94.89
AAC + DPC + TPC                 97.45          98.35         90.49
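As an illustration of the input features named in Table 2, the sketch below computes amino acid, dipeptide and tripeptide composition vectors for a peptide and concatenates them. The residue ordering, the normalization to fractions of overlapping windows, and the function names `kmer_composition` and `combined_features` are assumptions made for this example; the exact encoding used by LMDIPred is not described on this page.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(peptide, k):
    """Fraction of each possible k-mer among the overlapping windows.

    k=1 gives AAC (20 values), k=2 gives DPC (400), k=3 gives TPC (8000).
    ASSUMPTION: values are fractions; a percentage scale is equally common.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(peptide) - k + 1
    if windows <= 0:
        return [0.0] * len(kmers)
    for i in range(windows):
        kmer = peptide[i:i + k]
        if kmer in counts:  # windows with non-standard residues are skipped
            counts[kmer] += 1
    return [counts[m] / windows for m in kmers]

def combined_features(peptide, ks=(1, 2, 3)):
    """Concatenate the composition vectors for the requested k values."""
    vector = []
    for k in ks:
        vector.extend(kmer_composition(peptide, k))
    return vector
```

Under these assumptions, `combined_features(peptide, ks=(1, 3))` would correspond to the "AAC + TPC" row and the default to "AAC + DPC + TPC", a vector of 20 + 400 + 8000 values.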
 

Table 3A: Performance of different prediction methods in 5-fold cross-validation for the SH3 domain binding peptides:

Method                         Threshold    Sensitivity    Specificity    Accuracy    MCC
SVM Prediction                 -0.25        0.94           0.95           0.95        0.85
PSSM Scanning                  1.00         0.70           0.93           0.88        0.62
Motif Instance Matching        NA           0.17           1.00           0.83        0.36
Regular Expression Scanning    NA           0.81           0.91           0.89        0.67
 

Table 3B: Performance of different prediction methods in 5-fold cross-validation for the WW domain binding peptides:

Method                         Threshold    Sensitivity    Specificity    Accuracy    MCC
SVM Prediction                 -0.05        0.96           0.96           0.96        0.90
PSSM Scanning                  0.50         0.88           0.84           0.85        0.66
Motif Instance Matching        NA           0.13           1.00           0.78        0.29
Regular Expression Scanning    NA           0.89           0.99           0.97        0.91
 

Table 3C: Performance of different prediction methods in 5-fold cross-validation for the PDZ domain binding peptides:

Method                         Threshold    Sensitivity    Specificity    Accuracy    MCC
SVM Prediction                 -0.10        0.92           0.93           0.92        0.83
PSSM Scanning                  0.60         0.69           0.95           0.86        0.68
Motif Instance Matching        NA           0.30           1.00           0.78        0.47
Regular Expression Scanning    NA           0.77           0.87           0.83        0.63
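The four measures reported in Tables 3A-3C have standard definitions in terms of true and false positives and negatives. The sketch below applies those definitions; the function name `binary_metrics` and the counts in the usage note are hypothetical and are not taken from the LMDIPred evaluation.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # fraction of positives recovered
    specificity = tn / (tn + fp)           # fraction of negatives rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=90, fp=20, tn=80, fn=10)` uses made-up counts to show the call. Because the negative sets are larger than the positive sets (Table 1), accuracy is weighted toward specificity, which is why a method such as Motif Instance Matching can combine low sensitivity with a fairly high accuracy; MCC is less affected by this imbalance.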
 
 

Table 3: ROC plots of different prediction methods for different datasets: [Green (SVM Prediction), Blue (PSSM Scanning), Black (Motif Instance Matching, MIM) and Red (Regular Expression Scanning, RES). The ROC plots for the MIM and RES methods appear as smooth, flat lines compared with the plots for SVM and PSSM, because the SVM and PSSM outputs are continuous scores, whereas MIM and RES produce discrete outcomes, one or zero (either “match” or “mismatch”).]

(A) SH3 domain   (B) WW domain   (C) PDZ domain
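The difference between continuous and discrete outputs noted in the caption can be seen by listing the points that make up a ROC curve. In the sketch below, each distinct score is used as a threshold; the function name `roc_points` and the example scores are hypothetical and serve only to illustrate the idea.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at every distinct threshold.

    `labels` holds 1 for positive and 0 for negative examples. An example is
    predicted positive when its score is at or above the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

labels = [1, 1, 1, 0, 0, 0]
continuous = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]   # SVM- or PSSM-style scores
discrete = [1, 1, 0, 0, 0, 0]                 # match / mismatch outcomes
```

Here `roc_points(continuous, labels)` yields one point per distinct score, tracing a stepped curve, whereas `roc_points(discrete, labels)` yields only the two corners and a single interior point, so the plot is made of two straight segments.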
   
 
 
 

© 2015 Bose Institute, Kolkata. All rights reserved

For queries and suggestions please contact Dr. Sudipto Saha (ssaha4@jcbose.ac.in, ssaha4@gmail.com)