LncPTPred is a machine learning (ML) based tool that predicts interactions between long non-coding RNAs (lncRNAs) and proteins. It was developed in two phases: data curation and machine learning. In the data curation phase, data were collected from Photoactivatable Ribonucleoside-enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) and High-throughput Sequencing of RNA isolated by Crosslinking Immunoprecipitation (HITS-CLIP) experimental assays to extract the RNA binding positions corresponding to a given protein. These binding positions were then rigorously pre-processed against the LncRbase v.2 and Ensembl BioMart databases to extract the positively bound lncRNAs. Finally, the non-interacting segments were screened to generate negative data. Within the interacting segments, we checked for specific sequence motifs that affect the binding affinity of the lncRNA. These motifs are shown in a motif binding plot.
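
The negative-data step can be pictured as taking the parts of a transcript that no binding site covers. The sketch below is only an illustration under stated assumptions: the function name `non_interacting_segments`, the 0-based half-open coordinates, and the `min_length` filter are hypothetical, and the actual screening criteria used by LncPTPred are not described here.

```python
def non_interacting_segments(transcript_length, binding_sites, min_length=1):
    """Return the stretches of a transcript not covered by any binding site.

    Hypothetical helper for illustration only. Coordinates are assumed to be
    0-based, half-open (start, end) intervals on the lncRNA transcript.
    """
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(binding_sites):
        # Clip each site to the transcript boundaries.
        start = max(start, 0)
        end = min(end, transcript_length)
        # Anything between the cursor and this site is unbound.
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        # Overlapping sites simply push the cursor forward.
        cursor = max(cursor, end)
    # Trailing unbound region after the last site.
    if transcript_length - cursor >= min_length:
        gaps.append((cursor, transcript_length))
    return gaps


# Made-up coordinates: two overlapping sites and one isolated site.
sites = [(100, 150), (140, 200), (600, 650)]
negatives = non_interacting_segments(1000, sites)
print(negatives)  # [(0, 100), (200, 600), (650, 1000)]
```

In practice, candidate negative segments would likely be filtered further (for example by length), which the `min_length` parameter only hints at.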

Various machine learning models were trained on this extracted dataset. We use a stacking ensemble with a meta-classifier, implemented with Scikit-learn. The models are built in two layers: layer 1 consists of LightGBM and XGBoost models, and layer 2 stacks the layer 1 models using a Logistic Regression meta-classifier. The distinctive feature of our tool is its ability to predict the interacting lncRNA segments and their corresponding binding probabilities (reported as Final_Interacting_Score) for a particular protein.

The standalone version of the LncPTPred tool can be accessed from here.

Run your analysis from here.