About - MDPD

MDPD - Microbiome Database of Pulmonary Diseases

1. Data Summary

Sequencing data types: Amplicon-16S, Amplicon-ITS, and whole metagenome sequencing.
430 BioProjects and 59,362 runs/samples.
19 different pulmonary diseases and a healthy group.
278 subgroups across the different diseases and the healthy group.
10 human microbiome body sites.
Microbial information of Bacteria (n = 2296), Eukaryota (n = 219), Virus (n = 193), and Archaea (n = 13).

2. Features of MDPD

MDPD captures the dynamics of microbes across different human body sites, including their:

Composition and abundance,
Association of the microbes with various covariates (age, gender, smoking status),
Microbial markers for different groups (diseases/healthy) and their subgroups,
Cross-disease or healthy subgroup comparisons, and
Microbial community structure.
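As a rough illustration of the first feature, "composition and abundance" is usually reported as relative abundance: each taxon's read count divided by the total number of reads in that sample. The Python sketch below is not MDPD's pipeline (which relies on the R packages listed in Section 6); the count table, sample IDs, and taxon names are hypothetical.

```python
from typing import Dict

# Hypothetical count table: sample ID -> {taxon name -> read count}.
CountTable = Dict[str, Dict[str, int]]


def relative_abundance(counts: CountTable) -> Dict[str, Dict[str, float]]:
    """Convert raw read counts to per-sample relative abundance (fractions summing to 1)."""
    result: Dict[str, Dict[str, float]] = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        if total == 0:
            # Avoid division by zero for empty samples; report every taxon as 0.
            result[sample] = {taxon: 0.0 for taxon in taxa}
        else:
            result[sample] = {taxon: n / total for taxon, n in taxa.items()}
    return result


if __name__ == "__main__":
    # Illustrative values only, not MDPD data.
    table = {
        "sample_A": {"Streptococcus": 60, "Prevotella": 30, "Veillonella": 10},
        "sample_B": {"Streptococcus": 5, "Prevotella": 15, "Veillonella": 0},
    }
    for sample, profile in relative_abundance(table).items():
        print(sample, {taxon: round(value, 2) for taxon, value in profile.items()})
```

Working with fractions rather than raw counts makes samples with different sequencing depths comparable.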

3. Enhancing data quality

Systematically extracted and curated relevant metadata for each group from public databases (NCBI, ENA, PubMed) and related research articles.
Re-analyzed the raw data using state-of-the-art methods (tools listed in Section 6).
Implemented rigorous quality control (QC) methods and stringent criteria to ensure that only high-quality data were included.
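One common QC step is trimming low-quality bases from the 3' end of each read. Trim Galore (Section 6) is a wrapper that performs this trimming through Cutadapt, which documents a BWA-style algorithm: subtract the cutoff from every base quality, sum these values from the 3' end, and cut where the partial sum is lowest. The Python sketch below implements that algorithm for illustration only; the function names are hypothetical, Phred+33 quality encoding is assumed, and the cutoff of 20 is an assumption, not MDPD's documented setting.

```python
from typing import List, Tuple


def quality_trim_index(qualities: List[int], cutoff: int = 20) -> int:
    """Return how many bases to keep after BWA-style 3'-end quality trimming.

    Scans from the 3' end, accumulating (quality - cutoff). The read is cut at
    the position where this running sum is lowest; the scan stops early once
    the sum turns positive (a stretch of good-quality bases has been reached).
    """
    running = 0
    lowest = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep


def trim_read(sequence: str, quality_string: str,
              cutoff: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Trim one FASTQ read; quality_string is assumed to use Phred+33 encoding."""
    qualities = [ord(ch) - offset for ch in quality_string]
    keep = quality_trim_index(qualities, cutoff)
    return sequence[:keep], quality_string[:keep]


if __name__ == "__main__":
    # Hypothetical read: '?' encodes Q30, while '+', '&', '#' encode Q10, Q5, Q2.
    seq, qual = trim_read("ACGTACG", "????+&#")
    print(seq, qual)
```

Because the scan stops as soon as the running sum turns positive, an isolated low-quality base in the middle of an otherwise good read is not used as a cut point.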

4. Ensuring re-usability

BIOM files (in R's .rds format) stored in the database.
Curated metadata stored in the database.
Users can further analyze the data, e.g., by country, gender, age, smoking status, and many other available metadata fields.
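For example, once the curated metadata and per-sample relative abundances are loaded (the .rds files are R objects, so this is typically done in R or after exporting them to a plain table), samples can be grouped by any metadata field and summarized per group. The Python sketch below assumes such an export; the field names, sample IDs, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical exported metadata: sample ID -> {metadata field -> value}.
METADATA: Dict[str, Dict[str, str]] = {
    "run1": {"smoking_status": "Smoker", "country": "India"},
    "run2": {"smoking_status": "Smoker", "country": "China"},
    "run3": {"smoking_status": "Non-smoker", "country": "India"},
    "run4": {"country": "USA"},  # smoking status not reported
}

# Hypothetical per-sample relative abundances: sample ID -> {taxon -> fraction}.
ABUNDANCE: Dict[str, Dict[str, float]] = {
    "run1": {"Prevotella": 0.2, "Streptococcus": 0.5},
    "run2": {"Prevotella": 0.4, "Streptococcus": 0.3},
    "run3": {"Prevotella": 0.1, "Streptococcus": 0.6},
    "run4": {"Streptococcus": 0.5},
}


def group_samples(metadata: Dict[str, Dict[str, str]], field: str) -> Dict[str, List[str]]:
    """Group sample IDs by the value of one metadata field ("NA" if the field is missing)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample, fields in metadata.items():
        groups[fields.get(field, "NA")].append(sample)
    return dict(groups)


def mean_abundance_by_group(abundance: Dict[str, Dict[str, float]],
                            metadata: Dict[str, Dict[str, str]],
                            field: str, taxon: str) -> Dict[str, float]:
    """Mean relative abundance of one taxon per metadata group (absent taxon counts as 0)."""
    result: Dict[str, float] = {}
    for group, samples in group_samples(metadata, field).items():
        values = [abundance[s].get(taxon, 0.0) for s in samples if s in abundance]
        if values:
            result[group] = mean(values)
    return result


if __name__ == "__main__":
    print(group_samples(METADATA, "country"))
    print(mean_abundance_by_group(ABUNDANCE, METADATA, "smoking_status", "Prevotella"))
```

Keeping samples with missing values in an explicit "NA" group makes incomplete metadata visible instead of silently dropping those samples.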

5. Data availability

A summary of the BioProjects is available on the MDPD website.
The source code of MDPD is available in a GitHub repository - https://github.com/PulmonomicsLab/mdpd
Batch download of the BIOM files:
- Obstructive (ZIP, 59 MB)
- Restrictive (ZIP, 48 MB)
- Infectious (ZIP, 100 MB)
- Malignancy (ZIP, 36 MB)
- Vascular (ZIP, 1.5 MB)
- Healthy (ZIP, 56 MB)
If you use the datasets, please cite the Zenodo record.

6. Tools, libraries, and packages used

Name	Tool/Package/Library	Version	Usage
Trim Galore	Perl wrapper	0.6.7	Quality trimming of the raw FASTQ reads.
dada2	R package	1.26	Taxonomic inference from Amplicon-16S and Amplicon-ITS sequencing data.
Kraken2	Tool	2.1.3	Taxonomic inference from Whole Metagenome Sequencing data.
Bracken	Tool	3.1	Re-estimating relative abundance at the species level using k-mer information from the Kraken2 report file.
Pavian	Web application	1.0	Exploring the classified taxonomic reads (%) from Whole Metagenome Sequencing data.
phyloseq	R package	1.42.0	Manipulation of the .biom files.
psadd	R package	0.1.3	Generating interactive Krona plots using KronaTools.
file2meco	R package	0.7.1	Converting the phyloseq object to a microtable object.
microeco	R package	1.8.0	Computing taxonomic abundance at the genus/species level. Microbial marker identification using LEfSe (Kruskal-Wallis test and Wilcoxon rank-sum test), ALDEx2 (t-test), LinDA, and ANCOM-BC2, with the Benjamini-Hochberg (BH) method for FDR correction.
Maaslin2	R package	1.12.0	Finding the association of the microbes with the covariates, such as age groups and gender, using linear mixed models.
MMUPHin	R package	1.12.1	Batch correction for microbiome data.
bugphyzz	R package	1.0	Functional annotation of the identified microbes.
taxonomizr	R package	0.11.1	Fetching NCBI taxonomy information, such as taxonomy IDs.
mboost	R package	2.9.11	Model-based gradient boosting for generating co-occurrence networks.
boot	R package	1.3.30	Bootstrap resampling for building the co-occurrence networks.
Plotly.js	JavaScript library	3.0.1	Creating interactive plots.
Cytoscape.js	JavaScript library	3.31.2	Building microbial co-occurrence networks.
SVG 3D Tag Cloud jQuery plugin	JavaScript library	-	Drawing the interactive 3D plot on the home page.
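The marker-identification methods listed for microeco report p-values that are then corrected with the Benjamini-Hochberg (BH) method. For readers unfamiliar with the procedure, the Python sketch below computes BH-adjusted p-values (q-values) with the standard step-up rule: for p-values sorted in ascending order, q at rank i is the minimum of m·p/k over all ranks k ≥ i, capped at 1. It is intended to match R's p.adjust(method = "BH") but is an illustration, not code taken from microeco; the function name and input p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down; the running minimum keeps q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four candidate marker taxa.
    p = [0.01, 0.04, 0.03, 0.20]
    q = benjamini_hochberg(p)
    significant = [i for i, value in enumerate(q) if value < 0.05]
    print([round(value, 4) for value in q], significant)
```

Thresholding the q-values (e.g., q < 0.05) controls the expected proportion of false discoveries among the reported markers, rather than the chance of any single false positive.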

© 2025 Bose Institute. All rights reserved. For queries, please contact Dr. Sudipto Saha (ssaha4@jcbose.ac.in, ssaha4@gmail.com).