MDPD - Microbiome Database of Pulmonary Diseases


1. Data

The Microbiome Database of Pulmonary Diseases (MDPD) contains a total of 5970 runs compiled from 64 BioProjects. The R script for the computational analysis pipeline is given in Section 2. A brief summary of the BioProjects, with links to their NCBI BioProject pages, is given below:


| BioProject ID | Run Count | Group | Isolation Source | Biome | Assay Type | NCBI BioProject link |
|---|---|---|---|---|---|---|
| PRJEB12006 | 24 | Asthma | Bronchial Mucosa | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB12006 |
| PRJEB26356 | 118 | Asthma | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB26356 |
| PRJEB27079 | 4 | Asthma | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB27079 |
| PRJEB13896 | 298 | Asthma | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB13896 |
| PRJEB24006 | 25 | Asthma | Stool | Gut | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB24006 |
| PRJEB15534 | 72 | Asthma; Control | Bronchial Brush | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB15534 |
| PRJNA662456 | 100 | Asthma; Control | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA662456 |
| PRJNA434133 | 196 | Asthma; Healthy | BAL; Stool | Lung; Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA434133 |
| PRJNA474717 | 59 | Asthma; Healthy | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA474717 |
| PRJEB9033 | 8 | Control; Asthma; COPD; Lung Cancer | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB9033 |
| PRJNA415608 | 24 | Control; COPD | Lung Tissue | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA415608 |
| PRJNA302453 | 199 | Control; COPD | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA302453 |
| PRJNA472758 | 22 | Control; Lung Cancer | Lung Biopsy | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA472758 |
| PRJNA647170 | 50 | Control; Lung Cancer | Lung Tumor Tissue; Lung Normal Tissue | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA647170 |
| PRJNA668745 | 30 | Control; Lung Cancer | Stool | Gut | Amplicon; WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA668745 |
| PRJNA554461 | 46 | Control; Pneumonia | Endotracheal Aspirate | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA554461 |
| PRJNA678854 | 173 | Control; Pneumonia | Endotracheal Aspirate | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA678854 |
| PRJNA507462 | 128 | Control; Pneumonia | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA507462 |
| PRJNA664352 | 366 | Control; Tuberculosis | Sputum; Stool | Lung; Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA664352 |
| PRJNA390194 | 4 | COPD | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA390194 |
| PRJNA316126 | 184 | COPD | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA316126 |
| PRJNA377739 | 584 | COPD | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA377739 |
| PRJNA418003 | 26 | COPD | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA418003 |
| PRJNA322414 | 3 | COPD | Sputum | Lung | Amplicon; WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA322414 |
| PRJEB47052 | 31 | COVID-19 | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB47052 |
| PRJNA693784 | 26 | COVID-19 | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA693784 |
| PRJNA747262 | 212 | COVID-19; Control | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA747262 |
| PRJNA684070 | 113 | COVID-19; Healthy | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA684070 |
| PRJNA728736 | 8 | COVID-19; Healthy | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA728736 |
| PRJNA624223 | 73 | COVID-19; Healthy | Stool | Gut | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA624223 |
| PRJEB13657 | 96 | Cystic Fibrosis | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB13657 |
| PRJNA313226 | 14 | Cystic Fibrosis | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA313226 |
| PRJNA339813 | 94 | Cystic Fibrosis | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA339813 |
| PRJNA599290 | 68 | Cystic Fibrosis | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA599290 |
| PRJNA644204 | 12 | Cystic Fibrosis | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA644204 |
| PRJEB32062 | 25 | Cystic Fibrosis | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB32062 |
| PRJNA316056 | 12 | Cystic Fibrosis | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA316056 |
| PRJNA516870 | 79 | Cystic Fibrosis | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA516870 |
| PRJNA71831 | 2 | Cystic Fibrosis | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA71831 |
| PRJNA170783 | 299 | Cystic Fibrosis | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA170783 |
| PRJNA552270 | 25 | Cystic Fibrosis; Healthy | Colon Mucus | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA552270 |
| PRJEB44071 | 20 | Cystic Fibrosis; Healthy | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB44071 |
| PRJNA438847 | 35 | Cystic Fibrosis; Healthy | Stool | Gut | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA438847 |
| PRJNA439311 | 102 | Healthy; COPD | Sputum | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA439311 |
| PRJEB9034 | 18 | Healthy; COPD | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB9034 |
| PRJNA562766 | 152 | Healthy; COPD | Stool | Gut | Amplicon; WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA562766 |
| PRJNA316588 | 18 | Healthy; COPD; Cystic Fibrosis | Sputum | Lung | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA316588 |
| PRJNA477678 | 7 | Healthy; Lung Cancer | Sputum; BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA477678 |
| PRJEB26531 | 260 | Healthy; Lung Cancer | Stool | Lung | Amplicon; WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB26531 |
| PRJNA507734 | 60 | Healthy; Lung Cancer | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA507734 |
| PRJNA736821 | 79 | Healthy; Lung Cancer | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA736821 |
| PRJNA622267 | 88 | Healthy; Tuberculosis | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA622267 |
| PRJNA331073 | 48 | Healthy; Tuberculosis | Stool | Gut | Amplicon; WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA331073 |
| PRJNA401385 | 61 | Healthy; Tuberculosis | Stool | Gut | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA401385 |
| PRJEB34172 | 60 | Lung Cancer | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB34172 |
| PRJEB29934 | 134 | Lung Cancer | BAL; Lung Tumor Tissue; Lung Tissue | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB29934 |
| PRJNA592147 | 316 | Lung Cancer | Supraglottic Swab | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA592147 |
| PRJEB33316 | 48 | Lung Cancer | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB33316 |
| PRJEB48780 | 69 | Lung Cancer | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB48780 |
| PRJNA606061 | 70 | Lung Cancer | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA606061 |
| PRJNA626477 | 19 | Lung Cancer | Stool | Gut | WMS | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA626477 |
| PRJNA305470 | 22 | Pneumonia | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA305470 |
| PRJNA449183 | 20 | Pneumonia | BAL | Lung | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA449183 |
| PRJNA746092 | 332 | Pneumonia | Stool | Gut | Amplicon | https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA746092 |
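
The totals stated at the top of this section (5970 runs from 64 BioProjects) can be checked against the Run Count column. The following is a minimal base R sketch; the variable name run_counts is illustrative and not part of the MDPD pipeline, and the values are copied from the table in row order.

```r
# Run counts copied from the BioProject table, one value per row, in table order.
run_counts <- c(
  24, 118, 4, 298, 25, 72, 100, 196, 59, 8,
  24, 199, 22, 50, 30, 46, 173, 128, 366, 4,
  184, 584, 26, 3, 31, 26, 212, 113, 8, 73,
  96, 14, 94, 68, 12, 25, 12, 79, 2, 299,
  25, 20, 35, 102, 18, 152, 18, 7, 260, 60,
  79, 88, 48, 61, 60, 134, 316, 48, 69, 70,
  19, 22, 20, 332
)

n_bioprojects <- length(run_counts)  # number of BioProjects: 64
n_runs <- sum(run_counts)            # total number of runs: 5970

print(c(BioProjects = n_bioprojects, Runs = n_runs))
```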

2. R code for the computational analysis pipeline

## R code for the computational analysis

# Required packages
library(microbiomeMarker)
library(mia)
library(phyloseq)

# Read the biom file
Biom_file = "Disease_SampleSource_AssayType.biom" # biom file name
Data = import_biom(Biom_file, parseFunction = parse_taxonomy_default)

# Remove artefacts from 16S amplicon data
Data_1 = subset_taxa(Data, !is.na(Kingdom) & !Kingdom %in% c("Holozoa", "Eukaryota", "Nucletmycea"))
Data_2 = subset_taxa(Data_1, Phylum != "Incertae sedis")
Data_3 = subset_taxa(Data_2, Class != "Incertae Sedis")
Data_4 = subset_taxa(Data_3, !is.na(Order) & !Order %in% c("Gammaproteobacteria Incertae Sedis", "Oxyphotobacteria Incertae Sedis", "Alphaproteobacteria Incertae Sedis", "Incertae Sedis", "uncultured"))
Data_5 = subset_taxa(Data_4, !is.na(Family) & !Family %in% c("Rhizobiales Incertae Sedis", "Coriobacteriales Incertae Sedis", "Puniceispirillaes Incertae Sedis", "Bacteroidales Incertae Sedis", "Entomoplasmatales Incertae Sedis", "Micrococcales Incertae Sedis", "Actinomycetales Incertae Sedis", "Desulfotomaculales Incertae Sedis", "Nitrococcales Incertae Sedis", "Synechococcales Incertae Sedis", "Azospirillales Incertae Sedis", "Brachyspirales Incertae Sedis", "Eurycoccales Incertae Sedis", "Flavobacteriales Incertae Sedis", "uncultured"))
Data_6 = subset_taxa(Data_5, !is.na(Genus) & !Genus %in% c("Incertae Sedis", "uncultured"))

# Remove artefacts from WMS data
# (alternative to the 16S block above; run only one of the two, since the names Data_1 to Data_3 are reused)
Data_1 = subset_taxa(Data, Phylum != "Chordata")
Data_2 = subset_taxa(Data_1, Order != "Bacteroidetes Order II. Incertae sedis")
Data_3 = subset_taxa(Data_2, !is.na(Family) & !Family %in% c("Clostridiales   Family XIII. Incertae Sedis", "Clostridiales Family XVII. Incertae Sedis", "Clostridiales Family XVI. Incertae Sedis", "Thermoanaerobacterales Family III. Incertae Sedis", "Thermoanaerobacterales Family IV. Incertae Sedis"))


# Agglomerate to genus rank for 16S amplicon data (Data_6 comes from the 16S filtering block)
Data_agglomerate_genus = phyloseq::tax_glom(Data_6, "Genus", NArm = TRUE)
# Agglomerate to species rank for WMS data (Data_3 comes from the WMS filtering block)
Data_agglomerate_species = phyloseq::tax_glom(Data_3, "Species", NArm = TRUE)

# Normalization and analysis of the data

# Perform normalization and differential analysis using LEfSe at genus rank for 16S amplicon data
Lefse_genus = run_lefse(Data_agglomerate_genus, norm = "CPM", wilcoxon_cutoff = 0.05, group = "SubGroup", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 2, taxa_rank = "Genus")
# Perform normalization and differential analysis using LEfSe at species rank for WMS data
Lefse_species = run_lefse(Data_agglomerate_species, norm = "CPM", wilcoxon_cutoff = 0.05, group = "SubGroup", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 2, taxa_rank = "Species")

# Merge samples BioProject-wise, then transform abundances to percentages for the heatmap
merged.data = merge_samples(Data_6, group = "BioProject")
merged.percent = transform_sample_counts(merged.data, function(x) x*100/sum(x))

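# Normalization of the data between 0-1 (min-max scaling) for heatmap
Data_normalize = function(x) {(x - min(x)) / (max(x) - min(x))}
# NOTE: the original call was fun_range(x = Data_normalize), but fun_range is not defined in this script.
# Assumption: the scaling is applied to the merged percentage abundance matrix.
Final_output = Data_normalize(as(otu_table(merged.percent), "matrix"))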

# Normalization of the data (relative abundance) to get prevalent taxa
# Use Data_agglomerate_genus with rank = "Genus" for 16S amplicon data,
# or Data_agglomerate_species with rank = "Species" for WMS data
Data_summarized = makeTreeSummarizedExperimentFromPhyloseq(Data_agglomerate_genus)
Data_transform = transformSamples(Data_summarized, method = "relabundance")
prevalent_taxa = getPrevalentTaxa(Data_transform, detection = 0.0001, prevalence = 50/100, rank = "Genus", sort = TRUE)
Prevalence_value = getPrevalence(Data_transform, detection = 0.0001, prevalence = 50/100)
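
The three normalization steps in the script (percentage per sample, scaling to 0-1, and relative abundance with a prevalence cut-off) can be illustrated on a small matrix in base R. This is only a sketch under stated assumptions: the counts are made up, the names counts, percent, min_max, scaled, rel_abund, prevalence and prevalent are illustrative, min-max scaling is applied over the whole matrix, and the comparisons are strict. It does not call phyloseq or mia, so their exact boundary handling may differ.

```r
# Toy count matrix (made-up values): rows are taxa, columns are samples.
counts <- matrix(c(10,  0,  0,
                   40,  5, 20,
                   50, 15, 30),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GenusA", "GenusB", "GenusC"),
                                 c("S1", "S2", "S3")))

# Percentage per sample: the same formula the script passes to
# transform_sample_counts(), applied column by column.
percent <- apply(counts, 2, function(x) x * 100 / sum(x))

# Min-max scaling to the 0-1 range (assumption: over the whole matrix).
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- min_max(percent)

# Relative abundance (fractions of 1) and prevalence: the share of samples
# in which a taxon exceeds the detection threshold of 0.0001.
rel_abund <- percent / 100
prevalence <- rowMeans(rel_abund > 0.0001)

# Taxa detected in more than 50% of the samples.
prevalent <- names(prevalence)[prevalence > 50 / 100]

print(round(percent, 1))
print(round(scaled, 3))
print(prevalent)
```

In this toy matrix GenusA is detected in only one of three samples, so it falls below the 50% prevalence cut-off, while GenusB and GenusC are retained.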
                


© 2023 Bose Institute. All rights reserved. For queries, please contact Dr. Sudipto Saha (ssaha4@jcbose.ac.in, ssaha4@gmail.com).