PVT



PREREQUISITES: To use PVT, the following programs must be present in your PATH:

samtools
bamtools
bowtie2
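
A quick way to confirm the prerequisites before running anything is to ask the shell where each program lives. The sketch below is illustrative only: the function name `check_tools` is hypothetical and not part of PVT; it relies solely on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PVT): print every program name
# given as an argument that cannot be found on PATH, one per line.
# Returns 0 when all programs are found and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the three prerequisites listed above:
#   check_tools samtools bamtools bowtie2 || echo "tools above are missing from PATH"
```

Any name printed by the function needs to be installed, or its directory added to PATH, before PVT is run.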

BOWTIE2 indices: Bowtie2 indices from Illumina's iGenomes project can be found here.

EXECUTABLES: PVT binary executables can be downloaded from here. These executables were built for 64-bit Ubuntu Linux. Extract the downloaded archive and copy the extracted executables into a directory listed in your PATH environment variable. The following commands can be used from the terminal.

#tar -xzvf
#cd pvt_v1
#cp * /path_variable/

The package does not require installation.

INSTRUCTIONS

1. Before executing PVT for several replicates in batch, the following steps are required.
(a) Download genes.gtf (transcript annotations) and genome.fa (whole-genome FASTA sequence) for the organism of interest from Download Indices and Annotations.
(b) Generate genes.juncs and genes.fa, which serve as the reference for the subsequent spliced alignment analysis.
(c) Build Bowtie2 indices of the genes.fa thus generated. Use the following commands from the terminal:

#mkdir out_dir
#gtf_juncs genes.gtf > out_dir/genes.juncs
#gtf_to_fasta --output-dir out_dir/ --gtf-annotations genes.gtf --gtf-juncs out_dir/genes.juncs genes.gtf genome.fa out_dir/genes.fa
#bowtie2-build out_dir/genes.fa out_dir/genes
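
Before starting the alignment it can be worth confirming that the index build finished. The sketch below is not part of PVT; the function name `bowtie2_index_complete` is hypothetical. It assumes a small index, for which bowtie2-build writes six files ending in `.bt2`; very large references produce `.bt2l` files instead and would need the suffix changed.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PVT): succeed only if all six
# small-index files for the given prefix exist and are non-empty.
# Assumption: a small (.bt2) index; large indices use .bt2l instead.
bowtie2_index_complete() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$prefix.$suffix.bt2" ]; then
            echo "missing or empty: $prefix.$suffix.bt2" >&2
            return 1
        fi
    done
    return 0
}

# Example call for the index built above:
#   bowtie2_index_complete out_dir/genes && echo "index looks complete"
```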

2. The steps of the PVT pipeline for single-end and paired-end reads are shown in the figures below. The shell scripts for single-end and paired-end analysis can be downloaded from here.

Single End Read Analysis: Edit the shell script, replacing input.fastq with the desired input file. Make a separate shell script for each replicate, and execute the scripts on your home machine.

#wget http://bicresources.jcbose.ac.in/zhumur/pvt/steps/single_end/single_end.sh
#sed -i 's/input.fastq/desired_input.fastq/g' single_end.sh
#sh single_end.sh
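
When there are several replicates, the per-replicate copies can be generated in a loop rather than edited by hand. The sketch below is one possible approach, not part of PVT: the function name `make_replicate_scripts` and the output naming scheme `single_end_<name>.sh` are hypothetical. It assumes the downloaded script contains the literal placeholder `input.fastq` and that the FASTQ paths contain no `|` or `&` characters, which would confuse sed.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PVT): create one copy of the
# template script per replicate FASTQ file, leaving the template untouched.
# Usage: make_replicate_scripts TEMPLATE FASTQ...
make_replicate_scripts() {
    template=$1
    shift
    for fastq in "$@"; do
        # Name each copy after its input, e.g. rep1.fastq -> single_end_rep1.sh
        name=$(basename "$fastq" .fastq)
        # '|' as the sed delimiter lets the FASTQ path contain '/';
        # '\.' matches a literal dot rather than any character.
        sed "s|input\.fastq|$fastq|g" "$template" > "single_end_$name.sh"
    done
}

# Example call for two replicates:
#   make_replicate_scripts single_end.sh rep1.fastq rep2.fastq
#   sh single_end_rep1.sh
```

Writing to a new file instead of using `sed -i` keeps the downloaded template reusable for every replicate.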

Paired End Read Analysis: Execution for paired-end reads requires two machines, referred to as the Home Machine and the Remote Machine.
For promptless execution of the scripts, configure ssh as shown here.
Download the monitor.sh script on the remote machine, edit it to set the IP addresses of the home and remote machines, and then execute it on the remote machine.

#wget http://bicresources.jcbose.ac.in/zhumur/pvt/steps/single_end/single_end.sh
#sh monitor.sh

On the home machine, download the paired_end.sh script and edit it, replacing input_1.fastq and input_2.fastq with the desired input files (input_1.fastq and input_2.fastq are the left and right mate reads, respectively). Make a separate shell script for each replicate, and execute the scripts on your home machine.

#wget http://bicresources.jcbose.ac.in/zhumur/pvt/steps/paired_end/paired_end.sh
#sed -i 's/input_1.fastq/desired_input_1.fastq/g' paired_end.sh
#sed -i 's/input_2.fastq/desired_input_2.fastq/g' paired_end.sh
#sh paired_end.sh
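
Because the two input files must hold matching left and right mates, a simple sanity check before launching paired_end.sh is to compare their read counts. The sketch below is an optional pre-check and not part of PVT; the function names `count_reads` and `mates_match` are hypothetical. It assumes uncompressed FASTQ files in the common four-lines-per-record layout.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with PVT).

# Print the number of reads in an uncompressed FASTQ file,
# assuming four lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Succeed only if both mate files are readable and hold the same number of reads.
mates_match() {
    [ -r "$1" ] && [ -r "$2" ] || return 1
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example call before running the paired-end script:
#   mates_match desired_input_1.fastq desired_input_2.fastq || echo "mate files differ"
```

Equal counts do not prove that the mates are in the same order, so this catches only truncated or mismatched files.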

Links

Samtools [Source] [Homepage]
Li, H. et al. (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9.

Bamtools [Source] [Homepage]
Barnett, Derek W., et al. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27.12 (2011): 1691-1692.

Bowtie2 [Source] [Homepage]
Langmead, B. et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.

Download Indices and Annotations

Prebuilt Indices for PVT

Related Tools

TopHat2 [Source] [Homepage]
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

MapSplice [Source] [Homepage]
Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38, e178 (2010).

GEM [Source] [Homepage]
Marco-Sola, S., Sammeth, M., Guigó, R. & Ribeca, P. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat. Methods 9, 1185–1188 (2012).

GSNAP [Source] [Homepage]
Wu, T.D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).

QPALMA [Source] [Homepage]
De Bona, F. et al. Optimal spliced alignments of short sequence reads. Bioinformatics 24(16) (2008).

Figure: PVT pipeline and its order of execution for single end reads.

Figure: PVT pipeline and its order of execution for paired end reads.

Note: If you find PVT (Pipelined Version of TopHat) useful, please cite: PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis