This repository contains a Nextflow pipeline for processing raw single-cell (sc) TRIBE data.
Raw reads are processed closely following the GATK best-practices workflow for RNA-seq short variant discovery. They go through the following steps (sketched schematically after the list):
- Adapter and homopolymer sequence trimming
- rRNA sequence depletion
- Alignment to the reference genome
- Deduplication
- Variant identification and filtering
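
For orientation, the sketch below shows how such steps might be chained in Nextflow DSL2. It is a schematic only: the process names and the module path are hypothetical placeholders, and the actual wiring lives in `main.nf`.

```groovy
// Schematic only: process names and the module path are hypothetical placeholders;
// the actual processes and their wiring are defined in main.nf.
nextflow.enable.dsl = 2

include { TRIM; DEPLETE_RRNA; ALIGN; DEDUP; CALL_VARIANTS } from './modules/steps'

workflow {
    // Pair R1/R2 FASTQ files per sample, e.g. params.reads = "fastq/*_R{1,2}.fastq.gz"
    reads = Channel.fromFilePairs(params.reads)

    trimmed  = TRIM(reads)              // adapter and homopolymer trimming
    depleted = DEPLETE_RRNA(trimmed)    // rRNA depletion
    aligned  = ALIGN(depleted)          // alignment to the reference genome
    deduped  = DEDUP(aligned)           // duplicate marking/removal
    CALL_VARIANTS(deduped)              // variant identification and filtering
}
```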
Create a `nextflow.config` in your data directory, setting, for example:
```groovy
params {
    // Workflow flags
    reads             = "fastq/*_R{1,2}.fastq.gz"
    outdir            = 'TRIBE'
    genomeAnnotations = "group_references/ensembl/95/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.95.gtf"
    genomeFasta       = "group_references/ensembl/95/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.dna.toplevel.ERCC92.fa"
    depleteFasta      = "dmel_rRNA.fa"
    knownVariants     = "group_references/ensembl/95/drosophila_melanogaster/drosophila_melanogaster.vcf.gz"
    intervalList      = "group_references/ensembl/95/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.dna.toplevel.ERCC92.complete.interval_list"
    exonsIntervals    = "reference/95/Drosophila_melanogaster.BDGP6.95_exons.interval_list"
    intronsIntervals  = "reference/95/Drosophila_melanogaster.BDGP6.95_introns.interval_list"
    exonsBed          = "group_references/ensembl/95/drosophila_melanogaster/exons_unique.bed.gz"
}
```
Then run the pipeline with `nextflow run main.nf -resume`. Bulk data can be processed by adding `-profile bulk`.
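
For example, from the data directory containing your `nextflow.config` (the `--outdir` override on the last line is only an illustration of standard Nextflow parameter overriding, not a required step):

```bash
nextflow run main.nf -resume                  # single-cell (default) processing
nextflow run main.nf -resume -profile bulk    # bulk data processing

# Any parameter from nextflow.config can also be overridden on the command line,
# e.g. writing results to a different (illustrative) output directory:
nextflow run main.nf -resume --outdir TRIBE_run2
```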