Differential expression analysis with DESeq2
We can use DESeq2 to identify genes that are differentially expressed between experimental conditions.
Quickstart: Running DESeq2 via elvers
We recommend running deseq2 via the diffexp subworkflow. To run it as a standalone step instead, you must already have read quantification data generated by salmon.
1) If you have salmon results, run:
elvers examples/nema.yaml deseq2
2) If not, you need to run salmon and any other missing steps. It's probably best to run the diffexp subworkflow, but you can also try:
elvers examples/nema.yaml salmon deseq2
DESeq2 Commands
This pipeline uses snakemake to run a few R scripts to conduct basic differential expression analysis. We read in transcript abundance information (generated with salmon) via tximport. Note that in the salmon step, we combine files of all "units" within a sample in order to then conduct differential expression at the sample level.
We assume the assembly has a gene-to-transcript map, such as the one produced by Trinity. This is a tab-separated file (transcript \t gene) that enables count data to be aggregated at the gene level prior to differential expression analysis. This is recommended; see Soneson et al., 2016. However, if you do not have this mapping, we provide an option to conduct differential expression at the transcript level via the config (see the "Modifying Params for DESeq2" section, below).
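The idea of gene-level aggregation can be sketched in a few lines of Python. This is only an illustration of the concept, assuming the column order given above (transcript, then gene). It is not what the pipeline runs: tximport does the real work in R and also accounts for transcript lengths. The function names and example IDs below are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_gene_trans_map(handle):
    """Read a two-column, tab-separated map: transcript <TAB> gene."""
    tx2gene = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        transcript, gene = row[0], row[1]
        tx2gene[transcript] = gene
    return tx2gene


def aggregate_to_genes(transcript_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        # Transcripts missing from the map are kept under their own ID.
        gene = tx2gene.get(transcript, transcript)
        gene_counts[gene] += count
    return dict(gene_counts)


# Hypothetical example data (not taken from the nema test set).
example_map = io.StringIO("tx1_i1\tgene1\ntx1_i2\tgene1\ntx2_i1\tgene2\n")
example_counts = {"tx1_i1": 10.0, "tx1_i2": 5.0, "tx2_i1": 7.0}

tx2gene = load_gene_trans_map(example_map)
gene_counts = aggregate_to_genes(example_counts, tx2gene)
print(gene_counts)
```

Here the two isoforms of gene1 are collapsed into a single gene-level count, which is the level at which the differential expression test is then run.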
After reading in count data, we take in two additional pieces of information: first, the sample names in the samples.tsv document, and second, the desired contrast, provided as part of the DESeq2 parameters, below. We store all data in an .rds R data file so that it can be easily reloaded for additional user analyses. In addition, we plot a PCA of the normalized counts, perform a standard DESeq2 analysis, and write a tsv of results for each contrast specified in the deseq2 params.
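For follow-up analysis outside of R, a per-contrast results table can be filtered with a short script. The sketch below assumes the tsv has a header row with the standard DESeq2 results columns (baseMean, log2FoldChange, lfcSE, stat, pvalue, padj) plus an identifier column that we call "gene" here; the exact layout of the file elvers writes is an assumption, so check your own output and adjust the column names.

```python
import csv
import io


def significant_genes(handle, padj_cutoff=0.05, id_column="gene"):
    """Return (gene, log2FoldChange, padj) tuples passing the padj cutoff."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        padj = row["padj"]
        if padj in ("NA", ""):
            continue  # DESeq2 reports NA when no adjusted p-value is available
        if float(padj) < padj_cutoff:
            hits.append((row[id_column], float(row["log2FoldChange"]), float(padj)))
    # Most significant genes first.
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical results table standing in for a real per-contrast tsv.
example_results = io.StringIO(
    "gene\tbaseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj\n"
    "gene1\t120.5\t2.1\t0.4\t5.25\t0.0000002\t0.00001\n"
    "gene2\t15.0\t-0.3\t0.5\t-0.6\t0.55\t0.8\n"
    "gene3\t2.0\t1.0\t1.2\t0.83\t0.4\tNA\n"
    "gene4\t300.0\t-1.8\t0.5\t-3.6\t0.0003\t0.004\n"
)

hits = significant_genes(example_results)
for gene, lfc, padj in hits:
    print(gene, lfc, padj)
```

To use it on a real file, replace the in-memory table with `open("path/to/results.tsv")`, where the path is whatever your run produced.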
You can find these R scripts in the elvers GitHub repo. The snakemake rules and scripts were modified from the rna-seq-star-deseq2 workflow and from our own data analysis and workshops, e.g. DIBSI-RNAseq.
Modifying Params for DESeq2
Be sure to set up your sample info and build a configfile first (see Understanding and Configuring Workflows).
To see the available parameters for the deseq2 rule, run:
elvers config deseq2 --print_params
This will print the following:
#################### deseq2 ####################
deseq2:
  contrasts:
    time0-vs-time6:
    - time0
    - time6
  gene_trans_map: true
  pca:
    labels:
    - condition
#####################################################
The default contrasts reflect the condition information in the test data nema_samples.tsv. Please modify the contrasts to reflect your data. Multiple contrasts should be supported: each contrast needs a name, and a list below it specifying the conditions to compare, e.g.:
contrasts:
  my-contrast:
  - conditionA
  - conditionB
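Before launching a run, it can help to confirm that every condition named in a contrast actually appears in your samples file. The following is a minimal sketch, not part of elvers: it assumes a tab-separated samples file with a header row containing a "condition" column, and that each contrast lists exactly two conditions. The function name and the example sample rows are hypothetical.

```python
import csv
import io


def check_contrasts(contrasts, samples_handle, condition_column="condition"):
    """Return a list of problems found in a contrasts mapping (empty if none)."""
    rows = csv.DictReader(samples_handle, delimiter="\t")
    known = {row[condition_column] for row in rows}
    problems = []
    for name, conditions in contrasts.items():
        if len(conditions) != 2:
            problems.append(f"{name}: expected 2 conditions, got {len(conditions)}")
        for condition in conditions:
            if condition not in known:
                problems.append(f"{name}: '{condition}' not in samples file")
    return problems


# Hypothetical samples table; the column names are an assumption.
example_samples = io.StringIO(
    "sample\tunit\tcondition\n"
    "s1\ta\ttime0\n"
    "s2\ta\ttime0\n"
    "s3\ta\ttime6\n"
)

# The same structure as the contrasts section of the config, as a dict.
example_contrasts = {
    "time0-vs-time6": ["time0", "time6"],
    "my-contrast": ["conditionA", "conditionB"],
}

problems = check_contrasts(example_contrasts, example_samples)
for problem in problems:
    print(problem)
```

With this example data, the first contrast passes and the second is flagged because neither conditionA nor conditionB occurs in the condition column.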
The pca labels should not be changed unless you need to change the name of the condition column in the samples.tsv. This functionality hasn't been extensively tested, so please file an issue if something goes wrong!
Be sure the modified lines go into the config file you're using to run elvers (see Understanding and Configuring Workflows).
References
Additional links:
- DE lecture by Jane Khudyakov, July 2017
- Example DE analysis from two populations of killifish! (Fundulus heteroclitus MDPL vs. MDPL)
- A Review of Differential Gene Expression Software for mRNA sequencing
Snakemake Rules
For snakemake aficionados, see the deseq2 rules on GitHub.