Differential expression analysis with DESeq2

We can use DESeq2 to compare gene expression between experimental conditions.

Quickstart: Running DESeq2 via elvers

We recommend you run deseq2 via the diffexp subworkflow.

If you want to run it as a standalone program instead, you need to have generated read quantification data via salmon.

1) If you have salmon results, run:

elvers examples/nema.yaml deseq2

2) If not, you need to run salmon and any other missing steps. It's probably best to run the diffexp subworkflow, but you can also try:

elvers examples/nema.yaml salmon deseq2

DESeq2 Commands

This pipeline uses snakemake to run a few R scripts to conduct basic differential expression analysis. We read in transcript abundance information (generated with salmon) via tximport. Note that in the salmon step, we combine files of all "units" within a sample in order to then conduct differential expression at the sample level.
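As a rough illustration of what "combining units within a sample" means, the sketch below groups per-unit read files by sample name so that each sample is quantified once. The column names (`sample`, `unit`, `fq1`) and file names are hypothetical and chosen for illustration; this is not the elvers implementation.

```python
from collections import defaultdict


def group_units_by_sample(rows):
    """Collect the read files of every unit under its sample name."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sample"]].append(row["fq1"])
    return dict(grouped)


# Hypothetical sample sheet rows: sample "s1" was sequenced as two units.
rows = [
    {"sample": "s1", "unit": "a", "fq1": "s1_a.fq.gz"},
    {"sample": "s1", "unit": "b", "fq1": "s1_b.fq.gz"},
    {"sample": "s2", "unit": "a", "fq1": "s2_a.fq.gz"},
]

files_per_sample = group_units_by_sample(rows)
for sample, files in files_per_sample.items():
    # One quantification per sample, using all of its unit files together.
    print(sample, files)
```

Grouping before quantification means the downstream differential expression step sees one set of counts per sample rather than one per unit.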

We assume the assembly has a gene-to-transcript map, such as the one produced by Trinity. This is a tab-separated file (transcript \t gene) that enables count data to be aggregated at the gene level prior to differential expression analysis. Gene-level aggregation is recommended (see Soneson et al., 2016). However, if you do not have this mapping, you can conduct differential expression at the transcript level via the config (see the "Modifying Params for DESeq2" section, below).
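To make the role of the map concrete, here is a minimal sketch of gene-level aggregation, assuming the two-column (transcript \t gene) layout described above. It only sums counts; tximport itself does more (for example, it also handles abundances and transcript lengths), so treat this as an illustration of the idea rather than a stand-in for tximport. The IDs and counts are made up, and keeping unmapped transcripts under their own ID is a choice made for this sketch.

```python
import csv
import io
from collections import defaultdict


def read_tx2gene(handle):
    """Read a two-column, tab-separated transcript -> gene map."""
    tx2gene = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        transcript, gene = row[0], row[1]
        tx2gene[transcript] = gene
    return tx2gene


def aggregate_to_genes(transcript_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        # Sketch-only choice: unmapped transcripts keep their own ID.
        gene_counts[tx2gene.get(transcript, transcript)] += count
    return dict(gene_counts)


# Hypothetical map and counts, for illustration only.
example_map = io.StringIO("tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n")
tx2gene = read_tx2gene(example_map)
counts = {"tx1": 10.0, "tx2": 5.0, "tx3": 7.0}
gene_counts = aggregate_to_genes(counts, tx2gene)
print(gene_counts)  # {'geneA': 15.0, 'geneB': 7.0}
```

Here the two isoforms of geneA are collapsed into a single row, which is the unit that is then tested for differential expression.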

After reading in count data, we take in two additional pieces of information: first, the sample names in the samples.tsv document, and second, the desired contrasts, provided as part of the DESeq2 parameters, below. We store all data in the .rds R data format so that it can be easily reloaded for additional user analyses. In addition, we plot a PCA of the normalized counts, perform a standard DESeq2 analysis, and print a tsv of results for each contrast specified in the deseq2 params.

You can find these R scripts in the elvers github repo. The snakemake rules and scripts were modified from the rna-seq-star-deseq2 workflow and our own data analysis and workshops, e.g. DIBSI-RNAseq.

Modifying Params for DESeq2

Be sure to set up your sample info and build a configfile first (see Understanding and Configuring Workflows).

To see the available parameters for the deseq2 rule, run

elvers config deseq2 --print_params

This will print the following:

  ####################  deseq2  ####################
deseq2:
  contrasts:
    time0-vs-time6:
    - time0
    - time6
  gene_trans_map: true
  pca:
    labels:
    - condition
  #####################################################

The default contrasts reflect the condition information in the test data nema_samples.tsv. Please modify the contrasts to reflect your data. Multiple contrasts should be supported: each contrast needs a name, and a list below it specifying the conditions to compare, e.g.:

  contrasts:
    my-contrast:
      - conditionA
      - conditionB
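Before running, it can help to check that every condition named in a contrast actually appears in your samples.tsv. The sketch below is one way to do that; it is not part of elvers. It assumes each contrast lists exactly two conditions (as in the examples above) and that the sample sheet has a `condition` column. The sample sheet contents and sample names are hypothetical.

```python
import csv
import io


def conditions_in_samples(handle, condition_column="condition"):
    """Return the set of condition values found in a tab-separated sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[condition_column] for row in reader}


def check_contrasts(contrasts, known_conditions):
    """Return a list of problems; an empty list means the contrasts look usable."""
    problems = []
    for name, levels in contrasts.items():
        if len(levels) != 2:
            problems.append(f"{name}: expected 2 conditions, got {len(levels)}")
            continue
        for level in levels:
            if level not in known_conditions:
                problems.append(f"{name}: '{level}' not found in samples.tsv")
    return problems


# Hypothetical sample sheet, for illustration only.
samples_tsv = io.StringIO(
    "sample\tunit\tcondition\n"
    "sampleA_1\ta\ttime0\n"
    "sampleB_1\ta\ttime6\n"
)
known = conditions_in_samples(samples_tsv)

# The contrasts section of the config, written as a plain dict.
contrasts = {
    "time0-vs-time6": ["time0", "time6"],
    "my-contrast": ["conditionA", "conditionB"],
}
problems = check_contrasts(contrasts, known)
for problem in problems:
    print(problem)
```

With this sample sheet, the first contrast passes and `my-contrast` is reported twice, once for each condition that is missing from the `condition` column.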

The pca labels should not be changed unless you need to change the name of the condition column in the samples.tsv. This functionality hasn't been extensively tested, so file an issue if something goes wrong!

Be sure the modified lines go into the config file you're using to run elvers (see Understanding and Configuring Workflows).

References

Additional links:

Snakemake Rules

For snakemake aficionados, see the deseq2 rules on github.