Quantification with Salmon
We can use Salmon to quantify expression. Salmon is a (relatively) new breed of software for quantifying RNAseq reads that is both really fast and takes transcript length into consideration (Patro et al. 2015).
Quickstart
We recommend that you run salmon quantification via the "default" Eel Pond workflow or the quantify subworkflow. See "Advanced Usage" below for running salmon as a standalone rule.
Salmon Commands
There are two commands for salmon: salmon index and salmon quant. The first command, salmon index, will index the transcriptome:
salmon index --index nema --transcripts nema_trinity.fasta --type quasi
The second command, salmon quant, will quantify the trimmed reads (not diginormed) using the transcriptome. For each pair of reads for a sample, we run:
salmon quant -i nema -l A -1 <(gunzip -c $R1) -2 <(gunzip -c $R2) -o ${sample_name}_quant
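To apply the command above to several samples by hand, a small helper along these lines may be useful. This is only a sketch and not how elvers invokes salmon: the function name salmon_quant_cmd, the sample names, and the read file names are hypothetical. It prints each command instead of running it, so you can inspect the result first.

```shell
# Print (do not run) the salmon quant command for one paired-end sample.
# Arguments: sample name, R1 file, R2 file.
# The index name "nema" matches the salmon index command shown earlier.
salmon_quant_cmd() {
    printf 'salmon quant -i nema -l A -1 <(gunzip -c %s) -2 <(gunzip -c %s) -o %s_quant\n' \
        "$2" "$3" "$1"
}

# Hypothetical sample names and file naming; adjust to your own data.
for sample_name in sampleA sampleB; do
    salmon_quant_cmd "$sample_name" \
        "${sample_name}_1.trim.fq.gz" "${sample_name}_2.trim.fq.gz"
done
```

Once the printed commands look right, they can be run in bash, which is needed for the <( ... ) process substitution.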
Both indexing the transcriptome and running quantification are integrated as rules in the elvers workflow, so the whole process happens in an automated fashion.
Modifying Params for Salmon
Be sure to set up your sample info and build a configfile first (see Understanding and Configuring Workflows).
To see the available parameters for the salmon rule, run:
elvers config salmon --print_params
This will print the following:
#################### salmon ####################
salmon:
  input_trimmomatic_trimmed: True
  index_params:
    extra: ''
  quant_params:
    libtype: A
    extra: ''
#####################################################
If you set input_trimmomatic_trimmed: False in the salmon parameters, salmon will use your raw input data instead of trimming first. Using trimmed data as input is recommended; set this to False only if you have already trimmed your reads with another program.
In addition to changing the parameters we've specifically enabled, you can modify the extra param to pass any extra parameters. In salmon, both the index and quantification steps accept an extra param. See the Salmon documentation to learn more about the parameters you can pass into salmon.
Be sure the modified lines go into the config file you're using to run elvers (see Understanding and Configuring Workflows).
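As an illustration, a config file that passes extra flags to the quantification step might contain a block like the one below. The flags shown (--gcBias and --seqBias, both salmon quant options) are just an example choice, not a recommendation from the elvers authors. Check the Salmon documentation for your installed version before using them.

```yaml
salmon:
  input_trimmomatic_trimmed: True
  index_params:
    extra: ''
  quant_params:
    libtype: A
    # Example only: these flags are passed to salmon quant
    extra: '--gcBias --seqBias'
```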
Output files:
Your main output directory is determined by your config file: by default it is BASENAME_out (you specify BASENAME).
Salmon will output files in the quant subdirectory of this output directory. Each sample will have its own directory, and the two most interesting files will be the salmon_quant.log and quant.sf files. The former contains the log information from running salmon, and the latter contains the transcript count data.
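To collect the per-sample results for downstream analysis, something like the following sketch may help. It assumes the layout described above (a quant subdirectory containing one directory per sample); the function name list_quant_files is hypothetical and not part of elvers.

```shell
# Print every quant.sf found under <outdir>/quant, one path per line, sorted.
# Argument: the main output directory (by default BASENAME_out).
list_quant_files() {
    find "$1/quant" -type f -name 'quant.sf' | sort
}

# Hypothetical usage, where BASENAME is whatever you set in your config file:
# list_quant_files "BASENAME_out"
```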
A quant.sf file will look something like this:
Name Length EffectiveLength TPM NumReads
TRINITY_DN2202_c0_g1_i1 210 39.818 2.683835 2.000000
TRINITY_DN2270_c0_g1_i1 213 41.064 0.000000 0.000000
TRINITY_DN2201_c0_g1_i1 266 69.681 0.766816 1.000000
TRINITY_DN2222_c0_g1_i1 243 55.794 2.873014 3.000000
TRINITY_DN2291_c0_g1_i1 245 56.916 0.000000 0.000000
TRINITY_DN2269_c0_g1_i1 294 89.251 0.000000 0.000000
TRINITY_DN2269_c1_g1_i1 246 57.479 0.000000 0.000000
TRINITY_DN2279_c0_g1_i1 426 207.443 0.000000 0.000000
TRINITY_DN2262_c0_g1_i1 500 280.803 0.190459 1.000912
TRINITY_DN2253_c0_g1_i1 1523 1303.116 0.164015 4.000000
TRINITY_DN2287_c0_g1_i1 467 247.962 0.000000 0.000000
TRINITY_DN2287_c1_g1_i1 325 113.826 0.469425 1.000000
TRINITY_DN2237_c0_g1_i1 306 98.441 0.542788 1.000000
TRINITY_DN2237_c0_g2_i1 307 99.229 0.000000 0.000000
TRINITY_DN2250_c0_g1_i1 368 151.832 0.000000 0.000000
TRINITY_DN2250_c1_g1_i1 271 72.988 0.000000 0.000000
TRINITY_DN2208_c0_g1_i1 379 162.080 1.978014 6.000000
TRINITY_DN2277_c0_g1_i1 269 71.657 0.745677 1.000000
TRINITY_DN2231_c0_g1_i1 209 39.409 0.000000 0.000000
TRINITY_DN2231_c1_g1_i1 334 121.411 0.000000 0.000000
TRINITY_DN2204_c0_g1_i1 287 84.121 0.000000 0.000000
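Many transcripts in a file like this have no reads assigned. As a quick way to look only at the transcripts that do, a sketch using awk is shown below. It assumes the five-column, tab-separated layout shown above; the function name expressed_transcripts is hypothetical, and the two demo rows are copied from the example output.

```shell
# Print Name and TPM for every transcript with NumReads > 0.
# Assumes a tab-separated quant.sf with a header line and the columns
# Name, Length, EffectiveLength, TPM, NumReads.
expressed_transcripts() {
    awk -F'\t' 'NR > 1 && $5 > 0 { print $1 "\t" $4 }' "$1"
}

# Small demo file built from two rows of the example above.
demo=$(mktemp)
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' > "$demo"
printf 'TRINITY_DN2202_c0_g1_i1\t210\t39.818\t2.683835\t2.000000\n' >> "$demo"
printf 'TRINITY_DN2270_c0_g1_i1\t213\t41.064\t0.000000\t0.000000\n' >> "$demo"
expressed_transcripts "$demo"
rm -f "$demo"
```

In the demo, only the first transcript is printed, since the second has zero reads.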
More on Salmon
For further reading on salmon, see:
- Intro blog post: http://robpatro.com/blog/?p=248
- A 2016 blog post evaluating and comparing methods here
- Salmon github repo here
- https://github.com/ngs-docs/2015-nov-adv-rna/blob/master/salmon.rst
- http://angus.readthedocs.io/en/2016/rob_quant/tut.html
- https://2016-aug-nonmodel-rnaseq.readthedocs.io/en/latest/quantification.html
Advanced Usage: Running Salmon as a standalone rule
You can run salmon as a standalone rule, instead of within a larger elvers workflow. However, to do this, you need to make sure the input files are available.
For salmon, you need both 1) an assembly, and 2) trimmed input files. The assembly can be generated via another workflow, or passed to elvers via the configfile.
Specifying an assembly:
1) If you've already run read trimming and want to use a Trinity assembly generated via elvers, you can run:
elvers my_config assemble salmon
If you've already run the assembly, elvers will just use this info to locate that assembly.
2) Alternatively, you can input an assembly via the [get_reference](get_reference.md) utility rule:
```
elvers get_reference salmon
```
with an assembly in your `yaml` configfile, e.g.:
```
get_reference:
  reference: examples/nema.assembly.fasta
  gene_trans_map: examples/nema.assembly.fasta.gene_trans_map #optional
  reference_extension: '_input'
```
This block is commented out in the test data yaml; uncomment it (remove the leading `#`) to use this option. If you have a gene-to-transcript map, specify it as well; if not, delete that line from your `config`. The `reference_extension` parameter is important: it is what allows us to build assemblies from several different assemblers on the same dataset. Feel free to use `_input`, as specified above, or pick something equally simple yet more informative. **Note: Please don't use additional underscores (`_`) in this extension!** For more details, see the [get_reference documentation](get_reference.md).
Specifying input reads:
If you haven't yet run read trimming, you'll also need to run those steps:
```
elvers myconfig get_data trimmomatic salmon
```
Or if you have set `input_trimmomatic_trimmed: False`:
```
elvers myconfig get_data salmon
```
Snakemake Rule
We wrote snakemake wrappers to run salmon index and salmon quant.
For snakemake aficionados, see the Salmon rule on GitHub.