Sourmash_Compute Subworkflow¶
Subworkflows combine tools in the right order to facilitate file targeting withing elvers
. The "sourmash_compute" subworkflow conducts read quality trimming and kmer trimming prior to sourmash compute of the kmer-trimmed files. It currently also computes sourmash signatures for an assembly as well, which needs to be provided, either by running an assembly or providing one in your configfile. At the moment, this workflow consists of:
- get_data - an
elvers
utility - trimmomatic
- sourmash
Quickstart¶
If you've generated an assembly, even if you've already run elvers examples/nema.yaml assemble
:
1) "Run" trinity assembly at the same time. If you've already run the assembly, elvers
will just locate your assembly file for sourmash_compute
.
elvers examples/nema.yaml assemble sourmash_compute
2) OR, Pass an assembly in via get_reference
, with an assembly specified in your config file.
elvers get_reference sourmash_compute
In the configfile:
get_reference:
reference: examples/nema.assembly.fasta
gene_trans_map: examples/nema.assembly.fasta.gene_trans_map #optional
reference_extension: '_input'
This is commented out in the test data yaml, but go ahead and uncomment (remove leading #
) in order to use this option. If you have a gene to transcript map, please specify it as well. If not, delete this line from your config
. The assembly_extension
parameter is important: this is what allows us to build assemblies from several different assemblers on the same dataset. Feel free to use _input
, as specified above, or pick something equally simple yet more informative. Note:
Please don't use additional underscores (_
) in this extension!. For more details, see the get_reference documentation.
Configuring the sourmash_compute subworkflow¶
To set up your sample info and build a configfile, see Understanding and Configuring Workflows.
If you want to add the sourmash_compute
program parameters to a previously built configfile, run:
elvers config.yaml sourmash_compute --print_params
A small set of parameters should print to your console:
#################### sourmash_compute ####################
get_data:
download_data: false
use_ftp: false
trimmomatic:
adapter_file:
pe_path: ep_utils/TruSeq3-PE-2.fa
se_path: ep_utils/TruSeq3-SE.fa
extra: ''
trim_cmd: ILLUMINACLIP:{}:2:40:15 LEADING:2 TRAILING:2 SLIDINGWINDOW:4:15 MINLEN:25
sourmash:
k_size: 31
scaled: 1000
extra: ''
#######################################################
Override default params for any program by placing these lines in your yaml
config file, and modifying values as desired. For more details, see Understanding and Configuring Workflows.For more on what parameters are available, see the docs for each specific program or utility rule: