Sourmash_Compute Subworkflow

Subworkflows combine tools in the right order to facilitate file targeting within elvers. The "sourmash_compute" subworkflow conducts read quality trimming and k-mer trimming, then runs sourmash compute on the k-mer-trimmed files. It currently also computes sourmash signatures for an assembly, which must be provided, either by running an assembly or by specifying one in your configfile. At the moment, this workflow consists of:

Quickstart

This subworkflow needs an assembly. Whether or not you've already run elvers examples/nema.yaml assemble, do one of the following:

1) "Run" the Trinity assembly at the same time. If you've already run the assembly, elvers will just locate your existing assembly file for sourmash_compute.

elvers examples/nema.yaml assemble sourmash_compute

2) OR, pass an assembly in via get_reference, with the assembly specified in your config file.

elvers examples/nema.yaml get_reference sourmash_compute

In the configfile:

get_reference:
  reference: examples/nema.assembly.fasta
  gene_trans_map:  examples/nema.assembly.fasta.gene_trans_map #optional
  reference_extension: '_input'

This section is commented out in the test data yaml; uncomment it (remove the leading #) to use this option. If you have a gene to transcript map, specify it as well. If not, delete that line from your config. The reference_extension parameter is important: it is what allows us to build assemblies from several different assemblers on the same dataset. Feel free to use _input, as specified above, or pick something equally simple yet more informative. Note: Please don't use additional underscores (_) in this extension! For more details, see the get_reference documentation.
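
As a sketch of the "no gene to transcript map" case, the fragment below drops the gene_trans_map line and swaps in a different extension. The value '_published' is a hypothetical label chosen only for illustration; it keeps the single leading underscore and contains no other underscores.

```yaml
get_reference:
  reference: examples/nema.assembly.fasta
  # hypothetical label for illustration: one leading underscore, no others
  reference_extension: '_published'
```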

Configuring the sourmash_compute subworkflow

To set up your sample info and build a configfile, see Understanding and Configuring Workflows.

If you want to add the sourmash_compute program parameters to a previously built configfile, run:

elvers config.yaml sourmash_compute --print_params

A small set of parameters should print to your console:

 ####################  sourmash_compute  ####################
get_data:
  download_data: false
  use_ftp: false
trimmomatic:
  adapter_file:
    pe_path: ep_utils/TruSeq3-PE-2.fa
    se_path: ep_utils/TruSeq3-SE.fa
  extra: ''
  trim_cmd: ILLUMINACLIP:{}:2:40:15 LEADING:2 TRAILING:2 SLIDINGWINDOW:4:15 MINLEN:25
sourmash:
  k_size: 31
  scaled: 1000
  extra: ''  
  #######################################################
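
To illustrate how one of the blocks printed above might be adjusted, the fragment below copies the sourmash block and changes two values. The numbers 21 and 2000 are illustrative assumptions, not recommendations; the keys are exactly those shown in the printed parameters.

```yaml
sourmash:
  k_size: 21     # illustrative value; the printed default is 31
  scaled: 2000   # illustrative value; the printed default is 1000
  extra: ''
```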

Override the default params for any program by placing these lines in your yaml config file and modifying the values as desired. For more details, see Understanding and Configuring Workflows. For more on what parameters are available, see the docs for each specific program or utility rule: