Sourmash

sourmash is a command-line tool and Python library for computing MinHash sketches from DNA sequences, comparing them to each other, and plotting the results. This allows you to estimate sequence similarity between even very large data sets quickly and accurately. Please see the mash software and the mash paper (Ondov et al., 2016) for background information on how and why MinHash sketches work.

Sourmash is dib-lab software! Please see the sourmash documentation for more on sourmash. Sourmash 2.0 is coming soon. In the meantime, please cite Brown and Irber, 2016.

At the moment we have only enabled sourmash compute functionality.

Quickstart

Run Sourmash as part of the "default" Eel Pond workflow or via the sourmash_compute subworkflow. At the moment, sourmash compute requires both an assembly and a set of reads as input. Please see the sourmash_compute subworkflow for how to run sourmash compute properly.

Sourmash Command

For each input file, elvers runs approximately the following command:

sourmash compute --scaled 1000 \
  -k 31 input_file -o output.sig

Output files:

Your main output directory will be determined by your config file: by default it is BASENAME_out (you specify BASENAME).

Sourmash will output files in the sourmash subdirectory of this output directory. Sourmash signatures will have the same name as the file they're generated from, but end with .sig instead of .fasta or .fq.gz.
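The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the code elvers uses: the function name `signature_path` and the example paths are hypothetical, and only the two extensions named above are handled.

```python
from pathlib import PurePosixPath

# Input extensions named in the text above; other names are left unchanged
# before ".sig" is appended (an assumption of this sketch).
KNOWN_SUFFIXES = (".fasta", ".fq.gz")


def signature_path(input_file: str, outdir: str) -> str:
    """Return where a signature for input_file would land under outdir.

    Hypothetical helper: it mirrors the documented rule (same name,
    .sig extension, inside the sourmash subdirectory) and nothing more.
    """
    name = PurePosixPath(input_file).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]  # drop the known input extension
            break
    return str(PurePosixPath(outdir) / "sourmash" / (name + ".sig"))


# Example with a hypothetical BASENAME of "mytest":
print(signature_path("data/sample1.fq.gz", "mytest_out"))
print(signature_path("assembly.fasta", "mytest_out"))
```

With a BASENAME of `mytest`, both signatures end up in `mytest_out/sourmash/`.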

Modifying Params for Sourmash:

Be sure to set up your sample info and build a configfile first (see Understanding and Configuring Workflows).

To see the available parameters for the sourmash rule, run

elvers config sourmash --print_params

This will print the following:

  ####################  sourmash  ####################
sourmash:
  k_size: 31
  scaled: 1000
  extra: '' 
  #####################################################

In addition to changing parameters we've specifically enabled, you can modify the extra param to pass any extra sourmash parameters, e.g.:

  extra: ' --track-abundance '

Be sure the modified lines go into the config file you're using to run elvers (see Understanding and Configuring Workflows).
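To make the relationship between these parameters and the command shown earlier concrete, here is a minimal Python sketch of how the three config values could be assembled into a `sourmash compute` call. It is an assumption about the mapping, not the actual elvers rule or wrapper: the function name `build_sourmash_command` is hypothetical, and the argument order simply follows the approximate command above.

```python
import shlex


def build_sourmash_command(params, input_file, output_sig):
    """Assemble an argument list for `sourmash compute` from config values.

    Hypothetical helper for illustration; `params` uses the keys printed by
    `elvers config sourmash --print_params` (k_size, scaled, extra).
    """
    cmd = [
        "sourmash", "compute",
        "--scaled", str(params["scaled"]),
        "-k", str(params["k_size"]),
    ]
    # `extra` is a free-form string, so split it the way a shell would.
    # An empty or missing value contributes no arguments.
    cmd += shlex.split(params.get("extra") or "")
    cmd += [input_file, "-o", output_sig]
    return cmd


params = {"k_size": 31, "scaled": 1000, "extra": " --track-abundance "}
print(" ".join(build_sourmash_command(params, "input_file", "output.sig")))
```

Because `extra` is split into separate arguments and appended as-is, any flag placed there is passed through unchanged, so it should be an option that `sourmash compute` itself accepts.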

See the sourmash documentation to learn more about the parameters you can use with sourmash compute.

Sourmash elvers rule

We use a slightly modified version of the sourmash Snakemake wrapper to run sourmash compute via Snakemake.

For Snakemake aficionados, see our sourmash rules on GitHub.