Sourmash¶
sourmash is a command-line tool and Python library for computing MinHash sketches from DNA sequences, comparing them to each other, and plotting the results. This allows you to estimate sequence similarity between even very large data sets quickly and accurately. Please see the mash software and the mash paper (Ondov et al., 2016) for background information on how and why MinHash sketches work.
Sourmash is dib-lab software! Please see the sourmash documentation for more on sourmash. Sourmash 2.0 is coming soon; in the meantime, please cite Brown and Irber, 2016.
At the moment we have only enabled sourmash compute functionality.
Quickstart¶
Run Sourmash as part of the "default" Eel Pond workflow or via the sourmash_compute subworkflow. At the moment, sourmash compute requires both an assembly and a set of reads as input. Please see the sourmash_compute subworkflow for how to run sourmash compute properly.
Sourmash Command¶
On the command line, the command elvers runs for each file is approximately:
sourmash compute --scaled 1000 \
-k 31 input_file -o output.sig
Output files:¶
Your main output directory will be determined by your config file: by default it is BASENAME_out (you specify BASENAME).
Sourmash will output files in the sourmash subdirectory of this output directory. Sourmash signatures will have the same name as the file they're generated from, but end with .sig instead of .fasta or .fq.gz.
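The naming rule above can be sketched in shell. This is only an illustration: sig_name is a hypothetical helper, not part of elvers or sourmash, the input filenames are made up, and the fallback for other extensions is an assumption. The loop only prints the approximate command rather than running sourmash.

```shell
# Hypothetical helper (not part of elvers): derive the signature filename
# from an input filename by swapping a known extension for .sig.
sig_name() {
  base="$(basename "$1")"
  case "$base" in
    *.fq.gz) echo "${base%.fq.gz}.sig" ;;
    *.fasta) echo "${base%.fasta}.sig" ;;
    *)       echo "${base}.sig" ;;  # assumption: unknown extensions are kept
  esac
}

# Dry run: print the approximate command for each (made-up) input file.
# BASENAME_out is the default output directory described above.
for f in assembly.fasta sample_1.fq.gz; do
  echo "sourmash compute --scaled 1000 -k 31 $f -o BASENAME_out/sourmash/$(sig_name "$f")"
done
```

Printing the command instead of executing it keeps the sketch usable without sourmash installed; the command elvers actually runs may differ in its details.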
Modifying Params for Sourmash:¶
Be sure to set up your sample info and build a configfile first (see Understanding and Configuring Workflows).
To see the available parameters for the sourmash rule, run:
elvers config sourmash --print_params
This will print the following:
#################### sourmash ####################
sourmash:
  k_size: 31
  scaled: 1000
  extra: ''
#####################################################
In addition to changing parameters we've specifically enabled, you can modify the extra param to pass any extra sourmash parameters, e.g.:
extra: ' --track-abundance '
Be sure the modified lines go into the config file you're using to run elvers (see Understanding and Configuring Workflows).
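As a sketch of what the modified section might look like in your config file, here is the sourmash block with abundance tracking turned on. It assumes the block is nested as ordinary YAML with two-space indentation, and that the other values are left at the defaults printed above.

```yaml
sourmash:
  k_size: 31        # k-mer size, passed to sourmash compute as -k
  scaled: 1000      # passed as --scaled
  extra: ' --track-abundance '   # additional sourmash compute flags
```

With this setting, the command elvers runs would presumably include --track-abundance alongside the --scaled and -k options; exactly where the extra flags are placed in the command is an assumption.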
See the sourmash documentation to learn more about the parameters you can use with sourmash compute.
Sourmash elvers rule¶
We use a slightly modified version of the sourmash Snakemake wrapper to run sourmash compute via Snakemake.
For Snakemake aficionados, see our sourmash rules on GitHub.