
For dammit developers


Setting up your local computer for dammit development

We can largely follow the Travis CI instructions, because we're telling Travis to do the same things we want to do on our local computers.

Make sure conda is installed. If it isn't, install Miniconda:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
export PATH="$HOME/miniconda3/bin:$PATH"

Fork the dammit repository to your account. Clone your fork to your local computer, then create a dev branch called testing:

git clone https://github.com/username/dammit.git
git remote add upstream https://github.com/dib-lab/dammit.git 
git checkout -b testing
git branch

Now you are on the testing branch.

Keep the original repository on the master branch, and periodically bring it up to date by running:

git pull upstream master

Set up a Python 3 environment to work in:

conda create -n dammit_dev python=3
source activate dammit_dev

Install dependencies:

conda config --set always_yes yes --set changeps1 no
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

conda install python numpy pandas "numexpr>=2.3.1" "khmer>=2.1" "sphinx>1.3.1" "sphinx_rtd_theme>=0.1.9" pytest pytest-runner "doit>=0.29.0" matplotlib shmlast infernal hmmer transdecoder=3.0.1 last busco=3.0.2 parallel bioconductor-seqlogo
python setup.py install

The last line of the output should be:

Finished processing dependencies for dammit==1.0rc2

Lastly, install the databases (they will be installed in ~/.dammit/databases/):

dammit databases --install

Output should be:

(dammit_dev) campus-019-072:dammit johnsolk$ dammit databases --install
Unable to revert mtime: /Library/Fonts
# dammit
## a tool for easy de novo transcriptome annotation

by Camille Scott

**v1.0rc2**, 2018

## submodule: databases
### Database Install
#### Info
* Database Directory: /Users/johnsolk/.dammit/databases
* Doit Database: /Users/johnsolk/.dammit/databases/databases.doit.db


*All database tasks up-to-date.*

Nothing to install!

Now you are ready to edit and make changes!

To-do for dammit

  • [ ] update transdecoder version
  • [ ] orthodb version (other database versions?)
  • [ ] add swissprot
  • [x] change order of conda channels to include conda-forge last
  • [ ] update documentation
  • [ ] add pipeline for accepting .pep file as input (skips transdecoder, transcriptome stats and BUSCO tasks)

Versioning

A new version is required when a new version of a database is added, or when a major change alters the command-line interface. Change the VERSION file when this happens.

(Note 11/30/2018: We should make all changes above in the To-do, then move to v1.1)

Notes on dammit

Written by Camille Scott. See tutorial.

  1. Look at pydoit documentation, and Camille's workshop
  2. PyPI and bioconda (the supported methods of installation)
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Architecture:

Take a look at the code and tests in the dammit directory:

  • The core driver of dammit is the dammit/app.py file, which sets up the command-line arguments. Everything starts here; if you want to add an argument, this is where it happens.
  • There are two subcommand task-handler files: annotate.py and databases.py
  • Tasks are steps being run, separated into different files. For example, the hmmer.py file contains all hmmer tasks.
  • The task handler has its own logger and pulls from the config to figure out where the databases are located (this all happens in the background), along with some doit orchestration.
  • Decorators turn a function's return value into a doit task (e.g. line 59 of shell.py): import doit_task, then apply @doit_task.
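The decorator pattern described above can be sketched roughly like this. This is an illustration of the idea, not dammit's actual implementation; the names doit_task and hello_task are invented here, and the create_doit_tasks attribute is doit's hook for custom task creators:

```python
# Rough sketch of the @doit_task pattern: the decorated function returns a
# plain task dictionary, and the decorator marks it so doit will pick it up.
# Not dammit's actual code -- an illustrative stand-in.

def doit_task(task_func):
    """Wrap a function that returns a task dict so doit recognizes it."""
    def wrapper(*args, **kwargs):
        task = task_func(*args, **kwargs)
        # fall back to the function's name if the dict doesn't set one
        task.setdefault('name', task_func.__name__)
        return task
    # doit discovers custom task creators through this attribute
    wrapper.create_doit_tasks = wrapper
    return wrapper

@doit_task
def hello_task(target='hello.txt'):
    return {'actions': ['echo hello > {0}'.format(target)],
            'targets': [target]}

print(hello_task()['name'])  # prints 'hello_task'
```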

The databases subcommand has 2 pipelines:

  • quick
  • full

The annotate subcommand has more pipelines:

  • uniref1
  • full
  • nr

config.json

A custom config.json file can be used to pass different parameters to the programs run by the tasks, e.g. TransDecoder.LongOrfs -m 50, etc.
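For example, a custom config might look something like the sketch below. The exact keys and nesting that dammit expects may differ, so treat this as illustrative only:

```json
{
    "transdecoder": {
        "longorfs": {
            "-m": 50
        }
    }
}
```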

parallel.py

Parallelizes the hmmer, infernal, and lastal tasks; requires GNU Parallel.

(There are instructions for how to run on a multi-node HPC, somewhere.)

ui.py

Formats output for the user as markdown, suitable for copying and pasting into a GitHub issue report.
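As an illustration of the idea (a hypothetical helper, not ui.py's actual API; the function name and table layout are invented here), formatting results as markdown might look like:

```python
# Hypothetical sketch of the kind of formatting ui.py does: render
# key/value results as a GitHub-flavored markdown table for issue reports.

def results_to_markdown(info):
    """Render a dict of results as a two-column markdown table."""
    lines = ['| Key | Value |', '| --- | --- |']
    for key, value in info.items():
        lines.append('| {0} | {1} |'.format(key, value))
    return '\n'.join(lines)

print(results_to_markdown({'version': '1.0rc2',
                           'database_dir': '~/.dammit/databases'}))
```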

generate-test-data-.sh re-generates the test data and puts it in the proper directories.

TESTS!

dammit/tests

Run test_databases.py yourself, locally (because the databases cannot be cached on Travis CI).

  • makes sure tasks and pipelines run and produce output; they don't all check for expected output (some check integration output)
  • uses pytest
  • set of tests files
  • testing pydoit tasks is a pain
  • under utils, there is a file to run tasks: give it a list of tasks and it will execute them in its own directory
  • functions start with 'test', check assertions
  • fixtures are a means of setting up a consistent environment before running an individual test, e.g. being in a clean directory. tmpdir will create a randomly named temporary directory.
  • make tests for new tasks (Sometimes they will take a long time to run...)
  • test_annotate.py must be run locally by yourself.
  • before pushing a release, run both of these locally
  • make long tests (assumes the environment is already set up)
  • Travis CI builds the recipe that lives in the repo
  • make-ci-test: not long and not huge and not requires_databases
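The fixture/tmpdir idea above can be sketched with a generic pytest example (this is not one of dammit's actual tests; it uses tmp_path, pytest's pathlib-based sibling of tmpdir):

```python
# Generic pytest sketch of the fixture idea: pytest injects tmp_path as a
# fresh, randomly named temporary directory for each test, so the test
# always starts in a clean location. Not one of dammit's actual tests.

def test_runs_in_clean_directory(tmp_path):
    # tmp_path is a pathlib.Path pointing at a clean temporary directory
    out_file = tmp_path / 'output.txt'
    out_file.write_text('annotated')
    assert out_file.read_text() == 'annotated'
```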

Reviewing a PR

Tests must pass before merging!

  • Have there been radical changes? (Are you adding things to handler, maybe time to take a step back and make sure code uses reasonable variable names, tests, etc)
  • Does Travis build?
  • Try to make commit messages somewhat informative

If these all seem reasonable to you, approve!

Fixing Travis:

.travis.yml

  • make sure the conda env uses the right Python
  • fix conda channel order

Bioconda

  • https://anaconda.org/bioconda/dammit
  • Recipe: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/dammit/meta.yaml

Documentation