Name APPS/BIO/HMMER-3.0
Description HMMER analysis tool
Status Production
Last update 2010-04-01

HMMER Runtime Environment home page

Version information

HMMER 3.0 is the newest version, but it is not fully backward compatible with HMMER 2.3.2.

The HMMER 2.3.2 definition can be found here.

Interface definition

The runtime environment (RE) sets the following environment variables:

The prepare_db command is also available for unpacking gzipped database files into HMMER_DB_DIR. See the example below for usage.

Examples

Here is an example of running HMMER on the grid. It searches the globin profile (taken from the HMMER tutorial) against the UniProt/Swiss-Prot database, release 2010_04. The database is fetched from the NDGF-BioGrid database repository.

Download the example files here.

The job description file hmmer.xrsl

& 
(executable=run_hmmer.sh)
(jobname=hmmer_globin_sprot)
(stdout=std.out)
(stderr=std.err)
(gmlog=gridlog)
(cputime=60)
(memory=1000)
(disk=500)
(runtimeenvironment=APPS/BIO/HMMER-3.0)
(inputfiles=
  ("globin.hmm" "globin.hmm")
  (
   "database.fasta.gz" 
   "srm://srm.ndgf.org/biogrid/db/uniprot/UniProt2010_04/uniprot_sprot.fasta.gz"
  )
)

The job script run_hmmer.sh unpacks the database to HMMER_DB_DIR using the prepare_db command and then runs hmmsearch against it.

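#!/bin/bash

echo "Hello HMMER!"

# Unpack the gzipped database into the directory named by HMMER_DB_DIR
echo "Preparing database"
prepare_db database.fasta.gz

# Search the profile against the unpacked database (path quoted in case it contains spaces)
echo "Searching globin.hmm against the database"
hmmsearch globin.hmm "$HMMER_DB_DIR/database.fasta"

echo "Bye HMMER!"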
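
Because unpacking the database is a fixed cost per job, it can pay off to search several profiles in the same job. The following is a minimal sketch of that idea, not part of the RE: search_profiles is a hypothetical helper, and the output file names are arbitrary choices. It relies only on the hmmsearch options -o (full report to a file) and --tblout (tabular per-target summary).

```shell
#!/bin/bash
# Sketch only: search_profiles is a hypothetical helper, not shipped with the RE.
# It runs hmmsearch once per profile against a single, already unpacked database.

search_profiles() {
    local db="$1"; shift
    local profile name
    for profile in "$@"; do
        # globin.hmm -> globin, used as the prefix of the output files
        name=$(basename "$profile" .hmm)
        echo "Searching $profile against $db"
        # -o: full report, --tblout: parseable per-target table
        hmmsearch -o "$name.out" --tblout "$name.tbl" "$profile" "$db" || return 1
    done
}

# Usage inside a job script, after the database has been unpacked:
#   prepare_db database.fasta.gz
#   search_profiles "$HMMER_DB_DIR/database.fasta" globin.hmm other.hmm
```

Every profile passed to the helper (other.hmm is just a placeholder name) would also have to be listed under inputfiles in the job description, and the result files would need to be requested as output files.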

System administrator guide for installing the RE

HMMER binaries

HMMER version 3.0 comes with icc-precompiled binaries that should be close to optimal for all generic x86_64 processors. Here we download the HMMER binary release for x86_64 and copy the binaries to a version-specific installation directory under /grid/nordugrid-arc/appl/hmmer/:

 
$ mkdir /tmp/hmmer/ 
$ cd /tmp/hmmer/
$ wget ftp://selab.janelia.org/pub/software/hmmer3/3.0/hmmer-3.0-linux-intel-x86_64.tar.gz
$ tar xvfz hmmer-3.0-linux-intel-x86_64.tar.gz 
$ mkdir -p /grid/nordugrid-arc/appl/hmmer/3.0/bin
$ cp hmmer-3.0-linux-intel-x86_64/binaries/* /grid/nordugrid-arc/appl/hmmer/3.0/bin/


It is advisable to include the version number (3.0) in the installation path, which makes it easy to support multiple versions of the HMMER binaries and runtime environments side by side. If you compile HMMER yourself instead of using the precompiled binaries, 'make check' is useful for verifying that the compiled binaries behave as expected.
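
After copying, a quick sanity check of the installation directory can catch a failed or partial copy. The sketch below is only an illustration: check_hmmer_bins is a hypothetical helper, and the list of programs is a subset of the HMMER 3.0 tools chosen for the example, not an exhaustive inventory.

```shell
#!/bin/bash
# Sketch only: check_hmmer_bins is a hypothetical helper.
# It reports every expected program that is missing or not executable.

check_hmmer_bins() {
    local bindir="$1" prog missing=0
    # Subset of the HMMER 3.0 programs; extend the list as needed
    for prog in hmmbuild hmmsearch hmmscan phmmer jackhmmer; do
        if [ ! -x "$bindir/$prog" ]; then
            echo "missing or not executable: $bindir/$prog" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the cluster front end:
#   check_hmmer_bins /grid/nordugrid-arc/appl/hmmer/3.0/bin && echo "HMMER 3.0 binaries look OK"
```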

Runtime environment scripts

HMMER can take advantage of multicore and multi-CPU nodes by using threads, and it scales well for larger searches (at least up to 8 cores). Consider configuring the RE to allocate full nodes to HMMER jobs and setting the HMMER_NCPU environment variable accordingly (the example RE scripts contain notes on this). Users then get the most out of the unavoidable overhead of unpacking the database to node-local disk: they can bundle more searches into one grid job and make better use of the operating system's disk cache.

The runtime environment script templates can be downloaded below. As long as the interface requirements are satisfied, the implementation details do not matter, and some adaptation is needed in any case to accommodate differences between cluster environments (batch queue system, temporary directory location, etc.).

Download the runtime environment script templates (SGE version or PBS version) and the prepare_db script.

Modify the scripts as needed and save the main script in your ARC runtime directory as APPS/BIO/HMMER-3.0.
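
For orientation, the overall shape of such an RE script might look like the sketch below. It is not the downloadable template. It assumes the usual ARC convention that an RE script is invoked with a stage argument (0 when the job is prepared on the front end, 1 just before it starts on the worker node, 2 after it has finished), and it is wrapped in a hypothetical function hmmer_re so it can be tried in isolation. The scratch location and the use of the SGE variable NSLOTS for the thread count are assumptions to adapt per cluster.

```shell
# Sketch only: hmmer_re is a hypothetical wrapper; in a real RE script the
# case statement would act on the script's own first argument.

hmmer_re() {
    local stage="$1"
    local hmmer_home=/grid/nordugrid-arc/appl/hmmer/3.0   # install path used in this guide
    case "$stage" in
        0)  # job is being prepared on the front end: nothing to do in this sketch
            ;;
        1)  # just before the job starts on the worker node
            export PATH="$hmmer_home/bin:$PATH"
            # Node-local scratch directory for unpacked databases (location is an assumption)
            HMMER_DB_DIR=$(mktemp -d "${TMPDIR:-/tmp}/hmmer_db.XXXXXX") || return 1
            export HMMER_DB_DIR
            # NSLOTS is set by SGE for parallel jobs; fall back to one thread
            export HMMER_NCPU="${NSLOTS:-1}"
            ;;
        2)  # after the job has finished: remove the unpacked databases
            if [ -n "$HMMER_DB_DIR" ]; then
                rm -rf "$HMMER_DB_DIR"
            fi
            ;;
    esac
    return 0
}
```

Keeping the cleanup in stage 2 means the node-local disk is freed even when the user's job script exits early, provided the stage is run on the same node.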

NOTE: Make sure that the prepare_db script is on the PATH of grid jobs using the RE, for example by placing it in the bin directory of the HMMER installation.
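
To illustrate the interface that prepare_db has to satisfy, here is a minimal stand-in written as a shell function. It is an assumption about the behaviour described on this page (gzipped files are unpacked into HMMER_DB_DIR under their name without the .gz suffix), not the contents of the downloadable script.

```shell
# Sketch only: an illustrative stand-in for prepare_db, not the distributed script.

prepare_db() {
    local archive target
    if [ -z "$HMMER_DB_DIR" ]; then
        echo "prepare_db: HMMER_DB_DIR is not set" >&2
        return 1
    fi
    mkdir -p "$HMMER_DB_DIR" || return 1
    for archive in "$@"; do
        # database.fasta.gz -> $HMMER_DB_DIR/database.fasta
        target="$HMMER_DB_DIR/$(basename "$archive" .gz)"
        # Decompress to the target and keep the original archive untouched
        gunzip -c "$archive" > "$target" || return 1
        echo "Unpacked $archive to $target"
    done
}

# Usage in a job script:
#   prepare_db database.fasta.gz
#   hmmsearch globin.hmm "$HMMER_DB_DIR/database.fasta"
```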

Contact information

Contact olli.tourunen@csc.fi if you have any grid-specific questions. For application-related questions, contact your local HMMER guru.