Name: APPS/BIO/HMMER-2.3.2
Description: HMMER analysis tool
Status: Production
Last update: 2008-02-26

HMMER Runtime Environment home page

Version information

Only this development version is currently available.

Interface definition

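The runtime environment (RE) sets environment variables for the job. The ones referred to on this page are HMMER_DB_DIR, the directory where databases are unpacked, and HMMER_NCPU, the number of CPUs that HMMer threads may use.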

In addition, the prepare_db command is available for unpacking gzipped database files into HMMER_DB_DIR. See the example below for usage.

Examples

Here is an example of running HMMer on the grid. We search the globin profile (taken from the HMMer tutorial) against the UniProt Swiss-Prot database, version 12.6. The database is taken from the NDGF-BioGrid database repository.

Download the example files here.

The job description file hmmer.xrsl

& 
(executable=run_hmmer.sh)
(jobname=hmmer_globin_sprot)
(stdout=std.out)
(stderr=std.err)
(gmlog=gridlog)
(cputime=60)
(memory=1000)
(disk=500)
(runtimeenvironment=APPS/BIO/HMMER-2.3.2)
(inputfiles=
  ("globin.hmm" "globin.hmm")
  (
   "database.fasta.gz" 
   "srm://srm.ndgf.org/biogrid/db/uniprot/UniProt12.6/uniprot_sprot.fasta.gz"
  )
)

The job script run_hmmer.sh unpacks the database into HMMER_DB_DIR using the prepare_db command and then runs hmmsearch against it.

#!/bin/bash

echo "Hello HMMer!"

echo "Preparing database"
prepare_db database.fasta.gz

echo "Searching globin.hmm against the database"
hmmsearch globin.hmm $HMMER_DB_DIR/database.fasta

echo "Bye HMMER!"

System administrator guide for installing the RE

HMMer binaries

Building and installing HMMer from source should be quite straightforward. Here is an example installation where HMMer is installed in a shared directory /grid/apps that is also visible on the compute nodes.

$ wget ftp://selab.janelia.org/pub/software/hmmer/CURRENT/hmmer-2.3.2.tar.gz
$ tar xvfz hmmer-2.3.2.tar.gz
$ cd hmmer-2.3.2
$ ./configure --prefix /grid/apps/hmmer/2.3.2 --enable-threads
$ make
$ make check
$ make install

Intel compilers seem to produce very fast code for HMMer. Here is an example provided by Jens Larsson.

$ export CFLAGS="-O3 -xS -ip"
$ export LDFLAGS="-Wl,--rpath,/software/intel/11.0.74/lib/intel64"
$ export CC=icc
$ ./configure --prefix /software/biogrid/hmmer/2.3.2 --enable-threads
$ make && make check && make install

It is advisable to include the version number (2.3.2) in the installation path. This makes it easy to support multiple versions of the HMMer binaries and runtime environments. Running 'make check' is useful for verifying that the compiled binaries behave as expected.

Runtime environment scripts

HMMer can take advantage of multicore and multi-CPU nodes by using threads. It scales linearly (at least up to 8 cores). It is therefore advisable to configure the RE to allocate full nodes to HMMer jobs and to set the HMMER_NCPU environment variable accordingly (see the notes in the example RE scripts). This way users get the most out of the unavoidable overhead of unpacking the database to node-local disk: they can bundle more searches into one grid job and make better use of the operating system disk cache.
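
To illustrate bundling several searches into one grid job, here is a minimal sketch of a helper that a job script could use once the database has been unpacked. The function name search_all and the per-profile output file naming are assumptions made for illustration; they are not part of the RE interface.

```shell
#!/bin/bash
# Sketch only: run several profile searches against one unpacked database,
# so the cost of unpacking it is paid once per grid job.
# search_all and the <profile>.out naming are hypothetical.

search_all() {
    local db="$1"
    shift
    local hmm
    for hmm in "$@"; do
        echo "Searching $hmm against $db"
        # One result file per profile, named after the profile file.
        hmmsearch "$hmm" "$db" > "${hmm%.hmm}.out" || return 1
    done
}
```

In a job script this could be called after prepare_db as, for example, search_all "$HMMER_DB_DIR/database.fasta" globin.hmm other.hmm, where other.hmm stands for any additional profile. Each extra profile would also have to be listed in the inputfiles attribute of the job description, and the result files requested back with the outputfiles attribute.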

The runtime environment script templates can be downloaded below. As long as the interface requirements are satisfied, the implementation does not really matter, and some adaptation is needed anyway to accommodate differences between cluster environments (batch queue system, temporary directory location, etc.).
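
For orientation, the following is a minimal sketch of what an RE script satisfying the interface could look like. It is not the downloadable template. It relies on the ARC convention of calling the RE script with a single argument (0 on the front end before submission to the batch system, 1 on the compute node before the job starts, 2 after the job has finished). The installation path, the temporary directory layout and the function name hmmer_re are assumptions, and taking the CPU count from the node only makes sense when full nodes are allocated to the job.

```shell
# Sketch of an ARC runtime environment script; adjust every path to the
# local cluster. hmmer_re is a hypothetical wrapper used for readability.

hmmer_re() {
    case "$1" in
        0)  # Front end, before submission: nothing to do in this sketch.
            ;;
        1)  # Compute node, before the job starts.
            HMMER_HOME=/grid/apps/hmmer/2.3.2        # assumed install prefix
            export PATH="$HMMER_HOME/bin:$PATH"
            # Node-local scratch directory for unpacked databases (assumed layout).
            export HMMER_DB_DIR="${TMPDIR:-/tmp}/hmmer_db.$$"
            mkdir -p "$HMMER_DB_DIR"
            # Use all online cores; assumes the job has the whole node.
            export HMMER_NCPU="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
            ;;
        2)  # After the job: remove the unpacked databases.
            if [ -n "${HMMER_DB_DIR:-}" ]; then
                rm -rf "$HMMER_DB_DIR"
            fi
            ;;
    esac
}

hmmer_re "${1:-}"
```

On clusters where jobs share nodes, HMMER_NCPU would instead have to be taken from the number of slots granted by the batch system.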

Download runtime environment script templates: SGE version or PBS version, and the prepare_db script.

Modify the scripts as needed and save the main script in your ARC runtime directory as APPS/BIO/HMMER-2.3.2. Make sure that the prepare_db script is on the path for grid jobs using the RE, for example by placing it in the bin directory of the HMMer installation.
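
If the prepare_db script has to be rewritten for a particular site, its job is small. The sketch below shows one way such a helper might behave, assuming only what the interface states: gzipped files given as arguments are unpacked into HMMER_DB_DIR. The function name prepare_db_sketch and the choice to strip the .gz suffix are assumptions; the distributed script may differ.

```shell
#!/bin/bash
# Sketch of a prepare_db-style helper (hypothetical, not the distributed script).
# Unpacks each gzipped file named on the command line into HMMER_DB_DIR,
# dropping the .gz suffix from the file name.

prepare_db_sketch() {
    if [ -z "${HMMER_DB_DIR:-}" ]; then
        echo "HMMER_DB_DIR is not set" >&2
        return 1
    fi
    mkdir -p "$HMMER_DB_DIR" || return 1
    local gz
    for gz in "$@"; do
        # Keep the compressed original; write the unpacked copy to HMMER_DB_DIR.
        gunzip -c "$gz" > "$HMMER_DB_DIR/$(basename "$gz" .gz)" || return 1
    done
}
```

Saved as an executable script that ends with prepare_db_sketch "$@", this would turn database.fasta.gz into $HMMER_DB_DIR/database.fasta, which is the path the example job script passes to hmmsearch.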

Contact information

Contact olli.tourunen@csc.fi if you have any grid-specific questions. Contact your local HMMer guru with application-related questions.