Name | APPS/BIO/HMMER-2.3.2 |
---|---|
Description | HMMER analysis tool |
Status | Production |
Last update | 2008-02-26 |
Only this development version is currently available.
The runtime environment (RE) sets the following environment variables:
A prepare_db command is also available for unpacking gzipped database files to HMMER_DB_DIR. See the example below for usage.
Here is an example of running HMMer in the grid. It searches the globin profile (taken from the HMMer tutorial) against the Uniprot-Swissprot database, version 12.6. The database is taken from the NDGF-BioGrid database repository.
Download the example files here.
The job description file hmmer.xrsl
    &
    (executable=run_hmmer.sh)
    (jobname=hmmer_globin_sprot)
    (stdout=std.out)
    (stderr=std.err)
    (gmlog=gridlog)
    (cputime=60)
    (memory=1000)
    (disk=500)
    (runtimeenvironment=APPS/BIO/HMMER-2.3.2)
    (inputfiles=
      ("globin.hmm" "globin.hmm")
      ("database.fasta.gz" "srm://srm.ndgf.org/biogrid/db/uniprot/UniProt12.6/uniprot_sprot.fasta.gz")
    )
The job script run_hmmer.sh unpacks the database to HMMER_DB_DIR using the prepare_db command and runs hmmsearch against it.
    #!/bin/bash
    echo "Hello HMMer!"
    echo "Preparing database"
    prepare_db database.fasta.gz
    echo "Searching globin.hmm against the database"
    hmmsearch globin.hmm $HMMER_DB_DIR/database.fasta
    echo "Bye HMMER!"
Building and installing HMMer from source should be quite straightforward. Here is an example installation where HMMer is installed in a shared directory /grid/apps that is also visible on the compute nodes.
    $ wget ftp://selab.janelia.org/pub/software/hmmer/CURRENT/hmmer-2.3.2.tar.gz
    $ tar xvfz hmmer-2.3.2.tar.gz
    $ cd hmmer-2.3.2
    $ ./configure --prefix /grid/apps/hmmer/2.3.2 --enable-threads
    $ make
    $ make check
    $ make install
Intel compilers seem to produce very fast code for HMMer. Here is an example provided by Jens Larsson.
    $ export CFLAGS="-O3 -xS -ip"
    $ export LDFLAGS="-Wl,--rpath,/software/intel/11.0.74/lib/intel64"
    $ export CC=icc
    $ ./configure --prefix /software/biogrid/hmmer/2.3.2 --enable-threads
    $ make; make check; make install
It is advisable to include the version number (2.3.2) in the installation path. This makes it easy to support multiple versions of HMMer binaries and runtime environments side by side. Running 'make check' is useful for verifying that the compiled binaries behave as expected.
HMMer can take advantage of multicore/multi-CPU nodes by using threads. It scales linearly (at least up to 8 cores). It is therefore advisable to configure the RE to allocate full nodes to HMMer jobs and to set the HMMER_NCPU environment variable accordingly (the example RE scripts have notes on this). This way users get the maximum benefit out of the unavoidable overhead of unpacking the database to node-local disk: they can bundle more searches into one grid job and make better use of the operating system disk cache.
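The bundling idea above can be sketched as a job script fragment that unpacks the database once and then runs several profile searches against it. This is only an illustration: the function name search_all, the HMMSEARCH override variable, and the second profile name are hypothetical and not part of the RE. The threaded hmmsearch reads HMMER_NCPU from the environment by itself, so the sketch does not pass a CPU count explicitly.

```shell
#!/bin/bash
# Sketch: run several profile searches against one unpacked database.
# Assumes the RE has exported HMMER_DB_DIR and HMMER_NCPU and that
# prepare_db has already unpacked the database.

# search_all DATABASE PROFILE...  (hypothetical helper, not part of the RE)
search_all() {
    db="$1"
    shift
    for hmm in "$@"; do
        echo "Searching $hmm against $db"
        # HMMSEARCH can be overridden for testing; defaults to hmmsearch.
        "${HMMSEARCH:-hmmsearch}" "$hmm" "$db" || return 1
    done
}

# Example use inside a job script (other.hmm is a placeholder name):
#   prepare_db database.fasta.gz
#   search_all "$HMMER_DB_DIR/database.fasta" globin.hmm other.hmm
```

Every extra profile would also have to be listed in the inputfiles attribute of the job description so that it is staged in with the job.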
The runtime environment script can be downloaded below. As long as the interface requirements are satisfied, the implementation does not really matter, and some adaptation is needed anyway to accommodate differences between cluster environments (batch queue systems, temporary directory locations, etc.).
Download the runtime environment script templates (SGE version or PBS version) and the prepare_db script.
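To give an idea of what satisfying the interface could involve, here is a minimal sketch of an RE script. It is not the distributed template. It assumes the usual ARC convention that the RE script is sourced with a stage argument (0 before the job is handed to the batch system, 1 on the compute node before the job starts, 2 after the job ends). The function name hmmer_re, the HMMER_HOME variable, its default path, and the use of TMPDIR for node-local scratch are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of an RE script for APPS/BIO/HMMER-2.3.2.
# Assumption: ARC sources this file with a stage argument 0, 1 or 2.

hmmer_re() {
    case "$1" in
        0)
            # Before submission to the batch system. A real script would
            # request a full node here; that part is site specific.
            ;;
        1)
            # On the compute node, just before the job runs.
            # HMMER_HOME is an assumed name for the installation prefix.
            PATH="${HMMER_HOME:-/grid/apps/hmmer/2.3.2}/bin:$PATH"
            export PATH
            # Use all cores of the node (assumes a full node was allocated).
            HMMER_NCPU="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
            export HMMER_NCPU
            # Node-local scratch directory for unpacked databases.
            HMMER_DB_DIR="$(mktemp -d "${TMPDIR:-/tmp}/hmmer_db.XXXXXX")"
            export HMMER_DB_DIR
            ;;
        2)
            # After the job: remove the unpacked databases.
            if [ -n "${HMMER_DB_DIR:-}" ]; then
                rm -rf "$HMMER_DB_DIR"
            fi
            ;;
    esac
}

hmmer_re "${1:-}"
```

Setting HMMER_NCPU to the full core count only makes sense if the batch system really gives the job the whole node; otherwise it should be set to the number of allocated cores.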
Modify the scripts as needed and save the main script in your ARC runtime directory as APPS/BIO/HMMER-2.3.2. Make sure that the prepare_db script is available in the path for grid jobs using the RE, for example by placing it in the bin directory of the HMMer installation.
Contact olli.tourunen@csc.fi if you have any grid-specific questions. Contact your local HMMer guru with application-related questions.
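For orientation, the behaviour expected from prepare_db (unpacking gzipped database files to HMMER_DB_DIR) could be implemented roughly as follows. This is a hypothetical sketch and not the downloadable script, which may differ in details such as locking or reuse of already unpacked files.

```shell
#!/bin/bash
# Hypothetical sketch of prepare_db: unpack gzipped database files
# given as arguments into the directory named by HMMER_DB_DIR.

prepare_db() {
    if [ -z "${HMMER_DB_DIR:-}" ]; then
        echo "prepare_db: HMMER_DB_DIR is not set" >&2
        return 1
    fi
    mkdir -p "$HMMER_DB_DIR" || return 1
    for archive in "$@"; do
        # database.fasta.gz becomes $HMMER_DB_DIR/database.fasta
        target="$HMMER_DB_DIR/$(basename "$archive" .gz)"
        echo "Unpacking $archive to $target"
        gzip -dc "$archive" > "$target" || return 1
    done
}

# Run only when file names are given on the command line.
if [ "$#" -gt 0 ]; then
    prepare_db "$@"
fi
```

With this sketch, `prepare_db database.fasta.gz` leaves the unpacked file at $HMMER_DB_DIR/database.fasta, which is the path the example job script passes to hmmsearch.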