NAME
hmmsearch - search a sequence database with a profile HMM
SYNOPSIS
hmmsearch [options] hmmfile seqfile
DESCRIPTION
hmmsearch reads an HMM from hmmfile and searches
seqfile for significantly similar sequence matches.
seqfile is looked for first in the current working
directory, then in the directory named by the environment variable
BLASTDB. This lets you search existing BLAST databases directly, if
BLAST has been configured for the site.
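The seqfile lookup order can be sketched in Python as below. This is an illustration, not HMMER's own code: the function name `locate_seqfile` is hypothetical, and treating BLASTDB as a possibly colon-separated list of directories (tried in order) is an assumption; a single directory name behaves the same way.

```python
import os
from typing import Optional


def locate_seqfile(seqfile: str) -> Optional[str]:
    """Return the path that would be opened for seqfile, or None.

    Hypothetical helper illustrating the lookup order: the current working
    directory first, then the directory (or directories) named by $BLASTDB.
    """
    # 1. The current working directory; a relative name resolves here.
    if os.path.isfile(seqfile):
        return seqfile
    # 2. $BLASTDB. Splitting on os.pathsep (":" on Unix) is an assumption.
    for directory in os.environ.get("BLASTDB", "").split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, seqfile)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Note that a file of the same name in the working directory always wins over the copy under BLASTDB.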
hmmsearch may take minutes or even hours to run,
depending on the size of the sequence database. It is a good idea
to redirect the output to a file.
The output consists of four sections: a ranked list of the best
scoring sequences, a ranked list of the best scoring domains,
alignments for all the best scoring domains, and a histogram of the
scores. A sequence score may be higher than a domain score for the
same sequence if there is more than one domain in the sequence; the
sequence score takes into account all the domains. All sequences
that pass both the -E and -T cutoffs are shown in the first list;
every domain found in those sequences is then shown in the second
list of domain hits. If desired, E-value and bit score
thresholds may also be applied to the domain list using the
--domE and --domT options.
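The relationship between the two hit lists and the cutoffs that feed them can be modeled with the Python sketch below. The `SeqHit` and `Domain` classes and the `report` function are hypothetical; the comparisons follow the wording of the option descriptions (E-values less than, bit scores greater than the cutoff), the defaults are the documented ones, and ranking both lists by bit score is a simplifying assumption.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Domain:
    score: float   # per-domain bit score
    evalue: float  # per-domain E-value


@dataclass
class SeqHit:
    name: str
    score: float   # per-sequence bit score (accounts for all domains)
    evalue: float  # per-sequence E-value
    domains: List[Domain] = field(default_factory=list)


def report(hits: List[SeqHit], E: float = 10.0, T: float = -math.inf,
           domE: float = math.inf, domT: float = -math.inf
           ) -> Tuple[List[SeqHit], List[Tuple[str, Domain]]]:
    """Return (per-sequence list, per-domain list)."""
    # First list: sequences passing both -E and -T, best score first.
    seqs = [h for h in hits if h.evalue < E and h.score > T]
    seqs.sort(key=lambda h: h.score, reverse=True)
    # Second list: domains drawn only from sequences in the first list,
    # optionally narrowed further by --domE and --domT.
    doms = [(h.name, d) for h in seqs for d in h.domains
            if d.evalue < domE and d.score > domT]
    doms.sort(key=lambda nd: nd[1].score, reverse=True)
    return seqs, doms
```

With the default domain cutoffs every domain of every reported sequence reaches the second list; passing, say, `domT=10.0` only trims the second list and never changes the first.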
OPTIONS
- -h
- Print brief help; includes version number and summary of all
options, including expert options.
- -A <n>
- Limits the alignment output to the <n> best
scoring domains. -A0 shuts off the alignment output and can
be used to reduce the size of output files.
- -E <x>
- Set the E-value cutoff for the per-sequence ranked hit list to
<x>, where <x> is a positive real number.
The default is 10.0. Hits with E-values better than (less than)
this threshold will be shown.
- -T <x>
- Set the bit score cutoff for the per-sequence ranked hit list
to <x>, where <x> is a real number. The
default is negative infinity; by default, the threshold is
controlled by E-value and not by bit score. Hits with bit scores
better than (greater than) this threshold will be shown.
- -Z <n>
- Calculate the E-value scores as if we had seen a sequence
database of <n> sequences. The default is the number
of sequences seen in your database file
<seqfile>.
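To show what -Z changes, here is a minimal Python sketch assuming the HMMER 2 convention that an E-value is the database size multiplied by a P-value taken from an extreme value (Gumbel) distribution. The function names are hypothetical, and `mu` and `lam` stand for distribution parameters assumed to come from the HMM's calibration; only the linear dependence on database size is the point here.

```python
import math
from typing import Optional


def evd_pvalue(score: float, mu: float, lam: float) -> float:
    """P(S >= score) under a Gumbel extreme value distribution."""
    t = -lam * (score - mu)
    if t > 700.0:
        # exp(t) would overflow a double; the P-value is 1.0 anyway.
        return 1.0
    # 1 - exp(-exp(t)), written with expm1 to keep precision for tiny P.
    return -math.expm1(-math.exp(t))


def evalue(score: float, mu: float, lam: float,
           nseq_seen: int, Z: Optional[int] = None) -> float:
    """E-value = database size * P-value.

    Z stands in for -Z <n>; when it is None, the number of sequences
    actually read from seqfile is used, matching the documented default.
    """
    n = Z if Z is not None else nseq_seen
    return n * evd_pvalue(score, mu, lam)
```

For the same hit and parameters, passing `Z=10 * nseq_seen` yields an E-value ten times larger than the default; the bit score itself does not change.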
EXPERT OPTIONS
- --compat
- Use the output format of HMMER 2.1.1, the 1998-2001 public
release; provided so 2.1.1 parsers don't have to be rewritten.
- --cpu <n>
- Sets the maximum number of CPUs that the program will run on.
The default is to use all CPUs in the machine. Overrides the
HMMER_NCPU environment variable. Only affects threaded versions of
HMMER (the default on most systems).
- --cut_ga
- Use Pfam GA (gathering threshold) score cutoffs. Equivalent to
--globT <GA1> --domT <GA2>, but the GA1 and GA2 cutoffs
are read from the HMM file. hmmbuild puts these cutoffs there if
the alignment file was annotated in a Pfam-friendly alignment
format (extended SELEX or Stockholm format) and the optional GA
annotation line was present. If these cutoffs are not set in the
HMM file, --cut_ga cannot be used.
- --cut_tc
- Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to
--globT <TC1> --domT <TC2>, but the TC1 and TC2 cutoffs
are read from the HMM file. hmmbuild puts these cutoffs there if
the alignment file was annotated in a Pfam-friendly alignment
format (extended SELEX or Stockholm format) and the optional TC
annotation line was present. If these cutoffs are not set in the
HMM file, --cut_tc cannot be used.
- --cut_nc
- Use Pfam NC (noise cutoff) score cutoffs. Equivalent to --globT
<NC1> --domT <NC2>, but the NC1 and NC2 cutoffs are
read from the HMM file. hmmbuild puts these cutoffs there if the
alignment file was annotated in a Pfam-friendly alignment format
(extended SELEX or Stockholm format) and the optional NC annotation
line was present. If these cutoffs are not set in the HMM file,
--cut_nc cannot be used.
- --domE <x>
- Set the E-value cutoff for the per-domain ranked hit list to
<x>, where <x> is a positive real number.
The default is infinity; by default, all domains in the sequences
that passed the first threshold will be reported in the second
list, so that the number of domains reported in the per-sequence
list is consistent with the number that appear in the per-domain
list.
- --domT <x>
- Set the bit score cutoff for the per-domain ranked hit list to
<x>, where <x> is a real number. The
default is negative infinity; by default, all domains in the
sequences that passed the first threshold will be reported in the
second list, so that the number of domains reported in the
per-sequence list is consistent with the number that appear in the
per-domain list. Important note: only one domain in a
sequence is absolutely controlled by this parameter, or by
--domE. The second and subsequent domains in a sequence have
a de facto bit score threshold of 0 because of the details of how
HMMER works. HMMER requires at least one pass through the main
model per sequence; to do more than one pass (more than one domain)
the multidomain alignment must have a better score than the single
domain alignment, and hence the extra domains must contribute
positive score. See the User's Guide for more detail.
- --forward
- Use the Forward algorithm instead of the Viterbi algorithm to
determine the per-sequence scores. Per-domain scores are still
determined by the Viterbi algorithm. Some have argued that Forward
is a more sensitive algorithm for detecting remote sequence
homologues; my experiments with HMMER have not confirmed this,
however.
- --informat <s>
- Assert that the input seqfile is in format
<s>; do not run Babelfish format autodetection. This
increases the reliability of the program somewhat, because the
Babelfish can make mistakes; particularly recommended for
unattended, high-throughput runs of HMMER. Valid format strings
include FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF,
CLUSTAL, and PHYLIP. See the User's Guide for a complete list.
- --null2
- Turn off the post hoc second null model. By default, each
alignment is rescored by a postprocessing step that takes into
account possible biased composition in either the HMM or the target
sequence. This is almost essential in database searches, especially
with local alignment models. There is a very small chance that this
postprocessing might remove real matches, and in these cases
--null2 may improve sensitivity at the expense of reducing
specificity by letting biased composition hits through.
- --pvm
- Run on a Parallel Virtual Machine (PVM). The PVM must already
be running. The client program hmmsearch-pvm must be
installed on all the PVM nodes. Optional PVM support must have been
compiled into HMMER.
- --xnu
- Turn on XNU filtering of target protein sequences. Has no
effect on nucleic acid sequences. In trial experiments,
--xnu appears to perform less well than the default post hoc
null2 model.
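The --domT note above (only one domain per sequence is absolutely controlled by the domain cutoffs; later domains carry a de facto threshold of 0) can be modeled with the small Python sketch below. It is a simplified model, not HMMER's algorithm: the function `visible_domains` is hypothetical, and it assumes the first entry in the list is the domain from the mandatory single pass through the model.

```python
import math
from typing import List


def visible_domains(domain_scores: List[float],
                    domT: float = -math.inf) -> List[float]:
    """Simplified model of which domain scores can end up reported.

    domain_scores[0] is assumed to be the domain from the mandatory single
    pass; it is filtered by domT alone. Each further domain exists only if
    it added positive score to the multidomain alignment, so its effective
    cutoff is max(domT, 0).
    """
    kept = []
    for i, sc in enumerate(domain_scores):
        cutoff = domT if i == 0 else max(domT, 0.0)
        if sc > cutoff:
            kept.append(sc)
    return kept
```

For example, `visible_domains([-3.0, 5.0, -1.0], domT=-5.0)` keeps the negative-scoring first domain but still drops the third, because lowering --domT below 0 cannot admit a second or later domain with a negative score.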
SEE ALSO
Master man page, with a full list of, and guide to, the individual
man pages: see hmmer(1).
For complete documentation, see the user guide that came with
the distribution (Userguide.pdf); or see the HMMER web page,
http://hmmer.wustl.edu/.
COPYRIGHT
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
Freely distributed under the GNU General Public License (GPL).
See the file COPYING in your distribution for details on
redistribution conditions.
AUTHOR
Sean Eddy
HHMI/Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA