% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hmmer3.R
\name{hmmerScan}
\alias{hmmerScan}
\title{Scanning a profile Hidden Markov Model database}
\usage{
hmmerScan(in.files, db, out.folder, threads = 0, verbose = TRUE)
}
\arguments{
\item{in.files}{A character vector of file names.}

\item{db}{The full name of the database to scan.}

\item{out.folder}{The name of the folder in which to put the result files.}

\item{threads}{Number of CPUs to use.}

\item{verbose}{Logical indicating if textual output should be given to monitor the progress.}
}
\value{
This function produces files in the folder specified by \samp{out.folder}. Existing files are
never overwritten by \code{\link{hmmerScan}}. If you want to re-compute something, delete the
corresponding result files first.
}
\description{
Scanning FASTA formatted protein files against a database of pHMMs using the HMMER3 software.
}
\details{
The HMMER3 software is purpose-made for handling profile Hidden Markov Models (pHMMs)
describing patterns in biological sequences (Eddy, 2008). This function will make calls to the
HMMER3 software to scan FASTA files of proteins against a pHMM database.

The files named in \samp{in.files} must contain FASTA-formatted protein sequences. These files
should be prepared by \code{\link{panPrep}} to ensure that each sequence, as well as the file name,
has a GID-tag identifying its genome. The database named in \samp{db} must be a HMMER3-formatted
database. It is typically the Pfam-A database, but you can also make your own HMMER3 databases; see
the HMMER3 documentation for help.

\code{\link{hmmerScan}} will query every input file against the named database. The database contains
profile Hidden Markov Models describing position-specific sequence patterns. Each sequence in every
input file is scanned to see if some of the patterns can be matched to some degree. Each input file
results in an output file with the same GID-tag in the name. The result files are plain text files
with tabular output. See \code{\link{readHmmer}} for how to read the results into R.

Scanning large databases like Pfam-A takes time, usually several minutes per genome. The scan is set
up to use only 1 CPU per scan by default. By increasing \code{threads} you can utilize multiple CPUs, typically
on a computing cluster.
Our experience is that on a multi-core laptop it is better to start this function in default mode
from multiple R sessions. This function will not overwrite an existing result file, and multiple parallel
sessions can write results to the same folder.
}
\note{
The HMMER3 software must be installed on the system for this function to work, i.e. the command
\samp{system("hmmscan")} must be recognized as a valid command if you run it in the Console window.
}
\examples{
\dontrun{
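# A minimal pre-flight check (a sketch using base R only, not part of micropan):
# hmmerScan needs the HMMER3 executable hmmscan, so we look it up on the PATH first.
# Sys.which() returns an empty string for a command it cannot find.
hmmscan.path <- Sys.which("hmmscan")
if( !nzchar(hmmscan.path) ) stop("hmmscan was not found on the PATH, install HMMER3 first")
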
# Using a FASTA file in the micropan package
# We need to uncompress it first...
extdata.path <- file.path(path.package("micropan"),"extdata")
filenames <- "Mpneumoniae_309_GID2.fsa"
pth <- lapply( file.path( extdata.path, paste( filenames, ".xz", sep="" ) ), xzuncompress )

# Using a miniature pHMM database in the micropan package
# We need to uncompress its datafiles first...
db <- "microfam0.hmm"
pth <- lapply( file.path( extdata.path,
   paste( db, c(".h3f.xz",".h3i.xz",".h3m.xz",".h3p.xz"), sep="" ) ), xzuncompress )
  
# ...and scanning the FASTA-file against microfam0...
hmmerScan(in.files=file.path(extdata.path,filenames), 
   db=file.path(extdata.path,db),out.folder=".")
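
# ...and, as a sketch, reading the result into R...
# The exact result file name is not assumed here. We only rely on the result file
# carrying the GID-tag of the input file (GID2), and we assume that no other file
# in the working directory has GID2 in its name.
res.files <- list.files(".", pattern="GID2", full.names=TRUE)
hmm.tbl <- readHmmer(res.files[1])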
  
# ...and compressing all files again...
pth <- lapply( file.path( extdata.path, filenames ), xzcompress )
pth <- lapply( file.path( extdata.path,
   paste( db, c(".h3f",".h3i",".h3m",".h3p"), sep="" ) ), xzcompress )
}

}
\author{
Lars Snipen and Kristian Hovde Liland.
}
\references{
Eddy, S.R. (2008). A Probabilistic Model of Local Sequence Alignment That Simplifies
Statistical Significance Estimation. PLoS Computational Biology, 4(5).
}
\seealso{
\code{\link{panPrep}}, \code{\link{readHmmer}}.
}

