\name{pattern.search}
\alias{pattern.search}
\title{
Detecting and grouping isotope m/z relations among peaks in a HRMS dataset
}
\description{
Algorithm for detecting groups of isotope pattern peaks generated by an unknown candidate chemical component.
}
\usage{
pattern.search(peaklist, iso, cutint = min(peaklist[, 2]), rttol = c(-0.5, 0.5), 
mztol = 3, mzfrac = 0.1, ppm = TRUE, inttol = 0.5, 
rules = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), 
deter = FALSE, entry = 20)
}

\arguments{
  \item{peaklist}{
	Dataframe or matrix of HRMS peaks with three columns for (a) m/z, (b) intensity and (c) retention time, such as \code{\link[nontarget]{peaklist}}.
}
  \item{iso}{
	Object generated by \code{\link[nontarget]{make.isos}} from \code{\link[enviPat]{isotopes}}, defining the isotope m/z differences to be screened for.
}
  \item{cutint}{
	Cutoff intensity. Peaks below this intensity will be (a) omitted and (b) not expected by any of the plausibility rules (see details). See parameter \code{rules} below.
}
  \item{rttol}{
	Negative and positive retention time tolerance, given as a vector of two values. Units as given in column 3 of the \code{peaklist} argument, e.g. [min].
}
  \item{mztol}{
	m/z tolerance setting: value by which the m/z of a peak may vary from its expected value. Given in ppm if \code{ppm=TRUE} (see below), otherwise (\code{ppm=FALSE}) in absolute m/z [u]. Defines
	the "large" mass tolerance used.
}
  \item{mzfrac}{
	"Small" mass tolerance used. Given as a fraction of \code{mztol}, see above.
}
  \item{ppm}{
	Should \code{mztol} be set in ppm (\code{TRUE}) or in absolute m/z (\code{FALSE})?
}
  \item{inttol}{
	Intensity tolerance setting: fraction by which peak intensities may vary. E.g. if set to 0.2, a peak with expected intensity 10000 may range between 8000 and 12000.
}
  \item{rules}{
	Enabling (\code{TRUE}) or disabling (\code{FALSE}) of \code{rules[1]} to \code{rules[11]}, see details. Logical vector with eleven entries.
}
  \item{deter}{
	If using \code{\link[nontarget]{deter.iso}} instead of \code{\link[nontarget]{make.isos}}, set to \code{TRUE}. This disables all rules and makes \code{\link[nontarget]{pattern.search}}
	compatible with argument \code{iso} inputs from \code{\link[nontarget]{deter.iso}}. Otherwise, ignore.
}
  \item{entry}{
	Memory allocation setting. Increase value if the corresponding warning is issued. Otherwise, ignore.
}
}
\details{
Detecting groups of isotope pattern peaks involves two steps. 

In a first step, and within the given tolerances \code{rttol} and \code{mztol}, m/z differences between any two peaks are screened for matches with the m/z differences between isotopes of an element, 
as provided by the \code{iso} argument. This leads to a set of candidate isotope m/z differences, each of which subsequently undergoes plausibility checks (\code{rules} parameter entries 1 to 7). 

In a second step, the remaining candidate m/z isotope differences are sorted in tree-like structures (so-called isotope pattern groups), starting from the lowest m/z peak of the data set. 
Thus, a tree consists of several (>=2) peaks related by isotope m/z differences; the peak with lowest m/z in the tree (root node) represents the monoisotopic peak of the associated candidate 
molecular component. This does not require prior knowledge about the chemical nature of the components assigned. Again, the resulting trees undergo plausibilization (\code{rules} parameter entries 8 to 11).

In addition, groups with m/z isotope differences being detected within "small" \code{mztol} are used to calculate a minimum number of atoms per element associated with that m/z isotope difference.
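
As a rough illustration of the "large" and "small" tolerance windows, the following sketch checks whether the m/z difference between two hypothetical peaks matches the 13C vs. 12C mass difference. 
It assumes that a ppm tolerance is converted to absolute m/z at the lighter peak and applied symmetrically; the internal implementation may differ. All peak values are invented for illustration.

\preformatted{
# Hypothetical peaks (not taken from the package data set)
mz_light <- 300.00000
mz_heavy <- 301.00385
# approximate 13C - 12C mass difference [u], charge z = 1
delta_iso <- 1.003355
z <- 1
mztol  <- 3    # "large" tolerance [ppm]
mzfrac <- 0.1  # "small" tolerance as a fraction of mztol
# Assumption: ppm tolerance converted to absolute m/z at the lighter peak
tol_large <- mztol * mz_light / 1e6
tol_small <- tol_large * mzfrac
deviation <- abs((mz_heavy - mz_light) - delta_iso / z)
within_large <- deviation <= tol_large
within_small <- deviation <= tol_small
}

With these values the candidate difference lies within the "large" but not within the "small" tolerance.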
}
\section{rules setting}{

\code{rules[1]}: Intensities between two peaks associated via any of the candidate m/z isotope differences of the \code{iso} argument are compared. 
Given this difference in intensity, the minimum number of atoms for the element with highest abundance in argument \code{iso} is calculated. 
If \eqn{(minimum number of atoms)*(minimum mass) > (m/z of lighter peak * maximum charge in argument iso)}, the candidate m/z difference is 
found implausible and therefore rejected. The minimum mass is set to that of protium (1H) plus the mass of the minimum number of 
carbon atoms associated with it, i.e. 1.0078 + (1/6 * 12.0000). Fast precheck to \code{rules[2]} and \code{rules[3]}.
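
The rejection criterion can be worked through numerically. The sketch below treats the minimum number of atoms as a given, hypothetical value (its derivation from the intensity difference is not reproduced here) 
and only evaluates the inequality stated above; all input values are invented for illustration.

\preformatted{
# Minimum mass as stated above: 1.0078 + (1/6 * 12.0000) = 3.0078
min_mass <- 1.0078 + (1/6 * 12.0000)
# Hypothetical inputs
n_min    <- 120    # minimum number of atoms implied by the intensities
mz_light <- 250.1  # m/z of the lighter peak
z_max    <- 1      # maximum charge in argument iso
# The candidate m/z difference is rejected if the inequality holds
reject <- n_min * min_mass > mz_light * z_max
}

Here 120 * 3.0078 = 360.936 exceeds 250.1, so this hypothetical candidate would be rejected.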

\code{rules[2]}: Repeats \code{rules[1]}, but uses abundances and minimum masses (including the C-ratios of \code{\link[enviPat]{isotopes}}) for only those isotope(s) of argument \code{iso} 
ranging within the "large" m/z tolerance set by \code{mztol}.

\code{rules[3]}: Repeats \code{rules[1]}, but now uses abundance and minimum masses (including the C-ratios of \code{\link[enviPat]{isotopes}}) individually for only 
those isotope(s) of argument \code{iso} ranging within the "small" m/z tolerance set by \code{mztol*mzfrac}. 

\code{rules[4]}: If the intensity ratio between two peaks associated via any of the candidate m/z isotope differences of the \code{iso} argument is smaller than 
the smallest isotope abundance ratio of an element of argument \code{iso}, the candidate m/z difference is found implausible and therefore rejected. 
Fast precheck to \code{rules[5]} and \code{rules[6]}.

\code{rules[5]}: Repeats \code{rules[4]}, but now uses abundances for only those isotope(s) of argument \code{iso} ranging within the "large" m/z tolerance set by \code{mztol}.

\code{rules[6]}: Repeats \code{rules[4]}, but now uses abundances for only those isotope(s) of argument \code{iso} ranging within the "small" m/z tolerance set by \code{mztol*mzfrac}.

\code{rules[7]}: Given those isotopes of argument \code{iso} ranging within the "small" m/z tolerance set by \code{mztol} and \code{mzfrac} and their C-ratio set in \code{\link[enviPat]{isotopes}},
the minimum number of carbon atoms and the associated 13C peak intensity to be expected at M+1 can be calculated. Checks if this expected 13C peak is present in the data set.
If not, the candidate m/z difference is rejected.

\code{rules[8]}: Given the intensity and m/z of the monoisotopic peak in a growing isotope pattern tree and values from argument \code{iso}, the maximum m/z to which a tree can grow is restricted.

\code{rules[9]}: Given (a) the intensities of the monoisotopic peak (=tree root node, interaction level 1) and its first isotopic daughter peaks (tree interaction level 2) and
(b) the candidate m/z isotope(s) within the "small" m/z tolerance set by \code{mztol} and \code{mzfrac} associated with (a), the occurrence of expected peaks (interaction level >2) above
the value set by argument cutint is checked. If expected but not found, the peak at interaction level 1 is rejected as being the monoisotopic candidate peak, and a tree is
grown on the remaining interrelated peaks. For example, if a monoisotopic peak (= tree interaction level 1) is associated with an intense 13-C isotope peak (= tree interaction 
level 2), a second peak from two 13-C vs. 12-C isotope replacements can be expected and must be checked for.
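
To illustrate the kind of expectation described in this example, the sketch below estimates the intensity of the second 13-C peak from a hypothetical monoisotopic peak and its first 13-C peak. 
It assumes a carbon-only binomial approximation with an approximate natural 13C/12C abundance ratio; this is a simplified illustration under these assumptions, not the internal calculation of the package.

\preformatted{
r13C   <- 0.0107 / 0.9893  # approximate natural 13C/12C abundance ratio
int_M  <- 1e6              # hypothetical monoisotopic peak intensity
int_M1 <- 2.2e5            # hypothetical first 13-C peak intensity
cutint <- 1e4              # cutoff intensity
# number of carbon atoms implied by int_M1 / int_M = n_C * r13C
n_C <- round((int_M1 / int_M) / r13C)
# binomial estimate: int_M2 / int_M = choose(n_C, 2) * r13C^2
int_M2 <- int_M * choose(n_C, 2) * r13C^2
# a peak expected above cutint would have to be present in the data
expected_above_cutoff <- int_M2 > cutint
}

With these invented intensities, about 20 carbon atoms are implied and the expected second 13-C peak lies above the cutoff, so its absence would speak against the monoisotopic candidate.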

\code{rules[10]}: Restriction to \code{rules[7]} and \code{rules[9]}: expected peaks are searched for only if no other measured peaks of higher intensity exist in a tolerance window of absolute m/z = 0.5 around 
the m/z of the expected peak. This allows skipping the search for expected peaks in cases of intensity masking by other peaks. For example, intense 37-Cl peaks often mask the occurrence
of a second 13-C peak to be expected from \code{rules[9]}, depending on the number of Cl and C atoms and the measurement resolution used.
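
A minimal sketch of such a masking check is given below. It assumes that the window extends 0.5 m/z to either side of the expected peak; whether the package uses this or a narrower window is not stated here. 
All peak values are hypothetical.

\preformatted{
# Hypothetical measured peaks near an expected peak
mz  <- c(302.0067, 302.0010, 303.2000)
int <- c(15000, 400000, 9000)
mz_expected  <- 302.0067
int_expected <- 20000
# Assumption: window of +/- 0.5 m/z around the expected peak
in_window <- abs(mz - mz_expected) <= 0.5
# masked if any measured peak in the window is more intense
masked <- any(in_window & int > int_expected)
search_expected <- !masked
}

In this example the second measured peak masks the expected one, so the search for the expected peak would be skipped.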

\code{rules[11]}: If several z-values are used, trees may in some cases be nested within each other at different z-levels. This rule merges the nested group of z=x into the nesting peak 
group of z=y.

}

\note{
The input peaklist is internally sorted by (a) increasing retention time and (b) m/z, and is saved in this order in the lists returned by \code{\link[nontarget]{pattern.search}}, \code{\link[nontarget]{adduct.search}} 
and \code{\link[nontarget]{homol.search}}. Peak IDs refer to this order, in contrast to group IDs. Different group IDs exist for adduct groups, isotope pattern groups and homologue series peak 
groups. Moreover, and at the highest level, yet other IDs exist for the individual components (see note section of \code{\link[nontarget]{combine}}).

Depending on values of \code{mztol}, several m/z isotope differences from argument \code{iso} may match a measured m/z difference between two peaks.

\code{rules[1]} to \code{rules[11]} encompass uncertainties in intensity set by parameter \code{inttol}.

In some cases, two or more isotope pattern trees may overlap. \code{rules[11]} merges only fully nested trees, not trees that merely overlap.

Disabling \code{rules[10]} may in some cases lead to false rejections of candidate m/z isotope differences for \code{rules[7]} and \code{rules[9]}, especially for low resolutions.

\code{rules[9]} is recursive, i.e. may be applied several times on an ever decreasing number of peaks per tree, until plausibility holds or no m/z isotopic differences remain.
}
\section{Warning}{
Acceptable outcomes strongly depend on appropriate parametrization of the algorithm.

Including many isotopes and overly large values for \code{rttol} and/or \code{mztol} may lead to overflows. In this case, a warning is issued to increase parameter \code{entry} or to adjust values of \code{rttol} and/or \code{mztol}.

Group IDs are valid both for \code{pattern[[1]]} and \code{pattern[[3]]}.
}
\value{
List of type pattern with 12 entries

\item{pattern[[1]]}{\code{Patterns}. Dataframe with peaks (\code{mass},\code{intensity},\code{rt},\code{peak ID}) and their 
isotope pattern relations (\code{to ID},\code{isotope(s)},\code{mass tolerance},\code{charge level}) within
isotope pattern groups (\code{group ID},\code{interaction level}).}
\item{pattern[[2]]}{\code{Parameters}. Parameters used.}
\item{pattern[[3]]}{\code{Peaks in pattern groups}. Dataframe listing all peaks (\code{peak IDs}) per isotope pattern group (\code{group ID}) at the given z-level(s) (\code{charge level}).}
\item{pattern[[4]]}{\code{Atom counts}. Groups with m/z isotope differences being detected within "small" \code{mztol} are used to calculate a minimum number of atoms per element associated with that m/z isotope difference.}
\item{pattern[[5]]}{\code{Count of pattern groups}. Number of isotope pattern groups found on the different z-levels used.}
\item{pattern[[6]]}{\code{Removals by rules}. Times rules lead to rejections (\code{rules[1]} to \code{rules[10]}) or a merging of nested groups (\code{rules[11]}).}
\item{pattern[[7]]}{\code{Number of peaks with pattern group overlapping}. Number of overlapping groups; \code{overlap = 1} corresponds to no overlap.}
\item{pattern[[8]]}{\code{Number of peaks per within-group interaction levels}.}
\item{pattern[[9]]}{\code{Counts of isotopes}. Number of times an m/z isotope difference was detected (raw measure / number of isotope pattern groups).}
\item{pattern[[10]]}{\code{Elements}. Elements used via argument iso derived by \code{\link[nontarget]{make.isos}}.}
\item{pattern[[11]]}{\code{Charges}. z-levels used.}
\item{pattern[[12]]}{\code{Rule settings}. \code{rules[1]} to \code{rules[11]} settings used.}

}
\author{
Martin Loos
}
\seealso{
	\code{\link[nontarget]{pattern.search2}}
	\code{\link[nontarget]{rm.sat}}
	\code{\link[nontarget]{peaklist}}
	\code{\link[nontarget]{make.isos}}
	\code{\link[nontarget]{plotisotopes}}	
	\code{\link[nontarget]{plotdefect}}
	\code{\link[nontarget]{combine}}
	\code{\link[nontarget]{plotgroup}}
	\code{\link[enviPat]{isotopes}}
	\code{\link[enviPat]{resolution_list}}
}
\examples{
\donttest{
######################################################
# load required data: ################################
# HRMS peak list: ####################################
data(peaklist)
peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE);
peaklist<-peaklist[peaklist[,4],1:3];
# list of isotopes ###################################
data(isotopes)
######################################################
# (1) run isotope pattern grouping ###################
# (1.1) define isotopes and charge (z) argument ######
iso<-make.isos(isotopes,
	use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"),
	use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2))
# (1.2) run isotope grouping #########################
# save the list returned as "pattern" ################
pattern<-pattern.search(
  peaklist,
  iso,
  cutint=10000,
  rttol=c(-0.05,0.05),
  mztol=2,
  mzfrac=0.1,
  ppm=TRUE,
  inttol=0.2,
  rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
  deter=FALSE,
  entry=50
);
names(pattern);
# extract peaks listed in isotope pattern group no.1 #
# under pattern[[3]] from pattern[[1]] ###############
pattern[[1]][as.numeric(strsplit(as.character(pattern[[3]][1,2]),",")[[1]]),];
# (1.3) plot results #################################
plotisotopes(pattern);
plotdefect(pattern,elements=c("N"));
######################################################
}
}
