% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dereplicate.R
\name{dereplicate}
\alias{dereplicate}
\title{De-replicate reads into frequency tables for a set of FASTQ files}
\usage{
dereplicate(
  fs,
  min_sam_fr = 2,
  min_loc_fr = 0.001,
  by = "_([a-zA-Z0-9]*_[F|R])",
  out_xlsx = NULL
)
}
\arguments{
\item{fs}{Character vector of paths to the FASTQ files to de-replicate.}

\item{min_sam_fr}{Numeric. Minimum count of a unique sequence in a sample
(i.e. the value of a table cell) for it to be retained.}

\item{min_loc_fr}{Numeric. Minimum frequency of a de-replicated sequence within its group for it to be retained.
If \eqn{min\_loc\_fr \in (0,1)}, it is interpreted as a proportion of the
frequency of the most frequent sequence.}

\item{by}{Regular expression used to group the FASTQ files in \code{fs}. Passed to \code{stringr::str_extract()}.}

\item{out_xlsx}{Path to an Excel file to which the tables of de-replicated
sequences are written (\emph{Default: NULL; no file is written}).}
}
\value{
A list with one element per group extracted with 'by'. Each element of the
list is a data frame with:
\itemize{
\item column 1: 'sequence', DNA sequence of read.
\item column 2: 'md5', md5 hash of DNA sequence.
\item columns 3 onwards: counts (\emph{integers}) of each sequence per sample,
after applying the 'min_sam_fr' and 'min_loc_fr' filters.
}
}
\description{
Reads FASTQ files and computes the frequency of unique sequences per sample.
Read depths are organized in tables of \emph{unique sequences} (rows) by \emph{samples} (columns).
Unique sequences are ordered by descending frequency.
}
\details{
The \emph{by} parameter allows flexible grouping of the files in the list. However, counts are not summed within each group; results are always returned for each individual sample.
For instance, given the 3 files s1_loc1_F.fq, s1_loc1_R.fq and s2_loc2_F.fq:
\itemize{
\item \code{"([a-zA-Z0-9]*_[a-zA-Z0-9]*)"}, returns \emph{s1_loc1} and \emph{s2_loc2}.
\item \code{"_([a-zA-Z0-9]*)_"}, returns \emph{loc1} and \emph{loc2}.
\item \code{"([a-zA-Z0-9]*_[F|R])"}, returns \emph{loc1_F}, \emph{loc1_R} and \emph{loc2_F}.
}
Sequences that do not pass the \code{min_sam_fr} and \code{min_loc_fr} filters are dropped,
so they appear as zero counts or are absent when the per-sample results are combined.

If \code{out_xlsx} is set to a file path, each element of the list is written to a
separate sheet of the Excel workbook.
}
\examples{
fq <- list.files(system.file("extdata", "truncated", package = "tidyGenR"),
                 pattern = "fastq.gz",
                 full.names = TRUE)
dereplicate(fq)
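
# A minimal sketch of how a by pattern assigns group labels to file names.
# The file names are hypothetical, and stringr::str_extract() is called
# directly only to show the labels; dereplicate() may extract them differently.
grp <- stringr::str_extract(c("s1_loc1_F.fq", "s1_loc1_R.fq", "s2_loc2_F.fq"),
                            "([a-zA-Z0-9]*_[F|R])")
grp  # one label per file: loc1_F, loc1_R, loc2_F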
}
