% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/speech.R
\name{gl_speech}
\alias{gl_speech}
\title{Call Google Speech API}
\usage{
gl_speech(audio_source, encoding = c("LINEAR16", "FLAC", "MULAW", "AMR",
  "AMR_WB", "OGG_OPUS", "SPEEX_WITH_HEADER_BYTE"), sampleRateHertz = 16000L,
  languageCode = "en-US", maxAlternatives = 1L, profanityFilter = FALSE,
  speechContexts = NULL, asynch = FALSE)
}
\arguments{
\item{audio_source}{File location of audio data, or Google Cloud Storage URI}

\item{encoding}{Encoding of audio data sent}

\item{sampleRateHertz}{Sample rate in Hertz of the audio data. Valid values are \code{8000-48000}; \code{16000} is optimal}

\item{languageCode}{Language of the supplied audio as a \code{BCP-47} language tag}

\item{maxAlternatives}{Maximum number of recognition hypotheses to be returned. Valid values are \code{0-30}}

\item{profanityFilter}{If \code{TRUE}, the API will attempt to filter out profanities}

\item{speechContexts}{An optional character vector of context to assist the speech recognition}

\item{asynch}{If your \code{audio_source} is longer than 60 seconds, set this to \code{TRUE} to make an asynchronous call}
}
\value{
A list of two tibbles: \code{$transcript}, a tibble of the \code{transcript} with a \code{confidence}; and \code{$timings}, a tibble that contains \code{startTime} and \code{endTime} per \code{word}. If \code{maxAlternatives} is greater than 1, the transcript will contain near-duplicate rows with other interpretations of the text.
 If \code{asynch} is \code{TRUE}, an operation object is returned instead; pass it to \link{gl_speech_op} to get the finished result.
}
\description{
Turn audio into text
}
\details{
The Google Cloud Speech API enables developers to convert audio to text by applying
neural network models through an easy-to-use API.
The API recognizes over 80 languages and variants, to support your global user base.
You can transcribe the text of users dictating to an application’s microphone,
enable command-and-control through voice, or transcribe audio files, among many other use cases.
Recognize audio uploaded in the request, and integrate with your audio storage on Google Cloud Storage,
by using the same technology Google uses to power its own products.
}
\section{AudioEncoding}{


Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono) audio.
Only FLAC and WAV include a header that describes the bytes of audio that follow the header.
The other encodings are raw audio bytes with no header.
For best results, the audio source should be captured and transmitted using a
lossless encoding (FLAC or LINEAR16).
Recognition accuracy may be reduced if lossy codecs, which include the other codecs listed in this section,
are used to capture or transmit the audio, particularly if background noise is present.

Read more on audio encodings at \url{https://cloud.google.com/speech/docs/encoding}
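
As a rough illustration of matching a file to an \code{encoding} value, the sketch below guesses one from the file extension. The helper \code{guess_encoding()} is hypothetical and not part of this package, and the extension mapping is an assumption (for example, a \code{.wav} file is assumed to hold 16-bit linear PCM); check the actual audio format before relying on it.

\preformatted{
# Hypothetical helper, not part of googleLanguageR: guess the encoding
# argument of gl_speech() from a file extension.
guess_encoding <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
         wav  = "LINEAR16",   # assumes 16-bit linear PCM inside the WAV
         flac = "FLAC",
         ogg  = ,             # falls through to the next value
         opus = "OGG_OPUS",
         amr  = "AMR",
         awb  = "AMR_WB",
         stop("Cannot guess an encoding for extension: ", ext))
}

guess_encoding("interview.flac")
}

The guessed value could then be passed as the \code{encoding} argument of \code{gl_speech()}.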
}

\section{WordInfo}{


Word-level information is returned in the \code{$timings} tibble, which has the following columns:

\code{startTime} - Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word.

\code{endTime} - Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word.

\code{word} - The word corresponding to this set of information.
}

\examples{

\dontrun{

test_audio <- system.file("woman1_wb.wav", package = "googleLanguageR")
result <- gl_speech(test_audio)

result2 <- gl_speech(test_audio, maxAlternatives = 2L)
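
## with maxAlternatives > 1 the other interpretations appear as
## extra rows of the transcript tibble, each with its own confidence
result2$transcript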

result_brit <- gl_speech(test_audio, languageCode = "en-GB")

## extract word timestamps
result_brit$timings

## make an asynchronous API request (mandatory for sound files over 60 seconds)
asynch <- gl_speech(test_audio, asynch = TRUE)

## Send to gl_speech_op() for status or finished result
gl_speech_op(asynch)
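
## A sketch of the remaining arguments. The bucket, object name and phrase
## hints below are hypothetical placeholders, not files shipped with the
## package; the recording is assumed to be mono FLAC sampled at 16 kHz.
gcs_op <- gl_speech("gs://my-bucket/long-recording.flac",
                    encoding = "FLAC",
                    sampleRateHertz = 16000L,
                    speechContexts = c("googleLanguageR", "transcription"),
                    asynch = TRUE)

## as above, pass the operation to gl_speech_op() for the finished result
gl_speech_op(gcs_op)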

}


}
\seealso{
\url{https://cloud.google.com/speech/reference/rest/v1/speech/recognize}
}
