% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vst.R
\name{vst}
\alias{vst}
\title{Variance stabilizing transformation for UMI count data}
\usage{
vst(umi, cell_attr = NULL, latent_var = c("log_umi"),
  batch_var = NULL, latent_var_nonreg = NULL, n_genes = 2000,
  n_cells = NULL, method = "poisson", do_regularize = TRUE,
  res_clip_range = c(-sqrt(ncol(umi)), sqrt(ncol(umi))),
  bin_size = 256, min_cells = 5, residual_type = "pearson",
  return_cell_attr = FALSE, return_gene_attr = TRUE,
  return_corrected_umi = FALSE, min_variance = -Inf, bw_adjust = 3,
  gmean_eps = 1, theta_given = NULL, show_progress = TRUE)
}
\arguments{
\item{umi}{A matrix of UMI counts with genes as rows and cells as columns}

\item{cell_attr}{A data frame containing the cell-level independent variables (covariates) used in the model; if omitted, a data frame with umi and gene will be generated}

\item{latent_var}{The independent variables to regress out as a character vector; must match column names in cell_attr; default is c("log_umi")}

\item{batch_var}{The variable indicating which batch a cell belongs to; no batch interaction terms are used if omitted}

\item{latent_var_nonreg}{The non-regularized independent variables to regress out as a character vector; must match column names in cell_attr; default is NULL}

\item{n_genes}{Number of genes to use when estimating parameters (default uses 2000 genes, set to NULL to use all genes)}

\item{n_cells}{Number of cells to use when estimating parameters (default uses all cells)}

\item{method}{Method to use for initial parameter estimation; one of 'poisson', 'nb_fast', 'nb', 'nb_theta_given'}

\item{do_regularize}{Boolean that, if set to FALSE, will bypass parameter regularization and use all genes in first step (ignoring n_genes).}

\item{res_clip_range}{Numeric of length two specifying the min and max values the residuals will be clipped to; default is c(-sqrt(ncol(umi)), sqrt(ncol(umi)))}

\item{bin_size}{Number of genes to put in each bin (to show progress)}

\item{min_cells}{Only use genes that have been detected in at least this many cells; default is 5}

\item{residual_type}{What type of residuals to return; can be 'pearson', 'deviance', or 'none'; default is 'pearson'}

\item{return_cell_attr}{Make cell attributes part of the output; default is FALSE}

\item{return_gene_attr}{Calculate gene attributes and make part of output; default is TRUE}

\item{return_corrected_umi}{If set to TRUE output will contain corrected UMI matrix; see \code{correct} function}

\item{min_variance}{Lower bound for the estimated variance for any gene in any cell when calculating Pearson residuals; default is -Inf}

\item{bw_adjust}{Kernel bandwidth adjustment factor used during regularization; the factor is applied to the output of \code{bw.SJ}; default is 3}

\item{gmean_eps}{Small value added when calculating geometric mean of a gene to avoid log(0); default is 1}

\item{theta_given}{Named numeric vector of fixed theta values for the genes; only used if \code{method} is set to 'nb_theta_given'; default is NULL}

\item{show_progress}{Whether to print messages and show progress bar}
}
\value{
A list with components
\item{y}{Matrix of transformed data, i.e. Pearson residuals, or deviance residuals; empty if \code{residual_type = 'none'}}
\item{umi_corrected}{Matrix of corrected UMI counts (optional)}
\item{model_str}{Character representation of the model formula}
\item{model_pars}{Matrix of estimated model parameters per gene (theta and regression coefficients)}
\item{model_pars_outliers}{Vector indicating whether a gene was considered to be an outlier}
\item{model_pars_fit}{Matrix of fitted / regularized model parameters}
\item{model_str_nonreg}{Character representation of model for non-regularized variables}
\item{model_pars_nonreg}{Model parameters for non-regularized variables}
\item{genes_log_gmean_step1}{log-geometric mean of genes used in initial step of parameter estimation}
\item{cells_step1}{Cells used in initial step of parameter estimation}
\item{arguments}{List of function call arguments}
\item{cell_attr}{Data frame of cell meta data (optional)}
\item{gene_attr}{Data frame with gene attributes such as mean, detection rate, etc. (optional)}
}
\description{
Apply variance stabilizing transformation to UMI count data using a regularized Negative Binomial regression model.
This will remove unwanted effects from UMI data and, by default, return Pearson residuals.
Parallel execution uses \code{future_lapply}; to run on n cores, call \code{plan(strategy = "multicore", workers = n)} beforehand.
If n_genes is set, only a (somewhat-random) subset of genes is used for estimating the
initial model parameters.
}
\section{Details}{

In the first step of the algorithm, per-gene glm model parameters are learned. This step can be done
on a subset of genes and/or cells to speed things up.
If \code{method} is set to 'poisson', glm will be called with \code{family = poisson} and
the negative binomial theta parameter will be estimated using the response residuals in
\code{MASS::theta.ml}.
If \code{method} is set to 'nb_fast', glm coefficients and theta are estimated as in the
'poisson' method, but coefficients are then re-estimated using a proper negative binomial
model in a second call to glm with
\code{family = MASS::negative.binomial(theta = theta)}.
If \code{method} is set to 'nb', coefficients and theta are estimated by a single call to
\code{MASS::glm.nb}.
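
The three estimation variants described above can be sketched for a single gene as follows.
This is a minimal illustration on simulated data, not the package's internal code: the
names \code{y}, \code{log_umi} and \code{total_umi}, and all simulation settings, are
assumptions made for the example.
\preformatted{
set.seed(42)

# Hypothetical data for one gene across 500 cells
total_umi <- round(runif(500, min = 1000, max = 5000))
log_umi <- log10(total_umi)
mu_true <- exp(-5 + 2 * log_umi)
y <- rnbinom(500, mu = mu_true, size = 5)

# 'poisson': Poisson regression, then theta estimated from the fitted means
fit_pois <- glm(y ~ log_umi, family = poisson)
theta <- as.numeric(MASS::theta.ml(y = y, mu = fit_pois$fitted.values))

# 'nb_fast': re-estimate the coefficients with theta held fixed
fit_nb_fast <- glm(y ~ log_umi,
                   family = MASS::negative.binomial(theta = theta))

# 'nb': coefficients and theta estimated jointly in one call
fit_nb <- MASS::glm.nb(y ~ log_umi)

# Compare the coefficients from the three variants
rbind(poisson = coef(fit_pois),
      nb_fast = coef(fit_nb_fast),
      nb = coef(fit_nb))
}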
}

\examples{
\donttest{
vst_out <- vst(pbmc)
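
# A sketch of some common variations, assuming the same 'pbmc' example matrix
# and that the 'future' package is installed. The argument values below are
# illustrative choices, not recommendations.

# Use 4 workers for the parallel steps (see Description)
future::plan(strategy = "multicore", workers = 4)

# Estimate initial parameters with 'nb_fast' on 1000 genes and also
# return the cell attributes
vst_fast <- vst(pbmc, method = "nb_fast", n_genes = 1000,
  return_cell_attr = TRUE, show_progress = FALSE)

# Inspect some of the returned components (see Value)
dim(vst_fast$y)            # matrix of Pearson residuals
vst_fast$model_str         # model formula as a character string
head(vst_fast$model_pars)  # per-gene theta and regression coefficients
head(vst_fast$gene_attr)   # gene attributes
head(vst_fast$cell_attr)   # present because return_cell_attr = TRUE

# Return to sequential processing
future::plan(strategy = "sequential")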
}

}
