Optimize eSVD for matrices or sparse matrices.

Usage

# Default S3 method
opt_esvd(
  input_obj,
  x_init,
  y_init,
  z_init = NULL,
  covariates = NULL,
  family = "poisson",
  l2pen = 0.1,
  library_multipler = rep(1, nrow(input_obj)),
  max_iter = 100,
  nuisance_vec = rep(NA, ncol(input_obj)),
  offset_variables = NULL,
  tol = 1e-06,
  verbose = 0,
  ...
)

Arguments

input_obj: Dataset (either matrix or dgCMatrix) where the $n$ rows represent cells and $p$ columns represent genes. The rows and columns of the matrix should be named.
x_init: Initial $n \times k$ matrix of the cells' latent vectors. Its row names should match the row names of input_obj.
y_init: Initial $p \times k$ matrix of the genes' latent vectors. Its row names should match the column names of input_obj.
z_init: Initial matrix of the genes' coefficient vectors, with $p$ rows and ncol(covariates) columns. Its row names should match the column names of input_obj, and its column names should match the column names of covariates.
covariates: Matrix with $n$ rows, having the same row names as input_obj, whose columns represent the different covariates. It should contain only numeric columns (i.e., every categorical variable should already have been expanded into indicator variables).
family: String among "gaussian", "curved_gaussian", "exponential", "poisson", "neg_binom", "neg_binom2", or "bernoulli". With the exception of "neg_binom2", all families are parameterized so that eSVD fits the dot product as the canonical parameter of the corresponding exponential-family distribution. For "neg_binom2", the dot product is the log-mean of the distribution (i.e., similar to the canonical parameterization of the Poisson family).
l2pen: Small positive number giving the amount of L2 penalization applied to the cells' and genes' latent vectors as well as to the coefficients.
library_multipler: Vector of positive numbers of length $n$. It is the multiplier such that the variance of cell i's entries equals the mean of cell i's entries times the square root of cell i's value in library_multipler (entry-wise). This offers an alternative way to model how library size affects a cell's gene expression, instead of including the library size as a covariate to be regressed out.
max_iter: Positive integer giving the number of iterations.
nuisance_vec: Vector of non-negative numbers (or NAs) of length $p$, giving each gene's nuisance parameter for exponential-family distributions that require one. It is used only when family is "curved_gaussian", "neg_binom", or "neg_binom2".
offset_variables: Character vector naming which columns of covariates should be treated as offsets during the optimization (i.e., their coefficients are held fixed throughout the optimization).
tol: Small positive number to differentiate between zero and non-zero.
verbose: Integer controlling how much progress information is printed during the optimization.
...: Additional parameters
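As a rough illustration of the dimension and naming requirements above, the following base-R sketch builds toy inputs. The toy counts, the hypothetical "batch" variable, the intercept column, and the SVD-based starting values are all assumptions chosen for illustration; they are not the package's own initialization routine.

```r
# Sketch: build inputs whose dimensions and names satisfy the requirements above.
# Toy data, the hypothetical "batch" factor, and the SVD initialization are assumptions.
set.seed(1)
n <- 6; p <- 4; k <- 2

# Toy count matrix: n cells (rows) by p genes (columns), both dimensions named
input_obj <- matrix(rpois(n * p, lambda = 3), nrow = n, ncol = p,
                    dimnames = list(paste0("cell", 1:n), paste0("gene", 1:p)))

# Cell-level metadata with one categorical variable (hypothetical "batch")
meta <- data.frame(batch = factor(c("a", "a", "b", "b", "c", "c")),
                   row.names = rownames(input_obj))

# Expand the factor into numeric columns: an intercept plus batch indicators
covariates <- stats::model.matrix(~ batch, data = meta)
rownames(covariates) <- rownames(input_obj)

# Crude starting latent vectors from a rank-k SVD of log1p counts (assumption)
s <- svd(log1p(input_obj), nu = k, nv = k)
x_init <- s$u %*% diag(sqrt(s$d[1:k]), k)   # n x k, cells
y_init <- s$v %*% diag(sqrt(s$d[1:k]), k)   # p x k, genes
rownames(x_init) <- rownames(input_obj)
rownames(y_init) <- colnames(input_obj)

# Zero starting coefficients: p genes by ncol(covariates), names aligned
z_init <- matrix(0, nrow = p, ncol = ncol(covariates),
                 dimnames = list(colnames(input_obj), colnames(covariates)))
```

Following the Usage section, these objects could then be passed as opt_esvd(input_obj, x_init, y_init, z_init = z_init, covariates = covariates, family = "poisson").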

Value

A list with elements x_mat, y_mat, z_mat, library_multiplier, loss, nuisance_vec, and param.
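This page does not spell out how the returned matrices combine. Assuming, as in the eSVD model, that the fitted natural parameter for cell i and gene j is the dot product of x_mat[i, ] and y_mat[j, ] plus the dot product of covariates[i, ] and z_mat[j, ], a sketch for recovering fitted means might look like the following. The helpers fitted_theta() and mean_from_theta() and the toy fit list are hypothetical, and the sketch ignores library_multiplier and nuisance_vec.

```r
# Sketch (assumption): natural parameter = x_mat %*% t(y_mat) + covariates %*% t(z_mat).
# `fit` is a hypothetical stand-in for the list returned by opt_esvd().
fitted_theta <- function(fit, covariates = NULL) {
  theta <- fit$x_mat %*% t(fit$y_mat)           # n x p latent dot products
  if (!is.null(covariates)) {
    theta <- theta + covariates %*% t(fit$z_mat) # add covariate contributions
  }
  theta
}

# Hypothetical helper: map the natural parameter to the mean for a few families
mean_from_theta <- function(theta, family) {
  switch(family,
         poisson    = exp(theta),           # canonical parameter is the log-mean
         neg_binom2 = exp(theta),           # documented above as the log-mean
         bernoulli  = stats::plogis(theta), # canonical parameter is the logit
         stop("family not covered by this sketch: ", family))
}

# Toy example: n = 2 cells, p = 2 genes, k = 2 latent dimensions, 1 covariate
fit <- list(x_mat = matrix(c(0.1, 0.2, 0.3, 0.4), 2, 2),
            y_mat = matrix(c(0.5, -0.5, 1, 0), 2, 2),
            z_mat = matrix(c(1, 2), 2, 1))
covariates <- matrix(1, nrow = 2, ncol = 1)
theta <- fitted_theta(fit, covariates)
mu <- mean_from_theta(theta, "poisson")
```

With a real fit, the covariates argument here would need to be the same matrix that was passed to opt_esvd().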