Format covariates
This method mainly splits the categorical variables (which should be `factor` variables)
into indicator variables (i.e., one-hot encoding), dropping the last level of each.
It then rescales the numerical variables (without centering them by default; see `bool_center`)
and computes "Log_UMI" (i.e., the log of the total counts) for each cell,
which is added as its own column.
Usage
format_covariates(
dat,
covariate_df,
bool_center = FALSE,
rescale_numeric_variables = NULL,
variables_enumerate_all = NULL
)
Arguments
- dat
  Dataset (either `matrix` or `dgCMatrix`) where the \(n\) rows represent cells and the \(p\) columns represent genes. The rows and columns of the matrix should be named.
- covariate_df
  `data.frame` where each row represents a cell, and the columns are the different categorical or numerical variables that you wish to adjust for.
- bool_center
  Boolean indicating whether the numerical variables should be centered around zero. Default is `FALSE`.
- rescale_numeric_variables
  A vector of strings denoting the column names in `covariate_df` that are numerical and that you wish to rescale.
- variables_enumerate_all
  If not `NULL`, this allows you to control specifically which `factor` variables in `covariate_df` you would like to split into indicators. By default, this is `NULL`, meaning all the `factor` variables are split into indicators.
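A call might look like the following minimal sketch. The toy inputs (`dat`, `covariate_df`, and the `condition`, `batch` and `age` columns) are made up for illustration. The call itself is guarded so that the snippet still runs when the package providing `format_covariates` is not loaded. The structure of the returned object is not described in this section, so it is not inspected here.

```r
set.seed(1)
n <- 6  # cells
p <- 4  # genes

# Toy count matrix with named rows (cells) and columns (genes).
dat <- matrix(
  rpois(n * p, lambda = 5),
  nrow = n, ncol = p,
  dimnames = list(paste0("cell", 1:n), paste0("gene", 1:p))
)

# One row per cell; categorical variables are stored as factors.
covariate_df <- data.frame(
  condition = factor(rep(c("case", "control"), each = 3)),
  batch = factor(rep(c("b1", "b2", "b3"), times = 2)),
  age = c(34, 51, 47, 29, 62, 55),
  row.names = rownames(dat)
)

# Only run the call if the function is available in the session.
if (exists("format_covariates", mode = "function")) {
  covariates <- format_covariates(
    dat = dat,
    covariate_df = covariate_df,
    bool_center = FALSE,                 # rescale but do not center
    rescale_numeric_variables = "age",   # numeric column(s) to rescale
    variables_enumerate_all = NULL       # default: split all factors
  )
}
```

Keeping the row names of `covariate_df` identical to those of `dat` makes it explicit that both objects describe the same cells in the same order.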