Format covariates
format_covariates.Rd
Mainly, this method splits the categorical variables (which should be `factor` variables) into indicator variables (i.e., one-hot encoding), dropping the last level. It then rescales all the numerical variables (by default without centering them) and computes the "Log_UMI" (i.e., the log of the total counts) for each cell. "Log_UMI" is added as its own column.
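The three steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `sketch_format_covariates()` is a hypothetical name; every column of `covariate_df` is assumed to be either a `factor` or numeric; numeric columns are assumed to be divided by their standard deviation (the exact rescaling is not specified on this page); and "Log_UMI" is assumed to be the natural log of each row sum of `dat`. The sketch uses a plain `matrix`; a `dgCMatrix` would additionally need the Matrix package loaded so that `rowSums()` dispatches to its sparse method.

```r
# Hypothetical sketch of the steps above; NOT the package's own code.
# Assumptions: numeric columns are divided by their SD (no centering),
# and Log_UMI = natural log of each cell's total count.
sketch_format_covariates <- function(dat, covariate_df) {
  stopifnot(nrow(dat) == nrow(covariate_df))
  pieces <- lapply(names(covariate_df), function(v) {
    x <- covariate_df[[v]]
    if (is.factor(x)) {
      keep <- head(levels(x), -1)                    # drop the last level
      ind <- outer(as.character(x), keep, "==") * 1  # n x (L - 1) 0/1 matrix
      colnames(ind) <- sprintf("%s_%s", v, keep)
      ind
    } else {
      out <- matrix(x / sd(x), ncol = 1)             # rescale without centering
      colnames(out) <- v
      out
    }
  })
  res <- do.call(cbind, pieces)
  res <- cbind(res, Log_UMI = log(rowSums(dat)))     # log total counts per cell
  rownames(res) <- rownames(dat)
  res
}

# Usage: 3 cells x 2 genes, one factor and one numeric covariate
dat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
              dimnames = list(paste0("cell", 1:3), c("g1", "g2")))
covariate_df <- data.frame(batch = factor(c("a", "b", "c")),
                           depth = c(2, 4, 6))
res <- sketch_format_covariates(dat, covariate_df)
colnames(res)  # "batch_a" "batch_b" "depth" "Log_UMI"
```

Dropping the last level keeps the indicator columns from being collinear with an intercept: a cell in the last level is identified by all of its indicators being zero.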
Usage
format_covariates(
dat,
covariate_df,
bool_center = FALSE,
rescale_numeric_variables = NULL,
variables_enumerate_all = NULL
)
Arguments
- dat
Dataset (either a `matrix` or a `dgCMatrix`) where the \(n\) rows represent cells and the \(p\) columns represent genes. The rows and columns of the matrix should be named.
- covariate_df
A `data.frame` where each row represents a cell and the columns are the categorical or numerical variables that you wish to adjust for.
- bool_center
Boolean indicating whether the numerical variables should be centered around zero. Default is `FALSE`.
- rescale_numeric_variables
A vector of strings giving the column names in `covariate_df` that are numerical and that you wish to rescale.
- variables_enumerate_all
If not `NULL`, this lets you control specifically which `factor` variables in `covariate_df` are split into indicators. By default, this is `NULL`, meaning all the `factor` variables are split into indicators.
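How these arguments select columns can be sketched with two small helpers. Both `select_factors_to_split()` and `rescale_selected()` are hypothetical names, not part of the package. Two behaviors are assumptions, not stated on this page: that `rescale_numeric_variables = NULL` means "all numeric columns", and that rescaling divides by the standard deviation. What happens to `factor` columns left out of `variables_enumerate_all` is not described here, so the sketch only reports which columns would be split.

```r
# Hypothetical helpers (not the package's code) mirroring the arguments above.

# Which factor columns get split into indicators.
select_factors_to_split <- function(covariate_df, variables_enumerate_all = NULL) {
  if (is.null(variables_enumerate_all)) {
    # Default (NULL): every factor column is split
    names(covariate_df)[vapply(covariate_df, is.factor, logical(1))]
  } else {
    stopifnot(all(variables_enumerate_all %in% names(covariate_df)))
    variables_enumerate_all
  }
}

# Rescale the chosen numeric columns, optionally centering them first.
# Assumption: NULL means "all numeric columns"; rescaling divides by the SD.
rescale_selected <- function(covariate_df, rescale_numeric_variables = NULL,
                             bool_center = FALSE) {
  if (is.null(rescale_numeric_variables)) {
    rescale_numeric_variables <-
      names(covariate_df)[vapply(covariate_df, is.numeric, logical(1))]
  }
  for (v in rescale_numeric_variables) {
    x <- covariate_df[[v]]
    if (bool_center) x <- x - mean(x)  # center only when requested
    covariate_df[[v]] <- x / sd(x)     # divide by the standard deviation
  }
  covariate_df
}

# Usage
df <- data.frame(batch = factor(c("a", "b", "a")),
                 donor = factor(c("x", "x", "y")),
                 depth = c(2, 4, 6), age = c(10, 20, 30))
select_factors_to_split(df)           # "batch" "donor"
select_factors_to_split(df, "donor")  # "donor"
rescale_selected(df, "depth", bool_center = TRUE)$depth  # -1 0 1
```

Centering before dividing does not change the standard deviation, so in this sketch `bool_center` only shifts each selected column so that it has mean zero.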