Skip to contents

Downloading the data

The data we will be working with can be downloaded from https://cells.ucsc.edu/autism/rawMatrix.zip. After downloading this, you will get a file called rawMatrix.zip, which, after unzipping, contains

  • barcodes.tsv: A matrix with 104,559 rows (i.e., cells) and 1 column, denoting the cells’ barcodes
  • genes.tsv: A matrix with 65,217 rows (i.e., genes) and 2 columns, denoting the ENSEMBL ID or Gene Symbols
  • matrix.mtx: A sparse matrix with 65,217 rows and 104,559 columns, denoting the counts for each cell-gene pair
  • meta.txt: A matrix with 104,559 rows (i.e., cells) and 16 columns, containing various covariates about each cell/donor

The original paper is from https://pubmed.ncbi.nlm.nih.gov/31097668/.

Setting up in R

You will the following packages. sparseMatrixStats is from Bioconductor.

library(Matrix)
library(Seurat)
library(sparseMatrixStats)

set.seed(10)
date_of_run <- Sys.time()
session_info <- devtools::session_info()

Reading in the data

First, we read in the data. For simplicity in this vignette’s preprocessing, since there are some duplicated genes, we arbitrarily remove any duplicates.

mat <- Matrix::readMM("/Users/kevinlin/Downloads/rawMatrix/matrix.mtx")
gene_mat <- read.csv("/Users/kevinlin/Downloads/rawMatrix/genes.tsv", sep = "\t", header = F)
cell_mat <- read.csv("/Users/kevinlin/Downloads/rawMatrix/barcodes.tsv", sep = "\t", header = F)
metadata <- read.csv("/Users/kevinlin/Downloads/rawMatrix/meta.txt", sep = "\t", header = T, stringsAsFactors = TRUE)
metadata$individual <- factor(paste0("indiv_", metadata$individual))

rownames(mat) <- gene_mat[,2]
colnames(mat) <- cell_mat[,1]

# remove duplicates
if(any(duplicated(rownames(mat)))){
  mat <- mat[-which(duplicated(rownames(mat))),]
}

rownames(metadata) <- metadata$cell
metadata <- metadata[,setdiff(colnames(metadata), "cell")]

Creating Seurat object

Next, we create the Seurat object and filter only the cells that are in Layer 2/3. (We will abbreviate this Seurat object as “asd” for “Autism Spectrum Disorder”.)

asd <- Seurat::CreateSeuratObject(counts = mat,
                                  meta.data = metadata)

asd <- subset(asd, cluster == "L2/3")

asd[["percent.mt"]] <- Seurat::PercentageFeatureSet(asd, pattern = "^MT-")

Preprocessing the Seurat object

We then do basic preprocessing. Mainly, this consists of first remove genes that have no counts at all, then removing cells that have too many counts, then removing genes with too many counts. Then, we aggregate genes that are deemed highly variable, the original DEGs found in the Velmeshev et al. paper, housekeeping genes, and SFARI genes. The union of all these genes are the ones that we set as “highly variable,” as we would like the eSVD-DE analysis later to use these genes.

Once we are done with this step, we are ready to perform the eSVD-DE analysis (in the other vignette).

set.seed(10)
mat <- SeuratObject::LayerData(asd, layer = "counts", assay = "RNA")
mat <- Matrix::t(mat)

# remove genes with all 0's
gene_sum <- Matrix::colSums(mat)
if (any(gene_sum == 0)) {
  features <- SeuratObject::Features(asd)
  features <- setdiff(features, names(gene_sum)[which(gene_sum == 0)])
  asd <- subset(asd, features = features)
}

# remove cells with too high counts
keep_vec <- rep(TRUE, length(Seurat::Cells(asd)))
keep_vec[asd$nCount_RNA >= 60000] <- FALSE
asd[["keep"]] <- keep_vec
asd <- subset(asd, keep == TRUE)

# remove genes with too high counts
mat <- SeuratObject::LayerData(asd, layer = "counts", assay = "RNA")
mat <- Matrix::t(mat)
gene_total <- Matrix::colSums(log1p(mat))
gene_threshold <- log1p(10) * nrow(mat)
features <- colnames(mat)[which(gene_total <= gene_threshold)]
asd <- subset(asd, features = features)

# find the variable genes
asd <- Seurat::NormalizeData(asd)
asd <- Seurat::FindVariableFeatures(asd, selection.method = "vst", nfeatures = 5000)

data("velmeshev_gene_df", package = "eSVD2")
de_genes <- velmeshev_gene_df[velmeshev_gene_df[, "Cell type"] == "L2/3", "Gene name"]

data("housekeeping_df", package = "eSVD2")
hk_genes <- housekeeping_df[, "Gene"]

data("sfari_df", package = "eSVD2")
sfari_genes <- sfari_df[, "gene.symbol"]

gene_keep <- unique(c(
  de_genes,
  hk_genes,
  sfari_genes,
  Seurat::VariableFeatures(asd)
))
rm_idx <- grep("^MT", gene_keep) # remove mitochondrial genes
if(length(rm_idx) > 0) gene_keep <- gene_keep[-rm_idx]
gene_current <- SeuratObject::Features(asd)
gene_keep <- intersect(gene_current, gene_keep)
Seurat::VariableFeatures(asd) <- gene_keep
asd <- subset(asd, features = Seurat::VariableFeatures(asd))

With this done, we can save this minimal Seurat object for the eSVD-DE vignette. This file will be roughly 97 Mb in size.

save(asd, date_of_run, session_info, file = "asd_seurat.RData")

Visualizing the data

It is nice to visualize any dataset before we do any analysis, and eSVD-DE is no different. Hence, we can do some simple preprocessing.

set.seed(10)
asd <- Seurat::ScaleData(asd, features = Seurat::VariableFeatures(asd))
asd <- Seurat::RunPCA(asd,
                      features = Seurat::VariableFeatures(object = asd),
                      verbose = FALSE)

set.seed(10)
asd <- Seurat::RunUMAP(asd, dims = 1:30)
p1 <- Seurat::DimPlot(
  asd,
  reduction = 'umap',
  group.by = 'diagnosis',
  label = TRUE,
  repel = TRUE,
  label.size = 2.5
) + Seurat::NoLegend()
p2 <- Seurat::DimPlot(
  asd,
  reduction = 'umap',
  group.by = 'individual',
  label = TRUE,
  repel = TRUE,
  label.size = 2.5
) + Seurat::NoLegend()
p1 + p2
UMAP of the L2/3 neurons

UMAP of the L2/3 neurons

Setup

The following shows the suggested package versions that the developer (GitHub username: linnykos) used when developing the eSVD2 package.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.1 (2024-06-14)
##  os       Ubuntu 22.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language en
##  collate  C.UTF-8
##  ctype    C.UTF-8
##  tz       UTC
##  date     2024-07-13
##  pandoc   3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package           * version  date (UTC) lib source
##  abind               1.4-5    2016-07-21 [1] RSPM
##  bslib               0.7.0    2024-03-29 [1] RSPM
##  cachem              1.1.0    2024-05-16 [1] RSPM
##  cli                 3.6.3    2024-06-21 [1] RSPM
##  cluster             2.1.6    2023-12-01 [3] CRAN (R 4.4.1)
##  codetools           0.2-20   2024-03-31 [3] CRAN (R 4.4.1)
##  colorspace          2.1-0    2023-01-23 [1] RSPM
##  cowplot             1.1.3    2024-01-22 [1] RSPM
##  data.table          1.15.4   2024-03-30 [1] RSPM
##  deldir              2.0-4    2024-02-28 [1] RSPM
##  desc                1.4.3    2023-12-10 [1] RSPM
##  devtools            2.4.5    2022-10-11 [1] RSPM
##  digest              0.6.36   2024-06-23 [1] RSPM
##  dotCall64           1.1-1    2023-11-28 [1] RSPM
##  dplyr               1.1.4    2023-11-17 [1] RSPM
##  ellipsis            0.3.2    2021-04-29 [1] RSPM
##  evaluate            0.24.0   2024-06-10 [1] RSPM
##  fansi               1.0.6    2023-12-08 [1] RSPM
##  fastDummies         1.7.3    2023-07-06 [1] RSPM
##  fastmap             1.2.0    2024-05-15 [1] RSPM
##  fitdistrplus        1.2-1    2024-07-12 [1] RSPM
##  fs                  1.6.4    2024-04-25 [1] RSPM
##  future              1.33.2   2024-03-26 [1] RSPM
##  future.apply        1.11.2   2024-03-28 [1] RSPM
##  generics            0.1.3    2022-07-05 [1] RSPM
##  ggplot2             3.5.1    2024-04-23 [1] RSPM
##  ggrepel             0.9.5    2024-01-10 [1] RSPM
##  ggridges            0.5.6    2024-01-23 [1] RSPM
##  globals             0.16.3   2024-03-08 [1] RSPM
##  glue                1.7.0    2024-01-09 [1] RSPM
##  goftest             1.2-3    2021-10-07 [1] RSPM
##  gridExtra           2.3      2017-09-09 [1] RSPM
##  gtable              0.3.5    2024-04-22 [1] RSPM
##  highr               0.11     2024-05-26 [1] RSPM
##  htmltools           0.5.8.1  2024-04-04 [1] RSPM
##  htmlwidgets         1.6.4    2023-12-06 [1] RSPM
##  httpuv              1.6.15   2024-03-26 [1] RSPM
##  httr                1.4.7    2023-08-15 [1] RSPM
##  ica                 1.0-3    2022-07-08 [1] RSPM
##  igraph              2.0.3    2024-03-13 [1] RSPM
##  irlba               2.3.5.1  2022-10-03 [1] RSPM
##  jquerylib           0.1.4    2021-04-26 [1] RSPM
##  jsonlite            1.8.8    2023-12-04 [1] RSPM
##  KernSmooth          2.23-24  2024-05-17 [3] CRAN (R 4.4.1)
##  knitr               1.48     2024-07-07 [1] RSPM
##  later               1.3.2    2023-12-06 [1] RSPM
##  lattice             0.22-6   2024-03-20 [3] CRAN (R 4.4.1)
##  lazyeval            0.2.2    2019-03-15 [1] RSPM
##  leiden              0.4.3.1  2023-11-17 [1] RSPM
##  lifecycle           1.0.4    2023-11-07 [1] RSPM
##  listenv             0.9.1    2024-01-29 [1] RSPM
##  lmtest              0.9-40   2022-03-21 [1] RSPM
##  magrittr            2.0.3    2022-03-30 [1] RSPM
##  MASS                7.3-60.2 2024-04-26 [3] CRAN (R 4.4.1)
##  Matrix            * 1.7-0    2024-04-26 [3] CRAN (R 4.4.1)
##  MatrixGenerics    * 1.16.0   2024-04-30 [1] Bioconduc~
##  matrixStats       * 1.3.0    2024-04-11 [1] RSPM
##  memoise             2.0.1    2021-11-26 [1] RSPM
##  mime                0.12     2021-09-28 [1] RSPM
##  miniUI              0.1.1.1  2018-05-18 [1] RSPM
##  munsell             0.5.1    2024-04-01 [1] RSPM
##  nlme                3.1-164  2023-11-27 [3] CRAN (R 4.4.1)
##  parallelly          1.37.1   2024-02-29 [1] RSPM
##  patchwork           1.2.0    2024-01-08 [1] RSPM
##  pbapply             1.7-2    2023-06-27 [1] RSPM
##  pillar              1.9.0    2023-03-22 [1] RSPM
##  pkgbuild            1.4.4    2024-03-17 [1] RSPM
##  pkgconfig           2.0.3    2019-09-22 [1] RSPM
##  pkgdown             2.1.0    2024-07-06 [1] RSPM
##  pkgload             1.4.0    2024-06-28 [1] RSPM
##  plotly              4.10.4   2024-01-13 [1] RSPM
##  plyr                1.8.9    2023-10-02 [1] RSPM
##  png                 0.1-8    2022-11-29 [1] RSPM
##  polyclip            1.10-6   2023-09-27 [1] RSPM
##  profvis             0.3.8    2023-05-02 [1] RSPM
##  progressr           0.14.0   2023-08-10 [1] RSPM
##  promises            1.3.0    2024-04-05 [1] RSPM
##  purrr               1.0.2    2023-08-10 [1] RSPM
##  R6                  2.5.1    2021-08-19 [1] RSPM
##  ragg                1.3.2    2024-05-15 [1] RSPM
##  RANN                2.6.1    2019-01-08 [1] RSPM
##  RColorBrewer        1.1-3    2022-04-03 [1] RSPM
##  Rcpp                1.0.12   2024-01-09 [1] RSPM
##  RcppAnnoy           0.0.22   2024-01-23 [1] RSPM
##  RcppHNSW            0.6.0    2024-02-04 [1] RSPM
##  remotes             2.5.0    2024-03-17 [1] RSPM
##  reshape2            1.4.4    2020-04-09 [1] RSPM
##  reticulate          1.38.0   2024-06-19 [1] RSPM
##  rlang               1.1.4    2024-06-04 [1] RSPM
##  rmarkdown           2.27     2024-05-17 [1] RSPM
##  ROCR                1.0-11   2020-05-02 [1] RSPM
##  RSpectra            0.16-1   2022-04-24 [1] RSPM
##  Rtsne               0.17     2023-12-07 [1] RSPM
##  sass                0.4.9    2024-03-15 [1] RSPM
##  scales              1.3.0    2023-11-28 [1] RSPM
##  scattermore         1.2      2023-06-12 [1] RSPM
##  sctransform         0.4.1    2023-10-19 [1] RSPM
##  sessioninfo         1.2.2    2021-12-06 [1] RSPM
##  Seurat            * 5.1.0    2024-05-10 [1] RSPM
##  SeuratObject      * 5.0.2    2024-05-08 [1] RSPM
##  shiny               1.8.1.1  2024-04-02 [1] RSPM
##  sp                * 2.1-4    2024-04-30 [1] RSPM
##  spam                2.10-0   2023-10-23 [1] RSPM
##  sparseMatrixStats * 1.16.0   2024-04-30 [1] Bioconduc~
##  spatstat.data       3.1-2    2024-06-21 [1] RSPM
##  spatstat.explore    3.2-7    2024-03-21 [1] RSPM
##  spatstat.geom       3.2-9    2024-02-28 [1] RSPM
##  spatstat.random     3.2-3    2024-02-29 [1] RSPM
##  spatstat.sparse     3.1-0    2024-06-21 [1] RSPM
##  spatstat.utils      3.0-5    2024-06-17 [1] RSPM
##  stringi             1.8.4    2024-05-06 [1] RSPM
##  stringr             1.5.1    2023-11-14 [1] RSPM
##  survival            3.6-4    2024-04-24 [3] CRAN (R 4.4.1)
##  systemfonts         1.1.0    2024-05-15 [1] RSPM
##  tensor              1.5      2012-05-05 [1] RSPM
##  textshaping         0.4.0    2024-05-24 [1] RSPM
##  tibble              3.2.1    2023-03-20 [1] RSPM
##  tidyr               1.3.1    2024-01-24 [1] RSPM
##  tidyselect          1.2.1    2024-03-11 [1] RSPM
##  urlchecker          1.0.1    2021-11-30 [1] RSPM
##  usethis             2.2.3    2024-02-19 [1] RSPM
##  utf8                1.2.4    2023-10-22 [1] RSPM
##  uwot                0.2.2    2024-04-21 [1] RSPM
##  vctrs               0.6.5    2023-12-01 [1] RSPM
##  viridisLite         0.4.2    2023-05-02 [1] RSPM
##  xfun                0.45     2024-06-16 [1] RSPM
##  xtable              1.8-4    2019-04-21 [1] RSPM
##  yaml                2.3.9    2024-07-05 [1] RSPM
##  zoo                 1.8-12   2023-04-13 [1] RSPM
## 
##  [1] /home/runner/work/_temp/Library
##  [2] /opt/R/4.4.1/lib/R/site-library
##  [3] /opt/R/4.4.1/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────