Single Cell Gene Expression Data Analysis Sample Report

1. GEX Cell Ranger analysis (Basic)

Cell Ranger count v7.2.0 [1] was run with default parameters for preprocessing, QC, alignment, and counting. The filtered feature-barcode matrices were used as input to Seurat v5.0 [2,3] for downstream analysis.

The Cell Ranger HTML report provides QC metrics for the analysis, including Run Summary, Sequencing, Mapping, and Cells. Here is an example of the report, named web_summary.html.

The Cell Ranger results are stored in the folder cellrangerResult.

2. QC and filtering (Basic)

We created a Seurat object for each sample using CreateSeuratObject and merged the objects. Sample and Group information was added to the metadata of the merged object. We then applied three filtering steps.

First, we filtered cells, discarding any cell with fewer than 500 UMIs, fewer than 200 detected features, or a mitochondrial percentage above 20%.
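As an illustration only, the cell-level thresholds can be written as a small Python sketch. The pipeline itself runs in R with Seurat; the function name, field names, and toy values below are hypothetical, and only the three cutoffs come from the text.

```python
# Cutoffs taken from the report; everything else here is illustrative.
MIN_UMIS = 500
MIN_FEATURES = 200
MAX_PERCENT_MT = 20.0


def keep_cell(n_umis, n_features, percent_mt):
    """Return True if a cell passes all three QC thresholds."""
    return (
        n_umis >= MIN_UMIS
        and n_features >= MIN_FEATURES
        and percent_mt <= MAX_PERCENT_MT
    )


# Toy per-cell QC metrics (made-up values).
cells = {
    "cell_a": {"n_umis": 1500, "n_features": 800, "percent_mt": 5.0},
    "cell_b": {"n_umis": 300, "n_features": 250, "percent_mt": 3.0},   # too few UMIs
    "cell_c": {"n_umis": 2000, "n_features": 150, "percent_mt": 2.0},  # too few features
    "cell_d": {"n_umis": 900, "n_features": 400, "percent_mt": 35.0},  # high mitochondrial %
}

kept = [name for name, metrics in cells.items() if keep_cell(**metrics)]
print(kept)  # ['cell_a']
```

A cell is discarded as soon as it fails any one of the three criteria, so the conditions for keeping a cell are combined with "and".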

Second, we filtered genes. Many genes in the data have zero counts. These genes can substantially reduce the average expression for a cell, so we removed genes with zero expression in all cells. A gene expressed in only a handful of cells is also not particularly informative, because it still lowers the averages for all the cells in which it is not expressed. For our data, we kept only genes expressed in 10 or more cells.
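The gene-level rule can be sketched the same way. This is a minimal Python illustration, assuming a gene counts as "expressed" in a cell when its count is greater than zero; the function name and the toy matrix are hypothetical.

```python
MIN_CELLS = 10  # threshold from the report


def genes_to_keep(counts, min_cells=MIN_CELLS):
    """Keep genes detected (count > 0) in at least `min_cells` cells.

    `counts` maps a gene name to its list of per-cell counts.
    """
    return [
        gene
        for gene, row in counts.items()
        if sum(1 for c in row if c > 0) >= min_cells
    ]


# Toy matrix with 12 cells (made-up values).
counts = {
    "GeneA": [1, 2, 1, 3, 1, 1, 2, 1, 1, 4, 1, 1],  # detected in 12 cells
    "GeneB": [0] * 12,                               # never detected
    "GeneC": [5, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0],  # detected in 3 cells
    "GeneD": [1] * 10 + [0] * 2,                     # detected in exactly 10 cells
}

print(genes_to_keep(counts))  # ['GeneA', 'GeneD']
```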

Third, we discarded the cell cycle genes from the count matrix to reduce the noise they may introduce into clustering and cell type annotation. This filtering step can be turned off if the client requires the cell cycle genes to be kept.

Here are examples of selected QC figures. We provide these figures both before and after filtering:
1) Basic QC of number of genes, number of UMIs, and percentage of mitochondrial genes.

2) Number of cells identified in each sample

3) Number of genes per cell in each sample

4) log10 genes per UMI, which shows the complexity of the experiment

The QC figures are stored in folder Figures/01.QC.
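The report does not spell out how the log10 genes per UMI metric in item 4 is computed. A common definition, assumed here, is the ratio of log10(number of genes) to log10(number of UMIs) for each cell. The Python sketch below illustrates that assumed definition; the function name is hypothetical.

```python
import math


def log10_genes_per_umi(n_genes, n_umis):
    """Complexity score for one cell: log10(genes) / log10(UMIs).

    Assumes n_genes >= 1 and n_umis > 1, which holds for cells that
    pass the UMI and feature filters described above.
    """
    return math.log10(n_genes) / math.log10(n_umis)


# A cell with 1,000 detected genes and 10,000 UMIs.
score = log10_genes_per_umi(1000, 10000)
print(round(score, 2))  # 0.75
```

Under this definition, values closer to 1 indicate that the UMIs are spread across many genes, while low values suggest a few genes dominate the library.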

3. Integration, clustering, cell type annotation (Basic)

After splitting the merged object into multiple layers, we identified integration anchors with Seurat’s FindIntegrationAnchors function, using the first 30 dimensions from a canonical correlation analysis (CCA) and otherwise default parameters. Seurat’s IntegrateData function was then used to integrate the data using these anchors.

We performed principal component analysis (PCA) on the integrated Seurat object, identified cell clusters using k-nearest neighbor clustering, and visualized the clusters in UMAP space, all with default parameters.

Cell types were assigned to each cluster based on marker gene expression. The marker gene panel is customized for each project according to the origin of the sample, and the marker genes were obtained from CellMarker v2.0 [4].
Here is an example of a UMAP colored by cell type:

The results for this part are stored in Figures/02.Umap_all and Tables.

4. Cell type marker gene plots (Basic)

Violin plots and dot plots were used to display the expression profiles of marker genes for each identified cell type, supporting the annotation of cell types based on these markers. Here is an example violin plot of the fibroblast marker genes Acta2 and Fap:

Here is an example dot plot for all marker genes:

5. Barplot of number and proportion of cell types (Basic)

Cell proportions and numbers of each cell type across conditions were visualized with bar plots.
Here is an example of bar plots of the cell proportions and numbers of each cell type across samples:

The results for this part are stored in Figures/03.Barplot_cellType_proportion_num.
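The quantities behind these bar plots are simple per-sample tallies. As a hedged illustration in Python (not the pipeline's R code), the sketch below counts cells per cell type within each sample and divides by the sample total; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cell_type_summary(cells):
    """Return {sample: {cell_type: (count, proportion)}}.

    `cells` is an iterable of (sample, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for sample, cell_type in cells:
        counts[sample][cell_type] += 1

    summary = {}
    for sample, by_type in counts.items():
        total = sum(by_type.values())
        summary[sample] = {ct: (n, n / total) for ct, n in by_type.items()}
    return summary


# Toy annotations (made-up): one (sample, cell type) pair per cell.
cells = [
    ("S1", "T cell"), ("S1", "T cell"), ("S1", "T cell"), ("S1", "B cell"),
    ("S2", "T cell"), ("S2", "B cell"),
]

summary = cell_type_summary(cells)
print(summary["S1"]["T cell"])  # (3, 0.75)
```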

6. Boxplot or barplot for comparison of cell type number and proportion of cells across conditions (Extended)

To compare the cell proportions and number of cells in each cell type across conditions, we used box plots and the Wilcoxon rank-sum test. Here is an example of box plots comparing cell proportions and numbers of each cell type across conditions:

For projects where each condition includes only one sample, box plots are unsuitable, so we provide bar plots to compare conditions. Below is an example of a bar plot comparing cell proportions and numbers for each cell type across conditions:

The results for this part are stored in Figures/04.Boxplot_cellType_proportion_num_comparison.
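To make the Wilcoxon rank-sum comparison concrete, here is a minimal Python sketch, assuming a small number of samples per condition so that an exact two-sided p-value can be obtained by enumerating all group assignments. It is an illustration of the test, not the pipeline's implementation; the function names and the toy proportions are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, p): rank sum of x and exact two-sided permutation p-value."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    expected = n1 * (n + 1) / 2          # mean of W under the null
    w = sum(ranks[:n1])
    observed = abs(w - expected)
    hits = total = 0
    for combo in combinations(range(n), n1):
        total += 1
        if abs(sum(ranks[i] for i in combo) - expected) >= observed - 1e-9:
            hits += 1
    return w, hits / total


# Toy proportions of one cell type in three samples per condition (made-up).
treated = [0.30, 0.35, 0.40]
control = [0.10, 0.15, 0.20]
w, p = rank_sum_test(treated, control)
print(w, p)  # 15.0 0.1
```

With three samples per condition, the smallest attainable two-sided p-value is 0.1, which is worth keeping in mind when interpreting box plots from small designs.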

7. Differential expression analysis across cell types (Extended)

Following the initial characterization, we proceeded with differential expression (DE) analysis across the different cell types. We generated comprehensive tables listing genes that are differentially expressed between each cell type and all others, which helps in understanding the unique gene signatures that define each cell type. For each cell type, heatmaps were also created to depict the top five DE genes, highlighting the most significant changes in gene expression. Here is an example heatmap of the top five DE genes across cell types:

A summary of the number of DE genes across all cell types is provided to illustrate the variation in gene expression among different cell populations. Here is an example bar plot of the number of up- and down-regulated genes across cell types:

GO and KEGG enrichment analyses were conducted for the DE genes in each cell type; the results are stored in DE_cellType/GO_KEGG.
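The report does not state which statistical test underlies the enrichment results. Over-representation analysis with a hypergeometric test is the usual approach for GO and KEGG term enrichment, so the Python sketch below illustrates that test under this assumption. The function and parameter names are hypothetical, and no multiple-testing correction is shown.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_term, n_de, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_universe: number of background genes
    n_in_term:  background genes annotated to the term
    n_de:       number of DE genes tested
    n_overlap:  DE genes annotated to the term
    """
    upper = min(n_in_term, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the term, 3 DE genes, all 3 in the term.
p = hypergeom_enrichment_p(n_universe=10, n_in_term=4, n_de=3, n_overlap=3)
print(round(p, 4))  # 0.0333
```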

The results for this part are stored in DE_celltype.

8. Differential expression (DE) of genes between conditions for all cells and in each cell type (Extended)

For differential expression analysis between conditions, we offer two modes: replicate and non-replicate. Differential expression analysis was conducted across all cells and individually for each cell type. Additionally, DE analysis can accommodate multiple comparison groups.

Replicate mode is designed for projects that include replicated samples in each condition. We begin by aggregating all cells within the same sample and cell type using the AggregateExpression function, which returns a Seurat object where each ‘cell’ represents the pseudobulk profile of one cell type per sample. Cell-type-specific differential expression analysis between two conditions is then performed using DESeq2. We used cutoffs of p-value < 0.05 and absolute log2FC > 0.585 (fold change of 1.5) to obtain the differentially expressed genes.
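The two mechanical parts of replicate mode, pseudobulk aggregation and the significance cutoffs, can be sketched in Python as follows. This is an illustration under stated assumptions: it sums raw counts per (sample, cell type) pair and does not reproduce AggregateExpression or DESeq2. The function names and toy counts are hypothetical, and the p-values and log2 fold changes passed to is_de are assumed to come from the DESeq2 output.

```python
import math
from collections import defaultdict

P_CUTOFF = 0.05
LOG2FC_CUTOFF = 0.585  # approximately log2(1.5)


def pseudobulk(cells):
    """Sum counts over all cells sharing the same (sample, cell type).

    `cells` is an iterable of (sample, cell_type, {gene: count}) tuples.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for sample, cell_type, counts in cells:
        for gene, count in counts.items():
            profiles[(sample, cell_type)][gene] += count
    return {key: dict(genes) for key, genes in profiles.items()}


def is_de(p_value, log2fc):
    """Apply the report's cutoffs: p < 0.05 and |log2FC| > 0.585."""
    return p_value < P_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF


# Toy cells (made-up counts).
cells = [
    ("S1", "T cell", {"Cd3e": 2, "Actb": 5}),
    ("S1", "T cell", {"Cd3e": 1}),
    ("S2", "T cell", {"Actb": 4}),
]
profiles = pseudobulk(cells)
print(profiles[("S1", "T cell")])  # {'Cd3e': 3, 'Actb': 5}
print(is_de(0.01, -0.7))           # True: significant and beyond the cutoff
print(round(math.log2(1.5), 3))    # 0.585
```

The cutoff applies to the absolute log2 fold change, so both up- and down-regulated genes are retained.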

Non-replicate mode is tailored for projects with only one sample per condition. In this scenario, we employ the FindMarkers function from the Seurat package to perform differential expression analysis.

We generate tables of differentially expressed genes for each comparison group and provide a volcano plot for each comparison. Additionally, we produce violin plots, dot plots, and feature plots for the top 6, top 12, and top 18 genes, respectively. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses are conducted for the differentially expressed (DE) genes to identify enriched biological pathways and functions. The following are examples of plots for the DE analysis:

Example of volcano plot of DE genes in one comparison

Example of violin plot of top6 DE genes across conditions

Example of dot plot of top6 DE genes across conditions

Example of feature plot of one DE gene across conditions

The GO and KEGG enrichment results for DE genes between conditions are stored in DE_condition/GO_KEGG.

The results for this part are stored in DE_condition.

9. Pseudotime and Trajectory Analysis (Customized)

Single-cell pseudotime and trajectory analysis are computational methods used to infer the developmental progression and lineage relationships of individual cells within a population. These approaches leverage single-cell transcriptomic data to unravel the temporal ordering and spatial relationships of cells, enabling the reconstruction of developmental trajectories and the identification of key regulatory events during cellular development. Pseudotime analysis orders cells along a hypothetical trajectory, representing their progression from an undifferentiated state to a mature cell type.

The left plot below shows the pseudotime analysis, performed using the R package Slingshot. Trajectory analysis maps the branching patterns and interconnections between different cell lineages, providing insights into cellular differentiation and dynamic processes such as cell fate decisions and cell state transitions. We performed the trajectory analysis using the Monocle 2 package; the result is shown in the right plot below.

Example of pseudotime (Left) and trajectory (Right) analysis

The results for this part are stored in Pseudotime_analysis.

10. Cell subtype for a certain cell type (Customized)

We offer analysis services for identifying subtypes within a specific cell type. Our workflow begins with the basic analyses outlined in Sections 3–5, where we subset cells of the targeted cell type, perform clustering, annotate the subtypes, and generate UMAP plots. Visualizations include marker gene expression plots for the identified subtypes and bar plots illustrating subtype proportions and numbers across samples. Additionally, upon client request, we can create box plots or bar plots to compare the numbers and proportions of each subtype under different conditions. Furthermore, we can conduct differential gene expression analysis between subtypes and across conditions, similar to the extended analyses detailed in Sections 6–8. The following is an example of a UMAP of cell subtypes within a specific cell type:

Example of cell subtypes for Mono cells.

This section is customized and will only be conducted when the client specifically requests subtype analysis for a particular cell type. The results for this part are stored in Cell_subtype_analysis.

11. Comparison of expression of genes of interest across conditions (Customized)

We compared the expression of the genes of interest across conditions using the Wilcoxon rank-sum test and visualized the results with feature plots, violin plots, and dot plots, in the same way as for the top-ranked DE genes in the differential expression (DE) analysis between conditions.

This section is customized and will be conducted only when the client specifically requests it and provides a list of genes of interest. The results for this part are stored in Figures/05.Interested_gene_comparison_plots.

12. Hashtag analysis (Customized)

Cell Hashing uses oligo-tagged antibodies against ubiquitously expressed surface proteins to place a “sample barcode” on each single cell, enabling different samples to be multiplexed together and run in a single experiment. The processing of hashtagged single-cell data is similar to GEX single-cell analysis, except for the following two steps after the GEX Cell Ranger analysis.

Calculate hashtag counts and QC

CITE-seq-Count is used to quantify the hashtag counts associated with the cell barcodes generated by the GEX Cell Ranger analysis. The QC stats and parameters used for the analysis are shown below. We use the hashtag UMI counts for downstream analysis.

The QC stats and parameters for CITE-seq-Count.

Hashtag clustering and classifying

We used Seurat to read in the hashtag count matrix generated by CITE-seq-Count as a new assay independent of RNA. The HTODemux function in Seurat was applied to assign single cells back to their sample of origin. A cell is considered positive for a hashtag if its count exceeds the 0.99 quantile of the inferred ‘negative’ distribution for that hashtag, and a cell that is positive for exactly one hashtag is classified as a singlet. Cells assigned as “negative”, “doublet”, or “unmapped” are removed from the downstream analysis.
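The classification rule can be illustrated with a short Python sketch. It assumes the per-hashtag thresholds (the 0.99 quantile of each inferred negative distribution) have already been computed; HTODemux estimates them internally, and that step is not reproduced here. The function name, hashtag names, and all numbers below are hypothetical.

```python
def classify_cell(hto_counts, thresholds):
    """Classify one cell from its hashtag counts.

    A hashtag is 'positive' when its count exceeds that hashtag's threshold.
    Returns (classification, assigned_hashtag_or_None).
    """
    positive = [h for h, count in hto_counts.items() if count > thresholds[h]]
    if not positive:
        return "Negative", None
    if len(positive) == 1:
        return "Singlet", positive[0]
    return "Doublet", None


# Assumed, precomputed thresholds for three hashtags (made-up values).
thresholds = {"HTO1": 50, "HTO2": 40, "HTO3": 60}

print(classify_cell({"HTO1": 300, "HTO2": 5, "HTO3": 10}, thresholds))
# ('Singlet', 'HTO1')
print(classify_cell({"HTO1": 300, "HTO2": 250, "HTO3": 10}, thresholds))
# ('Doublet', None)
print(classify_cell({"HTO1": 3, "HTO2": 5, "HTO3": 10}, thresholds))
# ('Negative', None)
```

Only cells returned as singlets would be carried forward, each labeled with the sample that its hashtag represents.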

UMAP of hashtag classification (Left) and assignment by max hashtag count sample (Right)

This section is customized and will be conducted only when the client specifically requests it and provides hashtag data.

Citation

1. Zheng, G.X.Y., Terry, J.M. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 1-12 (2017).
2. Hao, Y., Stuart, T., Kowalski, M.H. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 42, 293–304 (2024).
3. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
4. Hu, C., Li, T., Xu, Y. et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res (2023).
5. Cell Ranger: Zheng, G.X.Y. et al. (2017).
6. clusterProfiler 4.0: Wu, T. et al. (2021).
7. Monocle 2: Qiu, X. et al. (2017).
8. Slingshot: Street, K. et al. (2018).

Appendix

Table 1: List of software used in the analysis pipeline.

Software	Version
Cellranger	7.2.0
R-Seurat	5.0
R-clusterProfiler	3.18.1
R-slingshot	1.8.0
R	4.2.0

2024-11-15

Analysis Schema

Basic analysis

1. GEX Cell Ranger analysis

2. QC and filtering

3. Integration, clustering, cell type annotation

4. Cell type marker gene plots

5. Barplot of number and proportion of cells for each cell type

Extended analysis

6. Boxplot or barplot for comparison of cell type number and proportion of cells across conditions

7. Differential expression analysis across cell types

8. Differential expression of genes between conditions for all cells and in each cell type

Customized analysis

9. Pseudotime and trajectory analysis

10. Cell subtype identification for a certain cell type

11. Comparison of expression of genes of interest across conditions

12. Hashtag analysis