Cell Ranger count v7.2.0[1] was run with default parameters for
preprocessing, QC, alignment, and counting. The filtered feature-barcode
matrices were used as input to Seurat v5.0[2,3] for downstream analysis.
The Cell Ranger HTML report summarizes the QC of the run, including the
Run Summary, Sequencing, Mapping, and Cells sections. An example report is provided as
web_summary.html.
The Cell Ranger output is stored in the folder
cellrangerResult.
We created a Seurat object for each sample using CreateSeuratObject
and merged them into a single object. Sample and Group information was
added to the metadata of the merged object. We then applied three
filtering steps.
First, we discarded cells with fewer than 500 UMIs, fewer than 200
detected features, or a mitochondrial percentage above 20%.
Second, we filtered genes. Many genes have zero counts in the data, and
these can dramatically reduce the average expression for a cell, so we
removed genes with zero expression in all cells. A gene expressed in
only a handful of cells is also not particularly meaningful, because it
still brings down the averages for all the cells in which it is not
expressed. We therefore kept only genes that are expressed in 10 or
more cells.
Third, we removed the cell cycle genes from the count matrix to reduce
the noise they may introduce into clustering and cell type annotation.
This filtering step can be turned off if the client needs to keep the
cell cycle genes.
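The first two filtering steps can be sketched as follows. This is a minimal, library-free Python illustration on a toy count matrix, not the Seurat code used in the pipeline. The toy data, the function names `filter_cells` and `filter_genes`, and the `mt-` prefix used to recognize mitochondrial genes are assumptions made for the example. The sketch also assumes that a cell failing any one of the three thresholds is discarded.

```python
# Toy illustration of cell- and gene-level filtering (not the pipeline's Seurat code).
# counts[gene][cell] holds UMI counts; all names and data here are hypothetical.

def filter_cells(counts, min_umis=500, min_features=200, max_mito_pct=20.0,
                 mito_prefix="mt-"):
    """Return the cells that pass all three QC thresholds."""
    cells = sorted({c for per_cell in counts.values() for c in per_cell})
    kept = []
    for cell in cells:
        n_umis = sum(per_cell.get(cell, 0) for per_cell in counts.values())
        n_features = sum(1 for per_cell in counts.values() if per_cell.get(cell, 0) > 0)
        mito = sum(per_cell.get(cell, 0) for gene, per_cell in counts.items()
                   if gene.lower().startswith(mito_prefix))
        mito_pct = 100.0 * mito / n_umis if n_umis else 0.0
        if n_umis >= min_umis and n_features >= min_features and mito_pct <= max_mito_pct:
            kept.append(cell)
    return kept


def filter_genes(counts, cells, min_cells=10):
    """Return the genes detected (count > 0) in at least min_cells of the kept cells."""
    return [gene for gene, per_cell in counts.items()
            if sum(1 for c in cells if per_cell.get(c, 0) > 0) >= min_cells]


# Tiny example with scaled-down thresholds so the toy matrix is readable.
counts = {
    "Acta2": {"c1": 5, "c2": 0, "c3": 6},
    "Fap": {"c1": 2, "c2": 1, "c3": 0},
    "mt-Co1": {"c1": 1, "c2": 9, "c3": 1},
}
cells = filter_cells(counts, min_umis=4, min_features=2, max_mito_pct=20.0)
genes = filter_genes(counts, cells, min_cells=2)
```

In this toy run, cell `c2` is dropped for its high mitochondrial fraction, and `Fap` is dropped because it is detected in only one of the remaining cells.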
Below are examples of selected QC figures. Each figure is provided both
before and after filtering:
1) Basic QC: number of genes, number of UMIs, and percentage of mitochondrial genes
2) Number of cells identified in each sample
3) Number of genes per cell in each sample
4) log10 genes per UMI, which reflects the complexity of the experiment
The QC figures are stored in the folder
Figures/01.QC.
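As an illustration of the complexity metric in item 4, the sketch below assumes the commonly used definition, the log10 of the number of detected genes divided by the log10 of the number of UMIs for each cell. The function name and the input values are hypothetical.

```python
import math


def log10_genes_per_umi(n_genes, n_umis):
    """Complexity score for one cell: log10(genes detected) / log10(UMIs)."""
    if n_genes < 1 or n_umis <= 1:
        raise ValueError("need at least 1 gene and more than 1 UMI")
    return math.log10(n_genes) / math.log10(n_umis)


# A hypothetical cell with 1,000 detected genes and 10,000 UMIs.
score = log10_genes_per_umi(n_genes=1000, n_umis=10000)
```

Values closer to 1 indicate that the UMIs are spread over many genes, while low values suggest that a few genes dominate the library.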
After splitting the merged object into multiple layers, we used the
first 30 dimensions from a canonical correlation analysis (CCA) to
identify integration anchors using Seurat’s FindIntegrationAnchors
function with default parameters. Seurat’s IntegrateData function was
then used to integrate the samples using these anchors.
We performed principal component analysis (PCA) on the integrated
Seurat object, identified cell clusters using k-nearest neighbor
clustering, and visualized the clusters in UMAP space (using the
default parameters).
Cell types were assigned to each cluster based on marker gene
expression. The marker panel is customized for each project according
to the origin of the sample, and the marker genes were obtained from
CellMarker v2.0[4].
Here is an example of a UMAP colored by cell type:
The results for this part are stored in
Figures/02.Umap_all and Tables.
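One simple way to think about marker-based annotation is to score every cluster against each marker panel and take the best-scoring cell type. The sketch below is only an illustration under that assumption and is not the procedure used in the pipeline, where the final labels depend on the project-specific marker list. The function name, the monocyte markers, and all expression values are hypothetical.

```python
def annotate_clusters(cluster_means, markers):
    """Label each cluster with the cell type whose markers have the highest mean expression."""
    labels = {}
    for cluster, expr in cluster_means.items():
        scores = {
            cell_type: sum(expr.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Hypothetical marker panel and per-cluster average expression values.
markers = {"Fibroblasts": ["Acta2", "Fap"], "Monocytes": ["Cd14", "Lyz2"]}
cluster_means = {
    "0": {"Acta2": 2.1, "Fap": 1.4, "Cd14": 0.1, "Lyz2": 0.2},
    "1": {"Acta2": 0.0, "Fap": 0.1, "Cd14": 1.8, "Lyz2": 2.5},
}
labels = annotate_clusters(cluster_means, markers)
```

A score-based call like this is best treated as a starting point that is then checked against the violin and dot plots of the marker genes.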
Violin plots and dot plots were used extensively to display the
expression profiles of marker genes for each identified cell type,
supporting the annotation of cell types based on these markers.
Here is an example violin plot of the fibroblast marker genes Acta2 and Fap:
Here is an example dot plot for all marker genes:
Cell proportions and numbers of each cell type across conditions were
visualized with bar plots.
Here is an example of bar plots for cell
proportions and numbers of each cell type across samples:
The results for this part are stored in
Figures/03.Barplot_cellType_proportion_num
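The numbers behind these bar plots are per-sample counts and proportions of each cell type. A minimal sketch of that tabulation is shown below, assuming each cell carries a sample label and a cell type label in the metadata. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cell_type_summary(cells):
    """cells: iterable of (sample, cell_type) pairs, one per cell.

    Returns {sample: {cell_type: (count, proportion within that sample)}}.
    """
    cells = list(cells)
    pair_counts = Counter(cells)
    sample_totals = Counter(sample for sample, _ in cells)
    summary = {}
    for (sample, cell_type), n in pair_counts.items():
        summary.setdefault(sample, {})[cell_type] = (n, n / sample_totals[sample])
    return summary


# Toy metadata: 4 cells in each of two hypothetical samples.
toy_cells = ([("S1", "Fibroblasts")] * 3 + [("S1", "Mono")]
             + [("S2", "Fibroblasts")] + [("S2", "Mono")] * 3)
summary = cell_type_summary(toy_cells)
```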
To compare the cell proportions and numbers of each cell type across
conditions, we used box plots and the Wilcoxon rank sum test. Here is
an example of box plots comparing cell proportions and numbers of each
cell type across conditions:
For projects in which each condition includes only one sample, box
plots are unsuitable, so we provide bar plots to compare conditions.
Below is an example of a bar plot comparing cell proportions and
numbers of each cell type across conditions:
The results for this part are stored in Figures/04.Boxplot_cellType_proportion_num_comparison
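For readers unfamiliar with the Wilcoxon rank sum test, the sketch below shows the calculation on toy proportions. It is a standalone Python illustration rather than the R call used in the pipeline, and it uses the normal approximation with a tie correction and no continuity correction. R's `wilcox.test` can return an exact p-value for small samples without ties, so p-values from the two may differ. The function name and the data are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p-value). Tied values receive mid-ranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    tie_term = 0.0
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # average of the 1-based ranks in the tie group
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = mid_rank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Toy proportions of one cell type in three samples per condition.
control = [0.10, 0.12, 0.15]
treated = [0.30, 0.28, 0.35]
u_stat, p_value = rank_sum_test(control, treated)
```

With only a few samples per condition, the test has little power, which is worth keeping in mind when reading the box plot comparisons.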
Following the initial characterization, we performed differential
expression (DE) analysis across the cell types. We generated tables
listing the genes that are differentially expressed between each cell
type and all others, which helps in understanding the gene signatures
that define each cell type. For each cell type, heatmaps were also
created to show the top five DE genes.
Here is an example heatmap of the top 5 DE genes across cell types:
The number of DE genes in each cell type was also summarized to
illustrate the variation in gene expression among cell populations.
Here is an example bar plot of the number of up- and down-regulated
genes across cell types:
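The counts shown in such a bar plot can be derived from a DE table as sketched below. The significance cutoffs (`padj_cutoff`, `lfc_cutoff`) are hypothetical placeholders, since the thresholds used in the pipeline are not stated here, and the toy rows are invented for the example. The column order follows the `avg_log2FC` and `p_val_adj` fields that Seurat reports for marker tests.

```python
def count_de_genes(de_table, padj_cutoff=0.05, lfc_cutoff=0.25):
    """de_table: rows of (cell_type, gene, avg_log2FC, p_val_adj).

    Returns {cell_type: {"up": n, "down": n}} using hypothetical cutoffs.
    """
    summary = {}
    for cell_type, gene, lfc, padj in de_table:
        counts = summary.setdefault(cell_type, {"up": 0, "down": 0})
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            counts["up" if lfc > 0 else "down"] += 1
    return summary


# Invented rows for illustration only.
de_table = [
    ("Fibroblasts", "Acta2", 1.2, 0.001),
    ("Fibroblasts", "Fap", -0.8, 0.01),
    ("Fibroblasts", "GeneX", 0.1, 0.2),   # fails both cutoffs
    ("Mono", "GeneY", -1.5, 0.0001),
]
de_counts = count_de_genes(de_table)
```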
GO and KEGG enrichment analyses were conducted for the DE genes in each
cell type, and the results are stored in
DE_cellType/GO_KEGG.
The results for this part are
stored in DE_celltype.
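If the enrichment is run as an over-representation analysis, which is an assumption here, the p-value for a single GO term or KEGG pathway comes from a hypergeometric test. The sketch below shows that calculation in plain Python with hypothetical gene counts. It does not reproduce clusterProfiler's multiple-testing adjustment or its choice of background genes.

```python
from math import comb


def enrichment_pvalue(n_background, n_in_term, n_de, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(N=n_background, K=n_in_term, n=n_de)."""
    upper = min(n_in_term, n_de)
    total = comb(n_background, n_de)
    hits = sum(comb(n_in_term, i) * comb(n_background - n_in_term, n_de - i)
               for i in range(n_overlap, upper + 1))
    return hits / total


# Hypothetical numbers: 10 background genes, 5 annotated to the term,
# 5 DE genes, all 5 of which fall in the term.
p_term = enrichment_pvalue(n_background=10, n_in_term=5, n_de=5, n_overlap=5)
```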
Example of a volcano plot of DE genes in one comparison
Example of a violin plot of the top 6 DE genes across conditions
Example of a dot plot of the top 6 DE genes across conditions
Example of a feature plot of one DE gene across conditions
GO and KEGG enrichment analyses were conducted for the DE genes between
conditions, and the results are stored in
DE_condition/GO_KEGG.
The results for this part
are stored in DE_condition.
Example of pseudotime (Left) and trajectory (Right) analysis
The result in this part is stored in
Pseudotime_analysis
Example of cell subtypes for Mono cells.
This section is customized and will only be conducted when the client specifically requests subtype analysis for a particular cell type. The result in this part is stored in Cell_subtype_analysis
We compared the expression of the genes of interest across conditions using the Wilcoxon rank sum test and visualized the results with feature plots, violin plots, and dot plots, in the same way as the top-ranked DE genes in the differential expression (DE) analysis between conditions.
This section is customized and will be conducted only when the client specifically requests it and provides a list of genes of interest. The result in this part is stored in Figures/05.Interested_gene_comparison_plots
The QC statistics and parameters for CITE-seq-Count.
We used Seurat to read in the hashtag count matrix generated by
CITE-seq-Count as a new assay independent of RNA. The HTODemux function
in Seurat was applied to assign single cells back to their sample of
origin. A cell is called positive for a hashtag when its count exceeds
the 0.99 quantile of the inferred ‘negative’ distribution for that
hashtag, and a cell that is positive for exactly one hashtag is
considered a singlet. Cells assigned as “negative”, “doublet”, or
“unmapped” are removed from the downstream analysis.
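The classification logic can be illustrated with the simplified sketch below. It takes the per-hashtag cutoffs as given inputs, whereas HTODemux derives them from a distribution fitted to the background counts, so this is not a reimplementation of that function. The hashtag names, cutoffs, and counts are hypothetical.

```python
def classify_cell(hto_counts, thresholds):
    """Return (classification, assigned hashtag or None) for one cell."""
    positives = [h for h, n in hto_counts.items() if n > thresholds[h]]
    if not positives:
        return "negative", None
    if len(positives) == 1:
        return "singlet", positives[0]
    return "doublet", None


# Hypothetical per-hashtag cutoffs and hashtag counts for three cells.
thresholds = {"HTO-A": 50, "HTO-B": 40}
hto_cells = {
    "c1": {"HTO-A": 300, "HTO-B": 5},
    "c2": {"HTO-A": 3, "HTO-B": 2},
    "c3": {"HTO-A": 250, "HTO-B": 180},
}
calls = {cell: classify_cell(c, thresholds) for cell, c in hto_cells.items()}

# Keep only singlets for downstream analysis, mapped to their hashtag of origin.
kept = {cell: tag for cell, (label, tag) in calls.items() if label == "singlet"}
```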
UMAP of hashtag classification (Left) and assignment by max hashtag count sample (Right)
This section is customized and will be conducted only when the client specifically requests it and provides hashtag data.
Table 1: List of software used in the analysis pipeline.
| Software | Version |
|---|---|
| Cell Ranger | 7.2.0 |
| R-Seurat | 5.0 |
| R-clusterProfiler | 3.18.1 |
| R-slingshot | 1.8.0 |
| R | 4.2.0 |