Step 01 — PDX disambiguation
Separate human cells from mouse cells in the barnyard output
Cell Ranger's filtering is species-blind: it keeps high-UMI barcodes without knowing whether they are human or mouse. We therefore run disambiguation on the raw matrix so that our own species-aware filtering comes first. Starting from the filtered matrix would mean accepting Cell Ranger's species-blind decisions.
For every barcode, UMIs mapping to human (GRCh38) and mouse (GRCm39) genes are counted separately. Barcodes with ≥95% human UMIs are kept as pure human. Barcodes with ≤5% human UMIs are classed as mouse, those between 5% and 95% as multiplets, and those with 0 UMIs as empty droplets.
| Sample | Total barcodes | Pure human | Mouse | Multiplet | Empty | Human % | Median MT% |
|---|---|---|---|---|---|---|---|
| Treatment1 | 1,847,320 | 1,042,318 | 171,204 | 312,441 | 321,357 | 56.4% | 2.82% |
| Treatment2 | 1,621,903 | 676,633 | 352,814 | 310,109 | 282,347 | 41.7% | 0.52% |
| Control | 1,762,418 | 1,127,948 | 128,256 | 237,626 | 268,588 | 64.0% | 2.94% |
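The classification rule behind this table can be sketched in a few lines of Python. This is a minimal illustration of the thresholds stated above, not the pipeline's actual code. The function name `classify_barcode`, its parameters, and the example barcodes and counts are all hypothetical.

```python
def classify_barcode(human_umis: int, mouse_umis: int,
                     human_min: float = 0.95, mouse_max: float = 0.05) -> str:
    """Label one barcode by the fraction of its UMIs that map to human genes."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"            # 0 UMIs: empty droplet
    human_fraction = human_umis / total
    if human_fraction >= human_min:
        return "human"            # >= 95% human UMIs: pure human
    if human_fraction <= mouse_max:
        return "mouse"            # <= 5% human UMIs: mouse
    return "multiplet"            # anything in between: mixed-species droplet


# Hypothetical (human, mouse) UMI counts per barcode, for illustration only.
examples = {
    "AAAC": (980, 20),
    "AAAG": (10, 990),
    "AAAT": (600, 400),
    "AACC": (0, 0),
}
labels = {bc: classify_barcode(h, m) for bc, (h, m) in examples.items()}
print(labels)
```

Keeping the two cut-offs as parameters makes it easy to check how sensitive the human/mouse/multiplet split is to the 95% and 5% choices.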
MT% is calculated from the Cell Ranger filtered matrix while the raw matrix is already loaded. These Cell Ranger MT% values are used for filtering in Step 04, rather than the post-DecontX MT% values, which are mathematically altered by ambient RNA removal.
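As a rough sketch of what an MT% value means for a single cell, the snippet below takes the share of UMIs assigned to mitochondrial genes. It rests on two assumptions that are not stated in this report: that human mitochondrial genes carry the combined prefix `GRCh38_MT-`, and that the denominator is every UMI in the supplied counts. The helper `mt_percent` and the example counts are hypothetical.

```python
from typing import Mapping


def mt_percent(gene_counts: Mapping[str, int], mt_prefix: str = "GRCh38_MT-") -> float:
    """Percentage of a cell's UMIs that come from mitochondrial genes.

    Assumption: mitochondrial genes are identified by `mt_prefix`, and the
    denominator is the sum of all counts passed in.
    """
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mt = sum(n for gene, n in gene_counts.items() if gene.startswith(mt_prefix))
    return 100.0 * mt / total


# Hypothetical counts for one cell, for illustration only.
cell = {"GRCh38_MT-CO1": 30, "GRCh38_MT-ND1": 20, "GRCh38_ACTB": 900, "GRCh38_GAPDH": 50}
print(mt_percent(cell))
```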
Step 02 — DecontX ambient RNA removal
Remove RNA that leaked from lysed cells and contaminated other droplets
| Sample | Cells processed | Median contamination | Cells >50% contam |
|---|---|---|---|
| Treatment1 | 10,791 | 0.296 — elevated | 3,378 (31.3%) |
| Treatment2 | 2,474 | 0.211 — normal | 143 (5.8%) |
| Control | 8,543 | 0.248 — normal | 1,254 (14.7%) |
Step 03 — DropletQC nuclear fraction filter
Remove damaged cells using the ratio of intronic to total reads
| Sample | Total cells | Healthy | Damaged | Excluded |
|---|---|---|---|---|
| Treatment1 | 10,791 | 10,777 (99.9%) | 14 (0.1%) | 14 |
| Treatment2 | 2,474 | 2,468 (99.8%) | 6 (0.2%) | 6 |
| Control | 8,543 | 8,502 (99.8%) | 19 (0.2%) | 19 |
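The metric underlying this step, the nuclear fraction, can be illustrated as follows. The sketch assumes that "total reads" means intronic plus exonic reads for a barcode. It does not reproduce how DropletQC decides which cells are damaged. The function name and the read counts are hypothetical.

```python
def nuclear_fraction(intronic_reads: int, exonic_reads: int) -> float:
    """Fraction of a barcode's reads that fall in introns.

    Assumption: total reads = intronic + exonic reads for that barcode.
    """
    total = intronic_reads + exonic_reads
    if total == 0:
        return 0.0
    return intronic_reads / total


# Hypothetical read counts for one barcode, for illustration only.
print(nuclear_fraction(intronic_reads=300, exonic_reads=700))
```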
Step 04 — Seurat QC cell filtering
Apply all QC metrics together, remove mouse genes, build clean Seurat objects
| Filter | Threshold | Source | Rationale |
|---|---|---|---|
| Human genes only | GRCh38_ prefix | DecontX matrix | Remove all 33,696 mouse genes permanently |
| MT% | < 10% | Cell Ranger (Step 01) | Author requirement |
| DecontX contamination | < 0.5 | Step 02 | Remove cells where majority of counts are ambient RNA |
| nFeature_RNA | 200 – 7,000 | Author requirement | Remove empty droplets (<200) and likely doublets (>7,000) |
| nCount_RNA | 500 – 50,000 | Standard | Remove debris and multiplets |
| DropletQC | Exclude flagged | Step 03 | Remove physically damaged cells |
| Sample | Into Step 04 | After all filters | % kept |
|---|---|---|---|
| Treatment1 | 10,777 | 5,184 | 48.1% |
| Treatment2 | 2,468 | 2,001 | 81.1% |
| Control | 8,502 | 6,509 | 76.6% |
| Total | 21,747 | 13,694 | 63.0% |
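The filter table above can be read as one combined predicate per cell, plus a gene-level filter on the `GRCh38_` prefix. The Python sketch below is illustrative only and is not the Seurat code used in the pipeline. The `CellQC` fields, the function names, and the mouse prefix `GRCm39_` in the example are hypothetical. Treating the nFeature_RNA and nCount_RNA ranges as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CellQC:
    mt_pct: float            # Cell Ranger MT% from Step 01
    contamination: float     # DecontX contamination estimate from Step 02
    n_features: int          # nFeature_RNA
    n_counts: int            # nCount_RNA
    dropletqc_flagged: bool  # flagged as damaged in Step 03


def passes_qc(cell: CellQC) -> bool:
    """Return True only if the cell passes every threshold in the filter table."""
    return (
        cell.mt_pct < 10.0
        and cell.contamination < 0.5
        and 200 <= cell.n_features <= 7000    # bounds assumed inclusive
        and 500 <= cell.n_counts <= 50000     # bounds assumed inclusive
        and not cell.dropletqc_flagged
    )


def human_genes(gene_names: Iterable[str], prefix: str = "GRCh38_") -> List[str]:
    """Keep only genes carrying the human reference prefix."""
    return [g for g in gene_names if g.startswith(prefix)]


# Hypothetical cells and gene names, for illustration only.
good = CellQC(mt_pct=2.8, contamination=0.21, n_features=3000, n_counts=12000,
              dropletqc_flagged=False)
ambient = CellQC(mt_pct=2.8, contamination=0.62, n_features=3000, n_counts=12000,
                 dropletqc_flagged=False)
print(passes_qc(good), passes_qc(ambient))
print(human_genes(["GRCh38_ACTB", "GRCm39_Actb", "GRCh38_MT-CO1"]))
```

Because the filters are combined with a logical AND, a cell that fails any single criterion is removed.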
Step 05 — Harmony integration
Merge all 3 samples and correct batch effects while preserving biology
| Parameter | Value | Meaning |
|---|---|---|
| PCA dimensions | 50 | Number of PCs computed — elbow at ~PC 20 |
| Variable genes | 3,000 | Most variable genes used for PCA |
| Harmony theta | 2 | Diversity penalty — controls strength of sample mixing |
| Harmony max iterations | 10 | Convergence limit |
| Group variable | sample | Correct batch effects by treatment sample |
| Sample | Cells in integrated object |
|---|---|
| Treatment1 | 5,184 |
| Treatment2 | 2,001 |
| Control | 6,509 |
| Total | 13,694 |
Step 06 — Clustering + UMAP
Group cells by transcriptomic similarity and visualise in 2D
Clustering groups transcriptomically similar cells into cell types or states. UMAP reduces the high-dimensional space to 2D for visualisation, preserving local neighbourhood structure. Five resolutions were tested (0.2, 0.4, 0.6, 0.8, 1.0); resolution 0.8 was chosen, giving 11 well-balanced clusters.