Patient-derived xenograft (PDX) Pipeline QC Report — Demo Project

Step 01 — PDX disambiguation

Separate human cells from mouse cells in the barnyard output

✓ Complete

Script

In-house customized script

Matrix used

raw_feature_bc_matrix

Total cells kept

21,808

pure human barcodes

Avg human purity

54.0%

across 3 samples

Why raw matrix — not filtered

Cell Ranger's filtering is species-blind — it keeps high-UMI barcodes without knowing if they are human or mouse. We run on the raw matrix so we can do our own species-aware filtering first. Using the filtered matrix would mean accepting Cell Ranger's species-blind decisions.

How disambiguation works

For every barcode, UMIs mapping to human (GRCh38) and mouse (GRCm39) genes are counted, and the barcode is classified by its human UMI fraction: ≥95% is kept as pure human, ≤5% is mouse, and anything strictly between 5% and 95% is a multiplet. A barcode with 0 UMIs is an empty droplet.
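
The thresholds above can be sketched as a small classifier. This is a minimal Python illustration, not the in-house script: the function name `classify_barcode` and the example counts are hypothetical, and it assumes that only human and mouse UMIs contribute to the total.

```python
def classify_barcode(human_umis: int, mouse_umis: int) -> str:
    """Assign a barcode to a species class from its per-species UMI counts."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"          # no UMIs at all: empty droplet
    human_fraction = human_umis / total
    if human_fraction >= 0.95:
        return "human"          # kept as pure human
    if human_fraction <= 0.05:
        return "mouse"
    return "multiplet"          # mixed-species droplet


# Hypothetical (human, mouse) UMI counts for four barcodes.
examples = [(980, 20), (10, 990), (500, 500), (0, 0)]
labels = [classify_barcode(h, m) for h, m in examples]
print(labels)
```

Checking the empty case first avoids a division by zero for barcodes that captured nothing.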

Verified results

Sample	Total barcodes	Pure human	Mouse	Multiplet	Empty	Human %	Median MT%
Treatment1	1,847,320	1,042,318	171,204	312,441	321,357	56.4%	2.82%
Treatment2	1,621,903	676,633	352,814	310,109	282,347	41.7%	0.52%
Control	1,762,418	1,127,948	128,256	237,626	268,588	64.0%	2.94%

Treatment2 had the lowest human fraction of all samples, 41.7%. This is most likely biological rather than technical: Treatment2 (proteasome inhibitor) preferentially kills human tumour cells while mouse stromal cells survive better, which reduces the human fraction.
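
The Human % column can be reproduced from the other columns of the table. The sketch below copies the barcode counts from the table and assumes that Human % is the pure human count divided by all barcodes, empty droplets included. That assumption matches the reported values; the variable names are illustrative.

```python
# Barcode counts per sample, copied from the results table.
counts = {
    "Treatment1": {"human": 1_042_318, "mouse": 171_204, "multiplet": 312_441, "empty": 321_357},
    "Treatment2": {"human": 676_633, "mouse": 352_814, "multiplet": 310_109, "empty": 282_347},
    "Control": {"human": 1_127_948, "mouse": 128_256, "multiplet": 237_626, "empty": 268_588},
}

human_pct = {}
for sample, c in counts.items():
    total = sum(c.values())                      # equals the "Total barcodes" column
    human_pct[sample] = round(100 * c["human"] / total, 1)

# Unweighted mean across the three samples.
average = round(sum(human_pct.values()) / len(human_pct), 1)
print(human_pct, average)
```

The unweighted mean of the three per-sample values gives the 54.0% shown in the summary card.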

Why MT% is calculated here

MT% is calculated in this step, from the Cell Ranger filtered matrix, while the raw matrix is already loaded. These Cell Ranger MT% values are the ones used for filtering in Step 04. The post-DecontX MT% values are not used, because ambient RNA removal alters the counts they are computed from.
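
As a rough illustration of the metric itself, MT% is the share of a cell's UMIs that come from mitochondrial genes. The sketch below assumes gene names carry the `GRCh38_` prefix mentioned in Step 04 and that human mitochondrial genes therefore start with `GRCh38_MT-`. The function name and the toy counts are hypothetical.

```python
def mt_percent(gene_counts: dict[str, int], mt_prefix: str = "GRCh38_MT-") -> float:
    """Percentage of a cell's UMIs that come from mitochondrial genes."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0              # avoid dividing by zero for a cell with no counts
    mt = sum(n for gene, n in gene_counts.items() if gene.startswith(mt_prefix))
    return 100 * mt / total


# Toy cell with hypothetical counts: 30 of 1,000 UMIs are mitochondrial.
cell = {"GRCh38_MT-CO1": 20, "GRCh38_MT-ND1": 10, "GRCh38_ACTB": 470, "GRCh38_GAPDH": 500}
print(mt_percent(cell))
```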

QC metrics — pre-DecontX

Outputs

{sample}_pure_human_barcodes.txt
{sample}_barnyard_summary.csv
{sample}_MT_pct_cellranger.csv
plots/{sample}_step01_barnyard_MT.pdf
step01_summary.csv

Step 02 — DecontX ambient RNA removal

Remove RNA that leaked from lysed cells and contaminated other droplets

✓ Complete

Package

celda 1.22.0

Matrix used

filtered ∩ human

Cells processed

21,808

across 3 samples

Contam threshold

> 0.5

filtered in Step 04

DecontX vs MT% — why both are needed

MT% and DecontX solve completely different problems. MT% decides which cells to keep. DecontX fixes what those cells appear to express. You need both.

MT% filtering

Removes low-quality CELLS. Answers: "is this a good cell worth keeping?"

DecontX correction

Fixes gene expression COUNTS inside kept cells. Answers: "are this cell's counts accurate?"

Verified results

Sample	Cells processed	Median contamination	Cells >50% contam
Treatment1	10,791	0.296 — elevated	3,378 (31.3%)
Treatment2	2,474	0.211 — normal	143 (5.8%)
Control	8,543	0.248 — normal	1,254 (14.7%)

Decision: filter cells with contamination score > 0.5 in Step 04. DecontX has already corrected counts for cells below this threshold.
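
The two summary columns in the table (median contamination and cells above 0.5) could be derived from per-cell scores along these lines. This is a sketch on made-up scores, not the pipeline code: `summarise_contamination` is a hypothetical helper, and treating a score of exactly 0.5 as not flagged is an assumption.

```python
from statistics import median


def summarise_contamination(scores: list[float], threshold: float = 0.5) -> dict:
    """Median contamination and the number of cells above the threshold."""
    flagged = sum(1 for s in scores if s > threshold)
    return {
        "median": median(scores),
        "n_flagged": flagged,
        "pct_flagged": round(100 * flagged / len(scores), 1),
    }


# Hypothetical per-cell contamination scores, for illustration only.
scores = [0.05, 0.12, 0.30, 0.50, 0.62, 0.91]
summary = summarise_contamination(scores)
print(summary)
```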

Contamination distribution

Outputs

{sample}/decontx_corrected_matrix/
{sample}/decontx_contamination.csv

Step 03 — DropletQC nuclear fraction filter

Remove damaged cells using the ratio of intronic to total reads

✓ Complete

Package

DropletQC 0.0.0.9000

NF threshold

> 0.75

= damaged cell

Total removed

0.18% of all cells

Verified results

Sample	Total cells	Healthy	Damaged	Excluded
Treatment1	10,791	10,777 (99.9%)	14 (0.1%)	14
Treatment2	2,474	2,468 (99.8%)	6 (0.2%)	6
Control	8,543	8,502 (99.8%)	19 (0.2%)	19

Total removed: 39 cells (0.18% of all cells). The number is small, but damaged cells left in place could have formed spurious states in clustering.
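
The cutoff used in this step can be illustrated with a short sketch. It applies only the fixed NF > 0.75 rule stated in this report and assumes that total reads are the sum of intronic and exonic reads. It does not reproduce DropletQC's own computation, and the barcodes and read counts are hypothetical.

```python
def nuclear_fraction(intronic_reads: int, exonic_reads: int) -> float:
    """Fraction of a barcode's reads that map to introns."""
    total = intronic_reads + exonic_reads
    return intronic_reads / total if total else 0.0


def is_damaged(nf: float, threshold: float = 0.75) -> bool:
    """Flag a cell as damaged when its nuclear fraction exceeds the cutoff."""
    return nf > threshold


# Hypothetical (intronic, exonic) read counts per barcode.
cells = {"AAAC": (300, 700), "CCGT": (800, 200), "GGTA": (750, 250)}
excluded = [bc for bc, (i, e) in cells.items() if is_damaged(nuclear_fraction(i, e))]
print(excluded)
```

A barcode sitting exactly at 0.75 is kept here because the report states the threshold as strictly greater than 0.75.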

Nuclear fraction distribution

Outputs

{sample}_nuclear_fraction.csv
{sample}_dropletqc_exclude.txt
{sample}_dropletqc_keep.txt
plots/{sample}_step03_nuclear_fraction.pdf

Step 04 — Seurat QC cell filtering

Apply all QC metrics together, remove mouse genes, build clean Seurat objects

✓ Complete

Package

Seurat 5.4.0

Cells before

21,747

into Step 04

Cells after

13,694

passed all filters

Retention rate

63.0%

across all samples

All filters applied

Filter	Threshold	Source	Rationale
Human genes only	GRCh38_ prefix	DecontX matrix	Remove all 33,696 mouse genes permanently
MT%	< 10%	Cell Ranger (Step 01)	Author requirement
DecontX contamination	< 0.5	Step 02	Remove cells where majority of counts are ambient RNA
nFeature_RNA	200 – 7,000	Author requirement	Remove empty droplets (<200) and likely doublets (>7,000)
nCount_RNA	500 – 50,000	Standard	Remove debris and multiplets
DropletQC	Exclude flagged	Step 03	Remove physically damaged cells
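
The per-cell filters in the table combine with a logical AND: a cell is kept only if it passes every one of them. The sketch below expresses that in plain Python rather than Seurat. The `CellQC` record and `passes_qc` function are hypothetical, and treating the nFeature_RNA and nCount_RNA bounds as inclusive is an assumption. The human-gene filter acts on genes, not cells, so it is not included.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    mt_pct: float            # Cell Ranger MT% from Step 01
    contamination: float     # DecontX contamination score from Step 02
    n_feature: int           # nFeature_RNA
    n_count: int             # nCount_RNA
    dropletqc_flagged: bool  # flagged as damaged in Step 03


def passes_qc(cell: CellQC) -> bool:
    """True only if the cell passes every filter in the table."""
    return (
        cell.mt_pct < 10
        and cell.contamination < 0.5
        and 200 <= cell.n_feature <= 7000
        and 500 <= cell.n_count <= 50000
        and not cell.dropletqc_flagged
    )


# Hypothetical cells: one clean, one with high MT%, one dominated by ambient RNA.
good = CellQC(2.8, 0.21, 2500, 8000, False)
high_mt = CellQC(14.0, 0.21, 2500, 8000, False)
ambient = CellQC(2.8, 0.62, 2500, 8000, False)
print(passes_qc(good), passes_qc(high_mt), passes_qc(ambient))
```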

QC metrics — post-filter

Verified cells after all filters

Sample	Into Step 04	After all filters	% kept
Treatment1	10,777	5,184	48.1%
Treatment2	2,468	2,001	81.1%
Control	8,502	6,509	76.6%
Total	21,747	13,694	63.0%
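
The % kept column is the number of cells after filtering divided by the number entering Step 04. The short check below recomputes it from the counts in the table; the variable names are illustrative.

```python
# Cell counts copied from the table above.
into_step04 = {"Treatment1": 10_777, "Treatment2": 2_468, "Control": 8_502}
after_filters = {"Treatment1": 5_184, "Treatment2": 2_001, "Control": 6_509}

kept_pct = {s: round(100 * after_filters[s] / into_step04[s], 1) for s in into_step04}

total_in = sum(into_step04.values())
total_out = sum(after_filters.values())
total_pct = round(100 * total_out / total_in, 1)
print(kept_pct, total_in, total_out, total_pct)
```

The overall figure is 13,694 / 21,747 ≈ 62.97%, which rounds to 63.0%.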

Outputs

{sample}_seurat_clean.rds
{sample}_final_barcode_summary.csv
plots/step04_postQC_violin.pdf

Step 05 — Harmony integration

Merge all 3 samples and correct batch effects while preserving biology

✓ Complete

Method

Harmony

PCs used

50

elbow at ~PC 20

Variable genes

3,000

used for PCA

Theta

2

diversity penalty

Parameters used

Parameter	Value	Meaning
PCA dimensions	50	Number of PCs computed — elbow at ~PC 20
Variable genes	3,000	Most variable genes used for PCA
Harmony theta	2	Diversity penalty — controls strength of sample mixing
Harmony max iterations	10	Convergence limit
Group variable	sample	Correct batch effects by treatment sample

Harmony successfully removed sample-level technical variation while preserving biological differences. Cells now cluster by cell type rather than by which sample they came from.

UMAP — before vs after Harmony

Verified results

Sample	Cells in integrated object
Treatment1	5,184
Treatment2	2,001
Control	6,509
Total	13,694

Outputs

integrated_harmony.rds
plots/step05_elbow_plot.pdf
plots/step05_umap_before_after.pdf
step05_summary.csv

Step 06 — Clustering + UMAP

Group cells by transcriptomic similarity and visualise in 2D

✓ Complete

Algorithm

Leiden (alg 4)

Resolution

0.8

11 clusters

Dims used

1:20

elbow cutoff

Total cells

13,694

in final object

Why this step exists

Clustering groups transcriptomically similar cells into cell types or states. UMAP reduces the high-dimensional space to 2D for visualisation while preserving local neighbourhood structure. Multiple resolutions were tested (0.2, 0.4, 0.6, 0.8, 1.0); resolution 0.8 was chosen because it gives 11 well-balanced clusters.

Resolution 0.8 selected — 11 clusters with 800–2,100 cells each. Well-balanced and biologically interpretable.

UMAP — clusters and sample identity

Outputs

integrated_harmony_clustered.rds
plots/step06_umap_clusters.pdf
step06_cluster_summary.csv
