RNA-seq Analysis Report — Demo Drug-Set Kit

📋Project Summary

✓ Overall QC Status: PASS. Trimming improved unique mapping from 62.8% to 68.1%. The valid barcode rate is 95.6% on trimmed reads. mRNA purity is excellent (91.2% mRNA bases). Strand specificity (97.0% correct-strand reads) confirms a reverse-stranded library.

Metric	Value
Drug Conditions (A01–H12)	96 wells
Reads After Trimming	1.31 B read pairs
Uniquely Mapped (trimmed)	68.1%
Valid Barcodes	95.6%
Correct Strand Reads	97.0%
mRNA Bases (Picard)	91.2%
Reads Mapped to Gene (unique)	55.2%
Ribosomal Bases	0.44%

🧬Sample & Library Information

Pool ID	Raw Read Pairs	After Trimming	Avg. Read Length	CB Length	UMI Length	Barcode Wells	Reference
Sample_1	1,416,936,193	1,305,994,649 (92.2%)	151 bp	14 bp	14 bp	96 (A01–H12)	GRCh38 v113

ℹ This is a pooled drug-screening library. All 96 drug conditions were sequenced in a single pooled lane. Individual well (drug condition) assignments are recovered via cell barcodes (CB) embedded in Read 1 (positions 1–14). Positions 15–28 of Read 1 carry the UMI for deduplication.
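The Read 1 layout described above can be illustrated with a minimal sketch. The function name `split_r1` and the example sequence are hypothetical; only the positions (CB at bases 1–14, UMI at bases 15–28) come from this report.

```python
CB_LEN = 14   # cell barcode: Read 1 bases 1-14 (from the report)
UMI_LEN = 14  # UMI: Read 1 bases 15-28 (from the report)


def split_r1(seq: str) -> tuple[str, str]:
    """Return (cell_barcode, umi) taken from the start of a Read 1 sequence."""
    if len(seq) < CB_LEN + UMI_LEN:
        raise ValueError("Read 1 is shorter than CB + UMI (28 bp)")
    return seq[:CB_LEN], seq[CB_LEN:CB_LEN + UMI_LEN]


# Invented sequence, for illustration only (not a barcode from the whitelist).
r1 = "ACGTACGTACGTAC" + "TTGGCCAATTGGCC" + "TTTTTTTTTT"
cb, umi = split_r1(r1)
print(cb, umi)
```

In the actual pipeline this extraction is done by STARsolo, not by custom code; the sketch only shows which bases carry which information.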

🔲96-Well Plate Design

Plate Layout (Demo — A01 to H12)

A01	A02	A03	A04	A05	A06	A07	A08	A09	A10	A11	A12
B01	B02	B03	B04	B05	B06	B07	B08	B09	B10	B11	B12
C01	C02	C03	C04	C05	C06	C07	C08	C09	C10	C11	C12
D01	D02	D03	D04	D05	D06	D07	D08	D09	D10	D11	D12
E01	E02	E03	E04	E05	E06	E07	E08	E09	E10	E11	E12
F01	F02	F03	F04	F05	F06	F07	F08	F09	F10	F11	F12
G01	G02	G03	G04	G05	G06	G07	G08	G09	G10	G11	G12
H01	H02	H03	H04	H05	H06	H07	H08	H09	H10	H11	H12

Each well represents one drug condition. Barcodes were provided as a 96-entry whitelist to STARsolo for demultiplexing.
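For downstream scripts, the A01–H12 well labels can be generated rather than typed by hand. This is a small sketch; the row-major order (A01, A02, ... H12) is an assumption, and the mapping from each barcode sequence to a well label is not given in this report.

```python
import string

ROWS = string.ascii_uppercase[:8]  # plate rows A-H
COLS = range(1, 13)                # plate columns 1-12

# Row-major order: A01..A12, B01..B12, ..., H01..H12
wells = [f"{row}{col:02d}" for row in ROWS for col in COLS]

print(len(wells), wells[0], wells[-1])
```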

Barcode & UMI Design

Parameter	Value
Barcode (CB) position	R1, bases 1–14
UMI position	R1, bases 15–28
RNA read	R2 (full length)
CB length	14 bp
UMI length	14 bp
Barcode whitelist	96 sequences (A01–H12)
STARsolo mode (--soloType)	CB_UMI_Simple
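The design table maps onto STARsolo command-line options roughly as sketched below. This is not the command used for this project: the function name and all paths are hypothetical, and `--soloBarcodeReadLength 0` is an assumption made because Read 1 is longer than the 28 bp CB+UMI region. The dedup options are inferred from the delivered `umiDedup-*.mtx` file names.

```python
def starsolo_args(genome_dir: str, r1: str, r2: str, whitelist: str) -> list[str]:
    """Build a STAR argument list matching the CB/UMI design in this report."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", r2, r1,          # cDNA read first, barcode read second
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,   # 96-entry whitelist
        "--soloCBstart", "1", "--soloCBlen", "14",
        "--soloUMIstart", "15", "--soloUMIlen", "14",
        "--soloBarcodeReadLength", "0",   # assumption: skip the R1 length check
        "--soloFeatures", "Gene",
        "--soloUMIdedup", "1MM_Directional", "NoDedup",
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]


# Hypothetical paths, for illustration only.
args = starsolo_args("star_index/", "R1.fastq", "R2.fastq", "whitelist.txt")
print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` on a machine where STAR is installed; compressed FASTQ input would additionally need `--readFilesCommand`.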

🔬FastQC Metrics (Raw Reads)

QC Module Status — Sample_1 (Raw)

QC Module	R1	R2
Basic Statistics	PASS	PASS
Per-base Sequence Quality	FAIL	PASS
Per-sequence Quality Scores	WARN	PASS
Per-base Sequence Content	FAIL	FAIL
Per-sequence GC Content	WARN	FAIL
Sequence Length Distribution	PASS	PASS
Sequence Duplication Levels	WARN	WARN
Adapter Content	PASS	FAIL

Raw Read Stats — Sample_1

Metric	R1	R2
Total Sequences	1,305,994,649	1,305,994,649
Total Bases	194.4 Gbp	196.5 Gbp
% Duplicates	47.6%	91.4%
% GC	36%	42%
Avg. Read Length	148.9 bp	150.5 bp
% FastQC Modules Failed	18.2%	36.4%

ℹ R1 note: R1 carries the 14 bp cell barcode + 14 bp UMI (28 bp total), so per-base sequence content failures in R1 are expected — they reflect the designed barcode structure, not library quality issues. Adapter contamination flagged in R2 is addressed in the trimming step.

✂️Read Trimming (Trimmomatic)

Trimmomatic PE Summary — Sample_1

Category	Read Pairs	%
Input Read Pairs	1,416,936,193	100%
Both Reads Surviving	1,305,994,649	92.17%
Forward Only Surviving	10,717,215	0.76%
Reverse Only Surviving	2,943,041	0.21%
Dropped	97,281,288	6.87%

Settings: ILLUMINACLIP (custom adapter file, 2:30:10:2 keepBothReads), LEADING:3, TRAILING:3, MINLEN:36

Read Survival Rate

✓ 92.17% of read pairs passed trimming with both mates surviving. Trimming removed the adapter contamination that FastQC flagged in R2 and raised the unique alignment rate from 62.76% to 68.07% (+5.3 percentage points). The trimmed dataset is used for all downstream analyses.
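The percentages in the Trimmomatic summary can be reproduced from the read-pair counts. A minimal sketch using only the numbers reported above; the variable names are illustrative.

```python
input_pairs = 1_416_936_193

counts = {
    "Both Reads Surviving": 1_305_994_649,
    "Forward Only Surviving": 10_717_215,
    "Reverse Only Surviving": 2_943_041,
    "Dropped": 97_281_288,
}

# The four categories should account for every input read pair.
assert sum(counts.values()) == input_pairs

pct = {name: 100.0 * n / input_pairs for name, n in counts.items()}
for name, n in counts.items():
    print(f"{name}: {n:,} ({pct[name]:.2f}%)")
```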

🗺️STAR Alignment — Raw vs Trimmed Comparison

Alignment Summary Comparison — Sample_1

Metric	Raw Reads	Trimmed Reads	Change
Input Reads	1,416,936,193	1,305,994,649	−110.9M
Uniquely Mapped	889,314,095 (62.76%)	888,946,887 (68.07%)	+5.31%
Mapped to Too Many Loci	73,134,701 (5.16%)	73,918,443 (5.66%)	+0.50%
Unmapped: Too Short	443,681,260 (31.31%)	342,036,045 (26.19%)	−5.12%
Unmapped: Other	10,806,137 (0.76%)	1,093,274 (0.08%)	−0.68%
Avg. Mapped Length	144.51 bp	144.39 bp	—
Total Splices	261,050,817	259,658,717	—
Annotated Splices %	95.7%	96.3%	+0.6%
Mismatch Rate	0.47%	0.47%	—
Runtime	5 hr 10 min	3 hr 54 min	−1.3 hr
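The Change column compares rates, so its values are differences in percentage points. The sketch below recomputes them from the raw counts in the table; the dictionary keys are illustrative names. Because it subtracts unrounded rates, a result can differ in the last digit from subtracting the rounded percentages shown in the table.

```python
raw = {
    "input": 1_416_936_193,
    "unique": 889_314_095,
    "multi": 73_134_701,       # mapped to too many loci
    "too_short": 443_681_260,
    "other": 10_806_137,
}
trimmed = {
    "input": 1_305_994_649,
    "unique": 888_946_887,
    "multi": 73_918_443,
    "too_short": 342_036_045,
    "other": 1_093_274,
}


def rate(counts: dict, key: str) -> float:
    """Percentage of input reads falling in the given category."""
    return 100.0 * counts[key] / counts["input"]


categories = ("unique", "multi", "too_short", "other")
deltas = {key: rate(trimmed, key) - rate(raw, key) for key in categories}

for key in categories:
    print(f"{key}: {rate(raw, key):.2f}% -> {rate(trimmed, key):.2f}% "
          f"({deltas[key]:+.2f} percentage points)")
```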

Mapping Rate Comparison

Read Mapping Distribution (Trimmed — Final)

🏷️Barcode Demultiplexing (STARsolo)

STARsolo Summary — Raw vs Trimmed

Metric	Raw Reads	Trimmed Reads	Change
Total Reads	1,416,936,193	1,305,994,649	—
Reads with Valid Barcodes	92.25%	95.59%	+3.34%
Q30 Bases in CB+UMI	97.01%	96.96%	—
Q30 Bases in RNA Read	86.64%	88.34%	+1.70%
Reads Mapped to Genome (Unique)	62.76%	68.07%	+5.31%
Reads Mapped to Gene (Unique)	50.84%	55.17%	+4.33%
Sequencing Saturation	—	—	—

Barcode QC Chart

✓ After trimming, 95.6% of reads carried a valid barcode from the 96-well whitelist. Q30 quality in the CB+UMI region is 97%, confirming high barcode read accuracy and reliable well demultiplexing.

📊Picard RNA Metrics

RNA-seq Quality Metrics — Sample_1

Metric	Value
PF Bases	10,697,866,498
PF Aligned Bases	6,425,697,914
Ribosomal Bases	28,130,072 (0.44%)
Coding Bases	2,102,697,131 (32.7%)
UTR Bases	3,756,799,397 (58.5%)
Intronic Bases	349,730,910 (5.4%)
Intergenic Bases	188,340,404 (2.9%)
% mRNA Bases	91.2%
% Usable Bases	54.8%
% Correct Strand Reads	97.0%
Median CV Coverage	1.096
Median 5′ Bias	0.021
Median 3′ Bias	2.215
Median 5′‑3′ Bias Ratio	0.010

Base Distribution by Genomic Region

⚠ 3′ Bias Detected. Median 3′ bias (2.22) and low 5′‑3′ ratio (0.01) indicate strong 3′ enrichment, characteristic of poly(A)-enriched RNA-seq libraries. Coverage uniformity (CV = 1.10) is within acceptable range. This does not affect differential expression analysis when all samples are prepared consistently.
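The percentage columns in the Picard table follow from the base counts. As a sketch: Picard reports % mRNA bases as (coding + UTR) over PF aligned bases, and % usable bases as (coding + UTR) over all PF bases. The variable names are illustrative; the counts are those reported above.

```python
pf_bases = 10_697_866_498
pf_aligned_bases = 6_425_697_914

bases = {
    "ribosomal": 28_130_072,
    "coding": 2_102_697_131,
    "utr": 3_756_799_397,
    "intronic": 349_730_910,
    "intergenic": 188_340_404,
}

# The five categories partition the aligned bases.
assert sum(bases.values()) == pf_aligned_bases

mrna_bases = bases["coding"] + bases["utr"]
pct_mrna = 100.0 * mrna_bases / pf_aligned_bases   # denominator: aligned bases
pct_usable = 100.0 * mrna_bases / pf_bases         # denominator: all PF bases
pct_ribosomal = 100.0 * bases["ribosomal"] / pf_aligned_bases

print(f"% mRNA bases:      {pct_mrna:.1f}")
print(f"% usable bases:    {pct_usable:.1f}")
print(f"% ribosomal bases: {pct_ribosomal:.2f}")
```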

⚙️Pipeline Summary

Analysis Steps

#	Step	Tool	Input	Status
1	FastQ Quality Control	FastQC + MultiQC	Raw FASTQ (R1, R2)	COMPLETE
2	Adapter & Quality Trimming	Trimmomatic PE	Raw FASTQ	COMPLETE
3	Genome Alignment + Barcode Demux	STAR 2.7.10b (STARsolo)	Trimmed FASTQ + 96-well whitelist	COMPLETE
4	RNA Quality Metrics	Picard CollectRnaSeqMetrics	BAM (subset for Picard QC)	COMPLETE
5	Differential Expression	DESeq2 (R)	Count matrix; for output, refer to the mRNAseq demo report	By Request

Delivered Files

File / Directory	Description	Size
`01.FastqQualityCheck/`	FastQC HTML reports (R1, R2) + MultiQC aggregate report	~2.8 MB
`02.BamFiles/Aligned.sortedByCoord.out.bam`	STAR-aligned BAM file, sorted by coordinate	~14 GB
`02.BamFiles/Aligned.sortedByCoord.out.bam.bai`	BAM index file	14 MB
`03.rnaMetrics/`	STAR + Picard MultiQC reports; alignment & RNA metrics	~928 KB
`Solo.out/Gene/Summary.csv`	STARsolo run summary (read counts, valid barcodes, mapping rates)	<1 KB
`Solo.out/Gene/raw/barcodes.tsv`	96-well cell barcode whitelist (A01–H12)	1.5 KB
`Solo.out/Gene/raw/features.tsv`	Gene feature list (78,932 genes, GRCh38 v113)	3.3 MB
`Solo.out/Gene/raw/umiDedup-1MM_Directional.mtx`	Per-well UMI count matrix — 1-mismatch directional deduplication (recommended)	18 MB
`Solo.out/Gene/raw/umiDedup-NoDedup.mtx`	Per-well UMI count matrix — no deduplication (comparison)	19 MB
`Solo.out/Barcodes.stats`	Per-barcode read assignment statistics	<1 KB
`Solo.out/Gene/Features.stats`	Per-gene feature assignment statistics	<1 KB
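The delivered `.mtx` files are in Matrix Market coordinate format, with genes as rows and barcodes as columns. As a rough sketch, the total UMI count per well can be summed with the standard library alone. The function name `umis_per_well` is hypothetical, and the tiny matrix and barcode labels in the example are invented for illustration; integer counts are assumed.

```python
from collections import defaultdict


def umis_per_well(mtx_lines, barcodes):
    """Sum counts per barcode (column) of a Matrix Market coordinate matrix."""
    totals = defaultdict(int)
    size_line_seen = False
    for line in mtx_lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue                      # skip header and comment lines
        if not size_line_seen:
            size_line_seen = True         # first data line: n_rows n_cols n_entries
            continue
        _gene_idx, barcode_idx, count = line.split()[:3]
        # Matrix Market indices are 1-based.
        totals[barcodes[int(barcode_idx) - 1]] += int(count)
    return dict(totals)


# Invented example: 3 genes x 2 barcodes, 4 non-zero entries.
example_mtx = """%%MatrixMarket matrix coordinate integer general
%
3 2 4
1 1 5
2 1 1
3 2 7
1 2 2
""".splitlines()
example_barcodes = ["BARCODE_1", "BARCODE_2"]  # placeholder labels

print(umis_per_well(example_mtx, example_barcodes))
```

With the delivered files, `mtx_lines` would be an open handle on `umiDedup-1MM_Directional.mtx` and `barcodes` the lines of `barcodes.tsv`; libraries such as SciPy can read the same format directly.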

Software & Reference

Tool	Version / Details
STAR	2.7.10b (STARsolo, CB_UMI_Simple mode)
Trimmomatic	PE mode, ILLUMINACLIP + LEADING/TRAILING:3 + MINLEN:36
FastQC / MultiQC	Standard Illumina QC pipeline
Picard	CollectRnaSeqMetrics (3M read downsampled BAM)
Reference Genome	Homo sapiens GRCh38 (Ensembl release 113)
GTF Annotation	Homo_sapiens.GRCh38.113.gtf