RNA-seq Analysis Report
Demo Drug-Set Kit — 96-Well Drug Screening with Barcode Demultiplexing & Quality Control
📋Project Summary
✓ Overall QC Status: PASS. Trimming improved unique mapping from 62.8% to 68.1%. The valid barcode rate is 95.6% on trimmed reads. mRNA bases make up 91.2% of aligned bases (Picard), and 97.0% of reads map to the expected strand, consistent with a reverse-stranded library.
| Metric | Value |
|---|---|
| Drug Conditions (A01–H12) | 96 wells |
| Reads After Trimming | 1.31 B read pairs |
| Uniquely Mapped (trimmed) | 68.1% |
| Valid Barcodes | 95.6% |
| Correct Strand Reads | 97.0% |
| mRNA Bases (Picard) | 91.2% |
| Reads Mapped to Gene (unique) | 55.2% |
| Ribosomal Bases | 0.44% |
🧬Sample & Library Information
| Pool ID | Raw Read Pairs | After Trimming | Avg. Read Length | CB Length | UMI Length | Barcode Wells | Reference |
|---|---|---|---|---|---|---|---|
| Sample_1 | 1,416,936,193 | 1,305,994,649 (92.2%) | 151 bp | 14 bp | 14 bp | 96 (A01–H12) | GRCh38 v113 |
ℹ This is a pooled drug-screening library. All 96 drug conditions were sequenced in a single pooled lane. Individual well (drug condition) assignments are recovered via cell barcodes (CB) embedded in Read 1 (positions 1–14). Positions 15–28 of Read 1 carry the UMI for deduplication.
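The Read 1 layout described above can be illustrated with a minimal Python sketch. It only shows how the stated positions (CB at bases 1–14, UMI at bases 15–28) translate into string slices; the function name `split_read1` and the example sequence are hypothetical and are not part of the delivered pipeline.

```python
from typing import NamedTuple

CB_LEN = 14   # cell barcode: Read 1 bases 1-14
UMI_LEN = 14  # UMI: Read 1 bases 15-28


class BarcodeFields(NamedTuple):
    cb: str
    umi: str


def split_read1(seq: str) -> BarcodeFields:
    """Return the cell barcode and UMI from the start of a Read 1 sequence."""
    if len(seq) < CB_LEN + UMI_LEN:
        raise ValueError("Read 1 is shorter than CB + UMI (28 bp)")
    return BarcodeFields(cb=seq[:CB_LEN], umi=seq[CB_LEN:CB_LEN + UMI_LEN])


# Hypothetical Read 1 sequence, invented for illustration only.
example_r1 = "ACGTACGTACGTAC" + "TTGGCCAATTGGCC" + "TTTTTTTTTTTT"
fields = split_read1(example_r1)
print(fields.cb, fields.umi)
```

Any bases after position 28 in Read 1 are ignored by this sketch; the RNA insert is read from Read 2.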
🔲96-Well Plate Design
Plate Layout (Demo — A01 to H12)
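| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | A01 | A02 | A03 | A04 | A05 | A06 | A07 | A08 | A09 | A10 | A11 | A12 |
| B | B01 | B02 | B03 | B04 | B05 | B06 | B07 | B08 | B09 | B10 | B11 | B12 |
| C | C01 | C02 | C03 | C04 | C05 | C06 | C07 | C08 | C09 | C10 | C11 | C12 |
| D | D01 | D02 | D03 | D04 | D05 | D06 | D07 | D08 | D09 | D10 | D11 | D12 |
| E | E01 | E02 | E03 | E04 | E05 | E06 | E07 | E08 | E09 | E10 | E11 | E12 |
| F | F01 | F02 | F03 | F04 | F05 | F06 | F07 | F08 | F09 | F10 | F11 | F12 |
| G | G01 | G02 | G03 | G04 | G05 | G06 | G07 | G08 | G09 | G10 | G11 | G12 |
| H | H01 | H02 | H03 | H04 | H05 | H06 | H07 | H08 | H09 | H10 | H11 | H12 |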
Each well represents one drug condition. Barcodes were provided as a 96-entry whitelist to STARsolo for demultiplexing.
Barcode & UMI Design
| Parameter | Value |
|---|---|
| Barcode (CB) position | R1, bases 1–14 |
| UMI position | R1, bases 15–28 |
| RNA read | R2 (full length) |
| CB length | 14 bp |
| UMI length | 14 bp |
| Barcode whitelist | 96 sequences (A01–H12) |
| STAR CB_UMI mode | CB_UMI_Simple |
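The parameters in the table above map onto STARsolo command-line options roughly as sketched below. This is an illustrative reconstruction, not the command used for this run: the report does not give it, so the file paths are hypothetical, and the choice of `--soloBarcodeReadLength 0` is an assumption based on Read 1 being longer than the 28 bp of CB + UMI. The command is only assembled as an argument list and is not executed.

```python
import shlex


def build_starsolo_args(genome_dir, r1_fastq, r2_fastq, whitelist):
    """Assemble a STAR/STARsolo command as a list of arguments (not run here)."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read (R2) first and the barcode read (R1) second.
        "--readFilesIn", r2_fastq, r1_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--soloCBstart", "1", "--soloCBlen", "14",
        "--soloUMIstart", "15", "--soloUMIlen", "14",
        # Assumption: R1 is longer than CB + UMI, so the length check is disabled.
        "--soloBarcodeReadLength", "0",
        "--soloFeatures", "Gene",
        # Matches the two delivered matrices (1MM_Directional and NoDedup).
        "--soloUMIdedup", "1MM_Directional", "NoDedup",
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]


# Hypothetical paths, for illustration only.
args = build_starsolo_args(
    "star_index_GRCh38", "pool_R1.fastq", "pool_R2.fastq", "whitelist_96.txt"
)
print(shlex.join(args))
```

Options such as thread count, compressed-input handling, and strandedness are left out because the report does not state them.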
🔬FastQC Metrics (Raw Reads)
QC Module Status — Sample_1 (Raw)
| QC Module | R1 | R2 |
|---|---|---|
| Basic Statistics | PASS | PASS |
| Per-base Sequence Quality | FAIL | PASS |
| Per-sequence Quality Scores | WARN | PASS |
| Per-base Sequence Content | FAIL | FAIL |
| Per-sequence GC Content | WARN | FAIL |
| Sequence Length Distribution | PASS | PASS |
| Sequence Duplication Levels | WARN | WARN |
| Adapter Content | PASS | FAIL |
Raw Read Stats — Sample_1
| Metric | R1 | R2 |
|---|---|---|
| Total Sequences | 1,305,994,649 | 1,305,994,649 |
| Total Bases | 194.4 Gbp | 196.5 Gbp |
| % Duplicates | 47.6% | 91.4% |
| % GC | 36% | 42% |
| Avg. Read Length | 148.9 bp | 150.5 bp |
| % FastQC Modules Failed | 18.2% | 36.4% |
ℹ R1 note: R1 carries the 14 bp cell barcode and the 14 bp UMI (28 bp total), so per-base sequence content failures in R1 are expected. They reflect the designed barcode structure, not a library quality problem. The adapter contamination flagged in R2 is addressed in the trimming step.
✂️Read Trimming (Trimmomatic)
Trimmomatic PE Summary — Sample_1
| Category | Read Pairs | % |
|---|---|---|
| Input Read Pairs | 1,416,936,193 | 100% |
| Both Reads Surviving | 1,305,994,649 | 92.17% |
| Forward Only Surviving | 10,717,215 | 0.76% |
| Reverse Only Surviving | 2,943,041 | 0.21% |
| Dropped | 97,281,288 | 6.87% |
Settings: ILLUMINACLIP (custom adapter file, 2:30:10:2 keepBothReads), LEADING:3, TRAILING:3, MINLEN:36
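To make the settings line concrete, the sketch below assembles a Trimmomatic paired-end command as an argument list. In Trimmomatic's ILLUMINACLIP syntax the fields after the adapter file are seed mismatches, palindrome clip threshold, simple clip threshold, minimum adapter length, and the keepBothReads flag, so `2:30:10:2 keepBothReads` corresponds to `2:30:10:2:True`. The executable name, file names, and adapter file path are hypothetical, and the command is not executed.

```python
def build_trimmomatic_args(r1, r2, adapters, prefix):
    """Assemble a Trimmomatic PE command as a list of arguments (not run here)."""
    illuminaclip = ":".join([
        "ILLUMINACLIP",
        adapters,  # custom adapter FASTA (path is hypothetical)
        "2",       # seed mismatches
        "30",      # palindrome clip threshold
        "10",      # simple clip threshold
        "2",       # minimum adapter length in palindrome mode
        "True",    # keepBothReads
    ])
    return [
        "trimmomatic", "PE",
        r1, r2,
        f"{prefix}_R1.paired.fastq", f"{prefix}_R1.unpaired.fastq",
        f"{prefix}_R2.paired.fastq", f"{prefix}_R2.unpaired.fastq",
        illuminaclip,
        "LEADING:3",
        "TRAILING:3",
        "MINLEN:36",
    ]


# Hypothetical input names, for illustration only.
trim_args = build_trimmomatic_args(
    "pool_R1.fastq", "pool_R2.fastq", "custom_adapters.fa", "pool_trimmed"
)
print(" ".join(trim_args))
```

The four output files correspond to the table categories: the two paired files hold the "Both Reads Surviving" pairs, and the unpaired files hold the forward-only and reverse-only survivors.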
Read Survival Rate
✓ 92.17% of read pairs passed trimming. Trimming removed the adapter contamination flagged in R2 by FastQC and improved the unique alignment rate from 62.76% to 68.07% (+5.31 percentage points). The trimmed dataset is used for all downstream analyses.
🗺️STAR Alignment — Raw vs Trimmed Comparison
Alignment Summary Comparison — Sample_1
| Metric | Raw Reads | Trimmed Reads | Change (pp = percentage points) |
|---|---|---|---|
| Input Reads | 1,416,936,193 | 1,305,994,649 | −110.9M |
| Uniquely Mapped | 889,314,095 (62.76%) | 888,946,887 (68.07%) | +5.31 pp |
| Mapped to Too Many Loci | 73,134,701 (5.16%) | 73,918,443 (5.66%) | +0.50 pp |
| Unmapped: Too Short | 443,681,260 (31.31%) | 342,036,045 (26.19%) | −5.12 pp |
| Unmapped: Other | 10,806,137 (0.76%) | 1,093,274 (0.08%) | −0.68 pp |
| Avg. Mapped Length | 144.51 bp | 144.39 bp | −0.12 bp |
| Total Splices | 261,050,817 | 259,658,717 | −1,392,100 |
| Annotated Splices % | 95.7% | 96.3% | +0.6 pp |
| Mismatch Rate | 0.47% | 0.47% | — |
| Runtime | 5 hr 10 min | 3 hr 54 min | −1.3 hr |
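The rates in the comparison table can be recomputed from the read counts. The short sketch below does this for the uniquely mapped reads, using only numbers from the table. It also shows that the absolute number of uniquely mapped pairs is almost unchanged between the two runs, which suggests that much of the rate gain comes from the smaller input after trimming; that reading is an inference from the counts, not a statement made by the pipeline.

```python
raw = {"input": 1_416_936_193, "unique": 889_314_095}
trimmed = {"input": 1_305_994_649, "unique": 888_946_887}


def pct(part: int, whole: int) -> float:
    """Percentage of `part` relative to `whole`."""
    return 100.0 * part / whole


raw_rate = pct(raw["unique"], raw["input"])
trimmed_rate = pct(trimmed["unique"], trimmed["input"])
delta_pp = trimmed_rate - raw_rate                   # change in percentage points

removed_pairs = raw["input"] - trimmed["input"]      # pairs not passed on by trimming
lost_unique = raw["unique"] - trimmed["unique"]      # uniquely mapped pairs lost

print(f"raw: {raw_rate:.2f}%  trimmed: {trimmed_rate:.2f}%  change: {delta_pp:+.2f} pp")
print(f"pairs removed: {removed_pairs:,}  uniquely mapped pairs lost: {lost_unique:,}")
```

The change computed from unrounded rates can differ in the last digit from the table value, which is the difference of the two rounded percentages.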
Mapping Rate Comparison
Read Mapping Distribution (Trimmed — Final)
🏷️Barcode Demultiplexing (STARsolo)
STARsolo Summary — Raw vs Trimmed
| Metric | Raw Reads | Trimmed Reads | Change (pp = percentage points) |
|---|---|---|---|
| Total Reads | 1,416,936,193 | 1,305,994,649 | −110.9M |
| Reads with Valid Barcodes | 92.25% | 95.59% | +3.34 pp |
| Q30 Bases in CB+UMI | 97.01% | 96.96% | −0.05 pp |
| Q30 Bases in RNA Read | 86.64% | 88.34% | +1.70 pp |
| Reads Mapped to Genome (Unique) | 62.76% | 68.07% | +5.31 pp |
| Reads Mapped to Gene (Unique) | 50.84% | 55.17% | +4.33 pp |
| Sequencing Saturation | — | — | — |
Barcode QC Chart
✓ After trimming, 95.6% of reads carried a valid barcode from the 96-well whitelist. Q30 quality in the CB+UMI region is 97%, which supports accurate barcode reads and reliable well demultiplexing.
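The idea of a "valid barcode" can be sketched as a lookup of each observed CB against the whitelist. The example below is a simplified illustration, not STARsolo's implementation: the report does not state which whitelist-matching policy was used, so both exact matching and an optional one-mismatch rescue are shown. The barcode sequences and the three-entry whitelist are invented for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_well(cb, whitelist, max_mismatches=0):
    """Return the well ID for a barcode, or None if it is not valid.

    `whitelist` maps barcode sequence -> well ID. A barcode within
    `max_mismatches` of exactly one whitelist entry is rescued.
    """
    if cb in whitelist:
        return whitelist[cb]
    if max_mismatches > 0:
        hits = [
            well for bc, well in whitelist.items()
            if len(bc) == len(cb) and hamming(cb, bc) <= max_mismatches
        ]
        if len(hits) == 1:  # ambiguous rescues are rejected
            return hits[0]
    return None


# Hypothetical whitelist and observed barcodes, invented for illustration.
toy_whitelist = {
    "AAAAAAAAAAAAAA": "A01",
    "CCCCCCCCCCCCCC": "A02",
    "GGGGGGGGGGGGGG": "A03",
}
observed = ["AAAAAAAAAAAAAA", "CCCCCCCCCCCCCA", "TTTTTTTTTTTTTT", "GGGGGGGGGGGGGG"]

exact = [assign_well(cb, toy_whitelist) for cb in observed]
rescued = [assign_well(cb, toy_whitelist, max_mismatches=1) for cb in observed]
valid_rate_exact = sum(w is not None for w in exact) / len(observed)
valid_rate_rescued = sum(w is not None for w in rescued) / len(observed)
print(exact, valid_rate_exact)
print(rescued, valid_rate_rescued)
```

In this toy data the second barcode differs from the A02 entry at one position, so it counts as valid only when one mismatch is allowed.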
📊Picard RNA Metrics
RNA-seq Quality Metrics — Sample_1
| Metric | Value |
|---|---|
| PF Bases | 10,697,866,498 |
| PF Aligned Bases | 6,425,697,914 |
| Ribosomal Bases | 28,130,072 (0.44%) |
| Coding Bases | 2,102,697,131 (32.7%) |
| UTR Bases | 3,756,799,397 (58.5%) |
| Intronic Bases | 349,730,910 (5.4%) |
| Intergenic Bases | 188,340,404 (2.9%) |
| % mRNA Bases | 91.2% |
| % Usable Bases | 54.8% |
| % Correct Strand Reads | 97.0% |
| Median CV Coverage | 1.096 |
| Median 5′ Bias | 0.021 |
| Median 3′ Bias | 2.215 |
| Median 5′‑3′ Bias Ratio | 0.010 |
Base Distribution by Genomic Region
⚠ 3′ Bias Detected. Median 3′ bias (2.22) and a low 5′‑3′ bias ratio (0.01) indicate strong 3′ enrichment, characteristic of 3′-tagged, poly(A)-primed RNA-seq libraries. Coverage variability (median CV = 1.10) is within the acceptable range. The bias does not affect differential expression analysis as long as all samples are prepared consistently.
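The percentage columns in the Picard table follow from the base counts. As a worked check, the sketch below recomputes them using the definitions in the Picard CollectRnaSeqMetrics documentation: mRNA bases are coding plus UTR bases as a fraction of aligned bases, and usable bases are the same numerator as a fraction of all PF bases. All numbers are taken from the table; the variable names are chosen for this example.

```python
pf_bases = 10_697_866_498
pf_aligned_bases = 6_425_697_914
ribosomal_bases = 28_130_072
coding_bases = 2_102_697_131
utr_bases = 3_756_799_397

mrna_bases = coding_bases + utr_bases

# Fractions of aligned bases
pct_mrna = 100.0 * mrna_bases / pf_aligned_bases
pct_ribosomal = 100.0 * ribosomal_bases / pf_aligned_bases

# Fraction of all PF bases (aligned or not)
pct_usable = 100.0 * mrna_bases / pf_bases

print(f"mRNA bases:      {pct_mrna:.1f}%")
print(f"Ribosomal bases: {pct_ribosomal:.2f}%")
print(f"Usable bases:    {pct_usable:.1f}%")
```

The gap between the mRNA and usable percentages reflects the PF bases that did not align, since only the denominator differs.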
⚙️Pipeline Summary
Analysis Steps
| # | Step | Tool | Input | Status |
|---|---|---|---|---|
| 1 | FastQ Quality Control | FastQC + MultiQC | Raw FASTQ (R1, R2) | COMPLETE |
| 2 | Adapter & Quality Trimming | Trimmomatic PE | Raw FASTQ | COMPLETE |
| 3 | Genome Alignment + Barcode Demux | STAR 2.7.10b (STARsolo) | Trimmed FASTQ + 96-well whitelist | COMPLETE |
| 4 | RNA Quality Metrics | Picard CollectRnaSeqMetrics | BAM (subset for Picard QC) | COMPLETE |
| 8 | Differential Expression | DESeq2 (R) | Count matrix; for output, refer to the mRNAseq demo report | By Request |
Delivered Files
| File / Directory | Description | Size |
|---|---|---|
| 01.FastqQualityCheck/ | FastQC HTML reports (R1, R2) + MultiQC aggregate report | ~2.8 MB |
| 02.BamFiles/Aligned.sortedByCoord.out.bam | STAR-aligned BAM file, sorted by coordinate | ~14 GB |
| 02.BamFiles/Aligned.sortedByCoord.out.bam.bai | BAM index file | 14 MB |
| 03.rnaMetrics/ | STAR + Picard MultiQC reports; alignment & RNA metrics | ~928 KB |
| Solo.out/Gene/Summary.csv | STARsolo run summary (read counts, valid barcodes, mapping rates) | <1 KB |
| Solo.out/Gene/raw/barcodes.tsv | 96-well cell barcode whitelist (A01–H12) | 1.5 KB |
| Solo.out/Gene/raw/features.tsv | Gene feature list (78,932 genes, GRCh38 v113) | 3.3 MB |
| Solo.out/Gene/raw/umiDedup-1MM_Directional.mtx | Per-well UMI count matrix: 1-mismatch directional deduplication (recommended) | 18 MB |
| Solo.out/Gene/raw/umiDedup-NoDedup.mtx | Per-well UMI count matrix: no deduplication (comparison) | 19 MB |
| Solo.out/Barcodes.stats | Per-barcode read assignment statistics | <1 KB |
| Solo.out/Gene/Features.stats | Per-gene feature assignment statistics | <1 KB |
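As a starting point for working with the delivered count matrices, the sketch below sums UMI counts per well using only the Python standard library. It assumes the `.mtx` files are in MatrixMarket coordinate format with genes as rows and barcodes as columns, in the same order as `barcodes.tsv`; that layout is an assumption and should be checked against the actual files. The function name and the toy matrix are invented for the example.

```python
import io


def per_well_umi_totals(mtx_lines, barcodes):
    """Sum counts per barcode column of a MatrixMarket coordinate matrix."""
    # Skip blank lines and '%' comment/header lines.
    body = (ln for ln in mtx_lines if ln.strip() and not ln.startswith("%"))
    n_rows, n_cols, n_entries = (int(x) for x in next(body).split()[:3])
    if n_cols != len(barcodes):
        raise ValueError("column count does not match the barcode list")
    totals = {bc: 0.0 for bc in barcodes}
    for ln in body:
        _row, col, value = ln.split()[:3]
        totals[barcodes[int(col) - 1]] += float(value)  # indices are 1-based
    return totals


# Toy matrix (3 genes x 2 barcodes), invented for illustration only.
toy_mtx = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 4\n"
    "1 1 5\n"
    "2 1 1\n"
    "1 2 7\n"
    "3 2 2\n"
)
toy_totals = per_well_umi_totals(toy_mtx, ["BC_A01", "BC_A02"])
print(toy_totals)
```

For the delivered data, the same function could be given an open file handle for `umiDedup-1MM_Directional.mtx` and the list of lines read from `barcodes.tsv`. Wells with unusually low totals are candidates for a closer look before differential expression.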
Software & Reference
| Tool | Version / Details |
|---|---|
| STAR | 2.7.10b (STARsolo, CB_UMI_Simple mode) |
| Trimmomatic | PE mode, ILLUMINACLIP + LEADING/TRAILING:3 + MINLEN:36 |
| FastQC / MultiQC | Standard Illumina QC pipeline |
| Picard | CollectRnaSeqMetrics (3M read downsampled BAM) |
| Reference Genome | Homo sapiens GRCh38 (Ensembl release 113) |
| GTF Annotation | Homo_sapiens.GRCh38.113.gtf |