miRNA-seq QC Report — Demo-Project

1. Project Overview

Metric	Value	Note
Total Samples	27	across 6 groups (A–F)
Total Raw Reads	~236M	across all samples
Avg Reads/Sample	~8.7M	trimmed R1 reads
Avg Mapping Rate	74.1%	reads mapped to the genome (GRCm39); 80.9% of filtered reads
Avg Detected miRNAs	604	per sample
DE Comparisons	7	with GO & KEGG enrichment

Study Design: Mouse miRNA expression profiling across six treatment groups — Group A (n=4), Group B (n=4), Group C (n=4), Group D (n=6), Group E (n=5), and Group F (n=4). Seven pairwise differential expression comparisons were performed, followed by Gene Ontology (GO) and KEGG pathway enrichment analysis on predicted target genes of significant DE miRNAs.

Analysis Pipeline

Step	Tool / Method	Output Folder
Adapter trimming	miRDeep2 (mapper.pl)	00.TrimData/
Read QC	FastQC + MultiQC	01.FastqQualityCheck/
Read collapsing & genome mapping	mapper.pl (miRDeep2) + Bowtie	02.mapping/
miRNA quantification	miRDeep2.pl (miRBase v22, GRCm39)	03.mirdeep2/
Count & CPM matrix generation	Custom R script	04.Counts/
Differential expression + GO & KEGG enrichment on target genes	DESeq2 + clusterProfiler (R)	05.DE-GO-KEGG/

2. Sample Information

This project comprises 27 mouse (Mus musculus) samples divided into six treatment groups (A–F): Group A (n=4, Sample-01–04), Group B (n=4, Sample-05–08), Group C (n=4, Sample-09–12), Group D (n=6, Sample-13–18), Group E (n=5, Sample-19–23), and Group F (n=4, Sample-24–27). Libraries were prepared using the Qiagen Small RNA Kit and sequenced on an Illumina platform.

3. Read Quality (FastQC)

FastQC was run on trimmed R1 reads. Very high duplication rates are expected for miRNA-seq due to the limited number of distinct miRNA species. All samples pass basic sequence quality checks.

Full interactive report: 01.FastqQualityCheck/multiqc_report.html

4. Mapping Statistics (miRDeep2 mapper.pl + Bowtie)

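Trimmed R1 reads were collapsed and aligned to the mouse genome (GRCm39) using mapper.pl (miRDeep2) with Bowtie. In the table below, Filtered Reads are the reads retained after mapper.pl pre-processing filters (see 8.3), and the rates are defined as Filter Rate = Filtered / Total, Overall Mapping Rate = Mapped / Total, and Filtered Mapping Rate = Mapped / Filtered. Genome mapping rates of 60–85% are typical for miRNA-seq; Samples 19–21 (Group E) have the lowest filter rates (80.9–87.0%) and overall mapping rates (59.8–63.6%) in the cohort. Full statistics are in mapping.statistics.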

Sample	Group	Total Reads (M)	Filtered Reads (M)	Filter Rate	Mapped Reads (M)	Overall Mapping Rate	Filtered Mapping Rate	Detected miRNAs
Sample-01	Group A	9.52	8.69	91.2%	6.99	73.4%	80.53%	622
Sample-02	Group A	10.09	9.25	91.7%	7.02	69.6%	75.88%	626
Sample-03	Group A	7.61	6.98	91.6%	5.49	72.2%	78.77%	632
Sample-04	Group A	8.40	7.95	94.7%	6.62	78.8%	83.22%	579
Sample-05	Group B	9.05	8.14	90.0%	6.65	73.5%	81.70%	627
Sample-06	Group B	9.63	8.99	93.3%	7.39	76.7%	82.20%	613
Sample-07	Group B	6.72	6.14	91.4%	4.70	70.0%	76.56%	570
Sample-08	Group B	7.15	6.66	93.1%	5.50	76.9%	82.60%	606
Sample-09	Group C	10.05	9.14	90.9%	7.58	75.4%	82.95%	607
Sample-10	Group C	10.46	9.63	92.1%	7.59	72.6%	78.78%	595
Sample-11	Group C	8.75	8.11	92.8%	6.41	73.3%	79.00%	607
Sample-12	Group C	11.42	10.52	92.1%	8.34	73.0%	79.29%	597
Sample-13	Group D	8.48	8.01	94.5%	6.74	79.5%	84.15%	601
Sample-14	Group D	9.72	9.16	94.2%	7.68	79.0%	83.81%	614
Sample-15	Group D	8.55	8.08	94.5%	6.73	78.7%	83.27%	611
Sample-16	Group D	9.67	9.17	94.8%	7.77	80.4%	84.79%	606
Sample-17	Group D	9.31	8.70	93.4%	7.35	79.0%	84.55%	630
Sample-18	Group D	7.53	6.84	90.9%	5.82	77.3%	85.04%	580
Sample-19	Group E	8.87	7.17	80.9%	5.30	59.8%	73.88%	589
Sample-20	Group E	7.53	6.24	82.9%	4.68	62.1%	74.91%	598
Sample-21	Group E	10.07	8.77	87.0%	6.41	63.6%	73.13%	618
Sample-22	Group E	7.48	6.68	89.4%	5.41	72.4%	80.98%	596
Sample-23	Group E	9.57	8.94	93.4%	7.29	76.1%	81.53%	634
Sample-24	Group F	8.07	7.58	93.9%	6.31	78.2%	83.25%	628
Sample-25	Group F	8.36	7.74	92.6%	6.50	77.7%	83.94%	588
Sample-26	Group F	6.34	5.60	88.3%	4.64	73.1%	82.78%	561
Sample-27	Group F	8.22	7.74	94.1%	6.45	78.4%	83.36%	563
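The three rate columns can be recomputed from the read counts. The sketch below assumes Filter Rate = Filtered / Total, Overall Mapping Rate = Mapped / Total and Filtered Mapping Rate = Mapped / Filtered (definitions inferred from the table values, not taken from mapper.pl documentation), and averages the Overall Mapping Rate column across the 27 samples. Because read counts are rounded to 0.01 M, recomputed per-sample rates agree with the table only to about 0.1 percentage points.

```python
from statistics import mean


def mapping_rates(total_m: float, filtered_m: float, mapped_m: float) -> dict:
    """Return the three rate columns (in %) from read counts given in millions.

    Assumed definitions, inferred from the table values:
      Filter Rate           = Filtered / Total
      Overall Mapping Rate  = Mapped / Total
      Filtered Mapping Rate = Mapped / Filtered
    """
    return {
        "filter_rate": 100 * filtered_m / total_m,
        "overall_mapping_rate": 100 * mapped_m / total_m,
        "filtered_mapping_rate": 100 * mapped_m / filtered_m,
    }


# "Overall Mapping Rate" column for Sample-01 .. Sample-27, copied from the table above
OVERALL_MAPPING_RATE = [
    73.4, 69.6, 72.2, 78.8,              # Group A
    73.5, 76.7, 70.0, 76.9,              # Group B
    75.4, 72.6, 73.3, 73.0,              # Group C
    79.5, 79.0, 78.7, 80.4, 79.0, 77.3,  # Group D
    59.8, 62.1, 63.6, 72.4, 76.1,        # Group E
    78.2, 77.7, 73.1, 78.4,              # Group F
]

# Sample-01: 9.52 M total, 8.69 M filtered, 6.99 M mapped (rounded values from the table)
print({k: round(v, 1) for k, v in mapping_rates(9.52, 8.69, 6.99).items()})
print(f"Mean overall mapping rate: {mean(OVERALL_MAPPING_RATE):.1f}%")
```

Averaging the Overall Mapping Rate column this way gives 74.1%; the same approach applied to the Filter Rate and Filtered Mapping Rate columns gives about 91.5% and 80.9%.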

5. miRNA Quantification (miRDeep2)

miRNA expression was quantified using miRDeep2.pl with the mouse miRBase v22 reference (GRCm39). miRDeep2 quantifies known miRNAs and scores novel miRNA candidates, reporting expression as raw read counts and RPM-normalised values. Per-sample result HTML files and PDF visualisations of miRNA hairpin structures are available in 03.mirdeep2/.

Below is a representative miRDeep2 hairpin structure visualisation for mmu-let-7b (Sample-08, Group B).

Representative miRDeep2 Figure — mmu-let-7b (Sample-08, Group B)

mmu-let-7b hairpin structure — miRDeep2 visualisation showing the read alignment to the predicted hairpin precursor. The 5p and 3p arms are indicated with mapped reads stacked above the secondary structure. Source: 03.mirdeep2/pdfs_*/mmu-let-7b.pdf

Output Files

File / Folder	Description
`expression_*.html`	Per-sample miRDeep2 expression visualisation (interactive HTML)
`result_*.html`	Per-sample miRDeep2 result summary including novel miRNA candidates
`pdfs_*/`	PDF hairpin structure plots for detected miRNAs

6. miRNA Count Matrices

Raw read counts and RPM-normalised values (reads per million, labelled CPM elsewhere in this report) were extracted from the miRDeep2 output and merged across all 27 samples. Both matrices are available in 04.Counts/.

Matrix	File	Description
Raw counts	`04.Counts/rawCount/rawCount--matrix_with_all_samples.txt`	Raw miRNA read counts from miRDeep2
Normalised CPM	`04.Counts/normalizedCount/normalizedCount--matrix_with_all_samples.txt`	RPM-normalised values from miRDeep2
Per-sample raw	`04.Counts/rawCount/*_count.txt`	Individual sample raw count files
Per-sample CPM	`04.Counts/normalizedCount/*_normalizedCount.txt`	Individual sample normalised count files

Detected miRNAs per Sample — Summary Statistics

Across all 27 samples, the number of detected miRNAs (raw count > 0) was highly consistent, ranging from 561 to 634 (coefficient of variation ≈ 3.5%), in line with uniform library quality. Summary statistics are shown below.

Statistic	Value
Number of samples	27
Mean	603.6
Median	606.0
Standard deviation	20.9
Minimum	561
Q1 (25th percentile)	589
Q3 (75th percentile)	622
Maximum	634
IQR	33
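These statistics can be reproduced from the Detected miRNAs column of the mapping table. The sketch below is a minimal check, not the script that produced the report. The reported quartiles match Python's statistics.quantiles with its default "exclusive" method; NumPy's default linear interpolation would give Q1 = 592 and Q3 = 620 for the same data, so the quartile convention matters when comparing against other reports.

```python
from statistics import mean, median, quantiles, stdev

# "Detected miRNAs" column for Sample-01 .. Sample-27, copied from the mapping table
DETECTED = [
    622, 626, 632, 579,            # Group A
    627, 613, 570, 606,            # Group B
    607, 595, 607, 597,            # Group C
    601, 614, 611, 606, 630, 580,  # Group D
    589, 598, 618, 596, 634,       # Group E
    628, 588, 561, 563,            # Group F
]

# quantiles() defaults to method="exclusive"; with n=4 it returns [Q1, median, Q3]
q1, _, q3 = quantiles(DETECTED, n=4)

summary = {
    "n": len(DETECTED),
    "mean": round(mean(DETECTED), 1),
    "median": median(DETECTED),
    "sd": round(stdev(DETECTED), 1),  # sample standard deviation (n - 1 denominator)
    "min": min(DETECTED),
    "q1": q1,
    "q3": q3,
    "max": max(DETECTED),
    "iqr": q3 - q1,
}
print(summary)
```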

7. Differential Expression Analysis

DE analysis was performed with DESeq2 for 7 pairwise comparisons, using raw (unadjusted) p-value < 0.05 and |log₂FC| > 0 as significance thresholds. Significant miRNAs were further analysed for GO (Gene Ontology) and KEGG pathway enrichment on their predicted target genes. Results are in 05.DE-GO-KEGG/.

Note: The format of each comparison is Test_group vs Control_group (e.g., A vs D means Group A is the test, Group D is the reference/control).

DE Summary — All Comparisons

Comparison	Total Significant miRNAs	Up-regulated	Down-regulated
Group A vs D	96	↑ 47	↓ 49
Group B vs A	201	↑ 113	↓ 88
Group B vs E	151	↑ 96	↓ 55
Group C vs A	275	↑ 141	↓ 134
Group C vs F	112	↑ 67	↓ 45
Group E vs D	178	↑ 62	↓ 116
Group F vs D	208	↑ 106	↓ 102

Representative Figures — A vs D (Group A vs Group D)

The following figures are from the A_vs_D_DE comparison (Group A as test, Group D as reference). This comparison identified 96 significant DE miRNAs (47 up-regulated, 49 down-regulated; p < 0.05, |log₂FC| > 0).

Volcano Plots

Enhanced Volcano Plot — A vs D. Each point represents a miRNA. Red: up-regulated (log₂FC > 0, p < 0.05); Blue: down-regulated (log₂FC < 0, p < 0.05); Grey: not significant. Axes show log₂ fold-change (x) and –log₁₀ p-value (y). Source: A_vs_D_DE/output-enhancedVolcanoPlot.pdf

Interactive Volcano Plot — A vs D. Static preview of the interactive volcano plot. In the full interactive version (Volcano-Plot--A_vs_D--show-gene-names.html), hovering over each point displays the miRNA name, log₂ fold-change, and p-value. Source: A_vs_D_DE/Volcano-Plot--A_vs_D--show-gene-names.html

PCA & Sample Distance

PCA Plot — A vs D. Principal component analysis on variance-stabilised counts (VST). Samples are coloured by group. Source: A_vs_D_DE/output-PCA.pdf

Between-Sample Distance Heatmap — A vs D. Hierarchical clustering of samples based on Euclidean distances from VST data. Source: A_vs_D_DE/output-BetweenSampleDis.pdf
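A minimal sketch of the distance step, assuming the VST matrix is laid out as miRNAs × samples (the SummarizedExperiment convention used by DESeq2), so it has to be transposed before distances are taken between samples rather than between miRNAs. The values are toy numbers, not project data, and the hierarchical clustering step is omitted.

```python
from math import dist

# Toy VST-like matrix: rows = miRNAs, columns = samples (illustrative values only)
samples = ["S1", "S2", "S3"]
vst = [
    [10.2, 10.0, 7.5],  # miRNA 1
    [8.1, 8.3, 9.9],    # miRNA 2
    [5.0, 5.2, 6.8],    # miRNA 3
]


def sample_distances(matrix, names):
    """Euclidean distance between every pair of sample columns."""
    columns = list(zip(*matrix))  # transpose: one expression vector per sample
    return {
        (names[i], names[j]): dist(columns[i], columns[j])
        for i in range(len(names))
        for j in range(len(names))
    }


d = sample_distances(vst, samples)
print(round(d[("S1", "S2")], 3), round(d[("S1", "S3")], 3))
```

S1 and S2 have nearly identical profiles, so their distance is small and they would cluster together in the heatmap.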

Heatmap — Top 50 miRNAs by Variance

Heatmap of Top 50 Most Variable miRNAs — A vs D. VST-normalised expression of the top 50 miRNAs ranked by expression variance across all samples in this comparison. Rows (miRNAs) and columns (samples) are hierarchically clustered. No p-value filter applied. Source: A_vs_D_DE/output-heatmap-50-top-genes--only-check-expression-variance-no-pvalue-filter-applied.pdf
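The miRNA selection for this heatmap can be sketched as ranking rows by their variance across samples and keeping the top N, with no p-value filter. This sketch assumes sample variance (n − 1 denominator) computed on VST values; using population variance would give the same ranking. The IDs and values are toy data.

```python
from statistics import variance


def top_variable(matrix, ids, n=50):
    """Return the IDs of the n rows with the largest variance across samples."""
    ranked = sorted(zip(ids, matrix), key=lambda pair: variance(pair[1]), reverse=True)
    return [mir for mir, _ in ranked[:n]]


# Toy VST-like values (rows = miRNAs, columns = samples), illustrative only
ids = ["mir_a", "mir_b", "mir_c", "mir_d"]
vst = [
    [5.0, 5.1, 4.9, 5.0],
    [2.0, 9.0, 3.0, 8.0],
    [7.0, 7.5, 6.5, 7.2],
    [1.0, 1.0, 1.0, 1.0],
]
print(top_variable(vst, ids, n=2))
```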

GO & KEGG Enrichment — A vs D (All Significant miRNA Target Genes)

GO Biological Process Barplot — A vs D. Top enriched Biological Process GO terms for the target genes of all significant DE miRNAs. Bar length = gene count; colour = adjusted p-value. Source: A_vs_D_DE/GO-ALL-TargetGenes/output-BiologicalProcess-barplot.pdf

KEGG Pathway Barplot — A vs D. Top enriched KEGG pathways for target genes of all significant DE miRNAs. Source: A_vs_D_DE/GO-ALL-TargetGenes/output-kegg-barplot.pdf

Output Files per Comparison

File	Description
`output-AnalysisResult.csv`	Full DE table, all miRNAs, sorted by adjusted p-value
`output-AnalysisResult-sig.csv`	Significant DE miRNAs (p < 0.05, |log₂FC| > 0)
`output-AnalysisResult-sig-upregulated.csv`	Significant up-regulated miRNAs
`output-AnalysisResult-sig-downregulated.csv`	Significant down-regulated miRNAs
`output-normalized-count.csv`	DESeq2 size-factor normalised counts
`output-PCA.pdf / output-PCA-data.csv`	PCA on variance-stabilised data
`output-heatmap*.pdf`	Sample distance, gene-sample, and top-gene heatmaps
`output-enhancedVolcanoPlot.pdf / Volcano-Plot-*.html`	Static and interactive volcano plots
`output-Pearson-correlation-of-top-2000-genes.pdf`	Pearson correlation heatmap
`output-BetweenSampleDis.pdf`	Between-sample distance heatmap
`GO-ALL-TargetGenes/ GO-UP-TargetGenes/ GO-DOWN-TargetGenes/`	GO & KEGG enrichment results (all, up-regulated, down-regulated)

8. Methods

8.1 Library Preparation

miRNA-seq libraries were prepared using the Qiagen Small RNA Library Kit. Single-end sequencing was performed on an Illumina platform. Only trimmed R1 reads were used for downstream miRNA analysis.

8.2 Adapter Trimming & Read QC

Adapter trimming was performed using miRDeep2 (mapper.pl). Read quality was assessed using FastQC and aggregated with MultiQC.

8.3 Read Collapsing & Genome Mapping

Trimmed R1 reads were processed with mapper.pl (miRDeep2), which removes reads with non-canonical nucleotides, collapses identical reads, and aligns them to the mouse genome (GRCm39) using Bowtie. Key parameters:

Parameter	Value	Description
`-e`	—	Input is a FASTQ file
`-j`	—	Remove reads with non-canonical nucleotides
`-l`	18	Discard reads shorter than 18 nt
`-m`	—	Collapse reads before mapping
`-o`	4	Number of threads used by Bowtie
`-n`	—	Overwrite existing output files
`-p`	GRCm39 Bowtie index	Reference genome Bowtie index

8.4 miRNA Quantification (miRDeep2)

miRNA expression was quantified with miRDeep2.pl using mouse miRBase reference files (GRCm39 coordinates):

Reference	Description
Mature miRNAs	Mus musculus mature miRNA sequences (miRBase)
Close-species miRNAs	Mature sequences from related rodent species for homology-based detection
Hairpin precursors	Mus musculus hairpin precursor sequences (miRBase)
Genome FASTA	GRCm39 primary assembly

8.5 Count & CPM Matrix Generation

Raw read counts (column 2) and RPM-normalised values (column 6) were extracted from the miRDeep2 expression file for each sample and merged into cross-sample matrices using a custom R script via full outer join on miRNA ID, with missing values set to 0.
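A minimal Python sketch of this merge (the report's own script is in R and is not shown). It assumes tab-separated input with a single header line, reads the 1-based column given above (2 for raw counts, 6 for RPM), and keeps the first row when a miRNA ID repeats; how the original script handled repeated IDs is not stated. The six-column toy header and all values are illustrative, not copied from a real miRDeep2 file.

```python
import csv
import io


def read_column(handle, col: int) -> dict:
    """Read one value column from a tab-separated expression table.

    col is 1-based (2 = raw read count, 6 = RPM in this report's Methods).
    Assumptions: the first line is a header, and if a miRNA ID occurs more than
    once, only its first row is kept.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    values = {}
    for row in reader:
        if row and row[0] not in values:
            values[row[0]] = float(row[col - 1])
    return values


def outer_join(per_sample: dict) -> tuple:
    """Full outer join on miRNA ID; IDs missing from a sample are set to 0."""
    ids = sorted(set().union(*per_sample.values()))
    names = list(per_sample)
    rows = [[per_sample[s].get(mir, 0.0) for s in names] for mir in ids]
    return ids, names, rows


# Toy input: hypothetical IDs and numbers, not project data
HEADER = "#miRNA\tread_count\tprecursor\ttotal\tseq\tseq(norm)\n"
s1 = HEADER + "mir_a\t10\tp1\t10\t10\t100.0\nmir_b\t5\tp2\t5\t5\t50.0\n"
s2 = HEADER + "mir_b\t8\tp2\t8\t8\t80.0\nmir_c\t2\tp3\t2\t2\t20.0\n"

raw = {
    "toy_1": read_column(io.StringIO(s1), col=2),
    "toy_2": read_column(io.StringIO(s2), col=2),
}
ids, names, rows = outer_join(raw)
print(ids, rows)
```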

8.6 Differential Expression Analysis (DESeq2)

DE analysis was performed using DESeq2 (R/Bioconductor) on raw counts with design = ~condition. Size-factor normalisation and negative-binomial model fitting were applied with DESeq().

Significance thresholds:

raw p-value < 0.05
|log₂FoldChange| > 0
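Because |log₂FC| > 0 excludes only miRNAs with a fold change of exactly zero, the raw p-value is effectively the only filter. The sketch below applies both thresholds to one result row and labels it up, down or not significant, taking positive log₂FC to mean higher expression in the test group (the first group in "X vs Y"). Treating missing values as not significant is an assumption; DESeq2 can report NA p-values, for example for miRNAs with all-zero counts or flagged count outliers.

```python
import math


def classify(log2fc, pvalue, p_cutoff=0.05, lfc_cutoff=0.0):
    """Label one DE result row as 'up', 'down' or 'ns' (not significant).

    Thresholds mirror this report: raw p-value < 0.05 and |log2FC| > 0.
    Assumption: a missing (None or NaN) p-value or log2FC is treated as 'ns'.
    """
    if pvalue is None or log2fc is None or math.isnan(pvalue) or math.isnan(log2fc):
        return "ns"
    if pvalue < p_cutoff and abs(log2fc) > lfc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "ns"


# Toy rows (log2FC, raw p-value), illustrative only
for row in [(1.2, 0.01), (-0.3, 0.04), (2.0, 0.2), (0.0, 0.001), (0.5, float("nan"))]:
    print(row, classify(*row))
```

Counting the "up" and "down" labels per comparison gives the columns of the DE summary table.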

Variance-stabilising transformation (VST) was applied for the sample-level visualisations (PCA, heatmaps, sample distance and correlation plots). Volcano plots and p-value histograms are drawn from the DESeq2 results (log₂FC and p-values) rather than from VST values.

8.7 GO & KEGG Enrichment Analysis

Significant DE miRNAs were mapped to their predicted target genes. GO and KEGG enrichment analysis was performed on the target genes (full set, up-regulated, and down-regulated subsets) using clusterProfiler (R/Bioconductor). Results are in:

GO-ALL-TargetGenes/ — GO & KEGG enrichment on all significant DE miRNA target genes
GO-UP-TargetGenes/ — GO & KEGG enrichment on up-regulated miRNA target genes
GO-DOWN-TargetGenes/ — GO & KEGG enrichment on down-regulated miRNA target genes
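How the ALL, UP and DOWN target lists relate can be sketched as set unions over the significant miRNAs. The report does not name the target-prediction resource, so the miRNA-to-target mapping below is a hypothetical input with toy IDs; each resulting set would then be the gene list passed to clusterProfiler functions such as enrichGO() and enrichKEGG(). Under this assumed union rule, a gene targeted by both an up- and a down-regulated miRNA appears in both UP and DOWN.

```python
def target_sets(de_labels, targets):
    """Build ALL / UP / DOWN target-gene sets from significant DE miRNAs.

    de_labels: miRNA ID -> 'up', 'down' or 'ns'
    targets:   miRNA ID -> iterable of predicted target genes (hypothetical input)
    """
    sets = {"ALL": set(), "UP": set(), "DOWN": set()}
    for mir, label in de_labels.items():
        if label == "ns":
            continue  # non-significant miRNAs contribute no targets
        genes = set(targets.get(mir, ()))
        sets["ALL"] |= genes
        sets["UP" if label == "up" else "DOWN"] |= genes
    return sets


# Toy example: IDs and targets are illustrative only
labels = {"mir_a": "up", "mir_b": "down", "mir_c": "ns"}
targets = {"mir_a": ["Gene1", "Gene2"], "mir_b": ["Gene2", "Gene3"], "mir_c": ["Gene4"]}
sets = target_sets(labels, targets)
print({k: sorted(v) for k, v in sets.items()})
```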

8.8 Software Summary

Tool	Version	Purpose
miRDeep2 (mapper.pl)	—	Adapter trimming
FastQC	v0.12.1	Read quality control
MultiQC	v1.28	FastQC report aggregation
Bowtie	v1.3.1	Read alignment to GRCm39
mapper.pl (miRDeep2)	—	Read collapsing and genome mapping
miRDeep2.pl	—	miRNA quantification (known + novel)
R / DESeq2	v1.46.0	Differential expression analysis
R / clusterProfiler	v4.14.4	GO & KEGG enrichment analysis

9. Deliverable File Structure

Demo-Project-miRNA-analysis/
├── 00.TrimData/
│ └── 27 × *_trimmed.fq ← miRDeep2-trimmed R1 reads
├── 01.FastqQualityCheck/
│ ├── multiqc_report.html ← FastQC MultiQC report
│ ├── 27 × *_fastqc.html / .zip
│ └── multiqc_data/
├── 02.mapping/
│ ├── 27 × *_collapsed.fa ← collapsed unique reads (mapper.pl)
│ ├── 27 × *_collapsed_vs_genome.arf ← Bowtie alignment in ARF format
│ └── mapper_logs/ ← per-sample mapping logs
├── 02.mapping_logs/ ← per-sample mapping log files
├── 03.mirdeep2/
│ ├── 27 × expression_*.html ← miRDeep2 expression visualisations
│ ├── 27 × result_*.html ← miRDeep2 result summaries
│ ├── 27 × pdfs_*/ ← miRDeep2 hairpin structure PDFs per sample
│ └── 03.mirdeep2_logs/ ← per-sample miRDeep2 logs
├── 04.Counts/
│ ├── rawCount/
│ │ ├── 27 × *_count.txt
│ │ └── rawCount--matrix_with_all_samples.txt ← merged raw count matrix
│ └── normalizedCount/
│   ├── 27 × *_normalizedCount.txt
│   └── normalizedCount--matrix_with_all_samples.txt ← merged CPM matrix
├── 05.DE-GO-KEGG/
│ ├── sampleInfo.csv
│ ├── ReadMe--Demo-Project_DE_analysis_filter_condition.txt
│ ├── A_vs_D.txt / B_vs_A.txt / B_vs_E.txt / C_vs_A.txt / C_vs_F.txt / E_vs_D.txt / F_vs_D.txt
│ ├── A_vs_D_DE/ ← Group A vs Group D
│ │ ├── output-AnalysisResult.csv / -sig.csv / -sig-upregulated.csv / -sig-downregulated.csv
│ │ ├── output-normalized-count.csv, output-PCA*.pdf, output-VolcanoPlot*.pdf
│ │ ├── output-enhancedVolcanoPlot.pdf, output-heatmap*.pdf
│ │ ├── output-BetweenSampleDis.pdf, output-Pearson-correlation*.pdf
│ │ ├── Volcano-Plot--A_vs_D--show-gene-names.html
│ │ └── GO-ALL-TargetGenes/ GO-UP-TargetGenes/ GO-DOWN-TargetGenes/
│ ├── B_vs_A_DE/ ← same structure as above
│ ├── B_vs_E_DE/ ← same structure as above
│ ├── C_vs_A_DE/ ← same structure as above
│ ├── C_vs_F_DE/ ← same structure as above
│ ├── E_vs_D_DE/ ← same structure as above
│ └── F_vs_D_DE/ ← same structure as above
└── mapping.statistics ← per-sample mapping QC summary table