This R Markdown report provides a brief summary of the diagnostic plots for your RNA-seq experiment. The FASTQ files are available for download upon request.
Users receiving files from RSCshare are advised to delete all files from the share once they have securely copied them to their own drives.
A copy of your data will be securely archived on our end.
The MultiQC HTML report (separate file) summarises the alignment statistics along with the raw counts generated by STAR.
The raw FASTQ reads were first processed with the Trim Galore package (Babraham Institute) using the following command:
trim_galore --nextseq 20 --gzip --length 50 --paired --fastqc
The filtered reads were then aligned to the GRCh38 reference genome with Ensembl annotations.
* --paired applies only to paired-end data; --length 10 is used instead of --length 50 for smRNA-seq libraries.
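The two variations noted above (single-end data and smRNA-seq libraries) can be sketched as a small helper that assembles the Trim Galore command line. This is an illustrative sketch only: the function name `trim_galore_args` and the FASTQ file names are hypothetical, and only the flags quoted in this report are used.

```python
def trim_galore_args(fastq_files, paired=True, small_rna=False):
    """Build the trim_galore argument list described in this report.

    Hypothetical helper: only the flags quoted in the report are used.
    """
    # smRNA-seq libraries use a shorter minimum read length (10 instead of 50).
    min_length = "10" if small_rna else "50"
    args = ["trim_galore", "--nextseq", "20", "--gzip", "--length", min_length]
    # --paired is only valid for paired-end data.
    if paired:
        args.append("--paired")
    args.append("--fastqc")
    args.extend(fastq_files)
    return args


# Hypothetical file names, for illustration only.
paired_cmd = trim_galore_args(["sample_R1.fastq.gz", "sample_R2.fastq.gz"])
smrna_cmd = trim_galore_args(["sample.fastq.gz"], paired=False, small_rna=True)
print(" ".join(paired_cmd))
print(" ".join(smrna_cmd))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with file names.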
A good library should show little to no coverage bias across the entire gene body.
Post-normalization, the medians should be consistent across samples and more similar between biological replicates.
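To illustrate why medians line up after normalization, here is a minimal sketch. This report does not state which normalization method was applied, so the example assumes a median-of-ratios scheme purely for illustration; the sample names and counts are made up, with each sample being a deeper-sequenced copy of the first.

```python
import math
from statistics import median

# Hypothetical toy counts (one list of gene counts per sample).
# sampleB and sampleC are sampleA sequenced 2x and 3x as deeply.
counts = {
    "sampleA": [10, 20, 40, 80, 160],
    "sampleB": [20, 40, 80, 160, 320],
    "sampleC": [30, 60, 120, 240, 480],
}


def size_factors(counts):
    """Median-of-ratios size factors; genes with any zero count are skipped."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each usable gene across all samples.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - m for g, m in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


raw_medians = {s: median(v) for s, v in counts.items()}
norm_medians = {s: median(v) for s, v in normalize(counts).items()}
print("raw medians:       ", raw_medians)
print("normalized medians:", norm_medians)
```

In this toy case the raw medians differ only because of sequencing depth, so they become identical after normalization. With real data the medians should be close rather than identical.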
The Euclidean distance is computed between each pair of samples, and the dendrogram is built using the Ward criterion. We expect this dendrogram to group replicates together and to separate biological conditions.
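The clustering idea can be sketched in a few lines. The example below is not the code used to produce the report's dendrogram; it is a standard-library illustration that computes pairwise Euclidean distances and applies Ward's criterion through the Lance-Williams update on squared distances. The sample names and expression values are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical expression profiles (one value per gene) for four samples.
profiles = {
    "ctrl_1": [5.0, 1.0, 3.0],
    "ctrl_2": [5.2, 1.1, 2.9],
    "treat_1": [1.0, 6.0, 3.1],
    "treat_2": [0.9, 6.3, 3.0],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ward_merges(profiles):
    """Return cluster merges in the order Ward's criterion performs them."""
    sizes = {(name,): 1 for name in profiles}  # cluster -> number of samples
    # Squared Euclidean distance between every pair of starting clusters.
    d2 = {
        frozenset([(a,), (b,)]): euclidean(profiles[a], profiles[b]) ** 2
        for a, b in combinations(profiles, 2)
    }
    merges = []
    while len(sizes) > 1:
        pair = min(d2, key=d2.get)  # closest pair of clusters
        i, j = sorted(pair)
        d_ij = d2.pop(pair)
        n_i, n_j = sizes.pop(i), sizes.pop(j)
        merged = tuple(sorted(i + j))
        # Lance-Williams update for Ward's criterion (squared distances).
        for k, n_k in sizes.items():
            d_ki = d2.pop(frozenset([k, i]))
            d_kj = d2.pop(frozenset([k, j]))
            total = n_i + n_j + n_k
            d2[frozenset([k, merged])] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / total
        sizes[merged] = n_i + n_j
        merges.append((i, j))
    return merges


merges = ward_merges(profiles)
for left, right in merges:
    print(left, "+", right)
```

For this toy data the replicates merge first and the two conditions are joined only at the final step, which is the pattern expected from a well-behaved experiment.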
Another way of visualizing the experimental variability is to look at the first principal components of the PCA. In this figure, the first principal component (PC1) is expected to separate samples from the different biological conditions, meaning that the biological variability is the main source of variance in the data.
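As a rough illustration of what "PC1 separates the conditions" means, the sketch below computes the first principal component of a tiny, made-up expression matrix using power iteration on the covariance matrix. It relies only on the standard library and is not the method used to draw the report's PCA figure; sample names, values, and the function name `first_pc` are hypothetical.

```python
import math

samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
# Hypothetical matrix: rows are samples, columns are genes.
data = [
    [5.0, 1.0, 3.0],
    [5.2, 1.1, 2.9],
    [1.0, 6.0, 3.1],
    [0.9, 6.3, 3.0],
]


def first_pc(data, iterations=200):
    """Return (PC1 scores per sample, fraction of variance explained by PC1)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix of the genes.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration; assumes the start vector is not orthogonal to PC1.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total_variance = sum(cov[j][j] for j in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return scores, eigenvalue / total_variance


scores, explained = first_pc(data)
for name, score in zip(samples, scores):
    print(f"{name}: PC1 = {score:.2f}")
print(f"variance explained by PC1: {explained:.1%}")
```

The sign of a principal component is arbitrary, so only the relative positions matter: here the two control samples fall on one side of zero and the two treated samples on the other.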