
Normalization


Last updated 7 months ago

Raw read counts are generated after quantification for each feature on all samples. These read counts need to be normalized prior to differential expression detection to ensure that samples are comparable.

This chapter covers the implementation of each normalization method. The Normalize counts option is available on the context-sensitive menu (Figure 1) upon selection of any quantified output data node or an imported count matrix:

  • Gene counts

  • Transcript counts

  • MicroRNA counts

  • Cufflinks quantification

  • Quantification

The output has the same format as the input, and the output data node is called Normalized counts. This data node can be selected and normalized further using the same task.

Selecting Methods

Select whether you want your data normalized on a per-sample or per-feature basis (Figure 2). Some transformations, such as log transformation, are applied to each value independently of the others, so the result is identical regardless of your choice.

The following normalization methods will generate different results depending on whether the transformation was performed on samples or on features:

  • Divided by mean, median, Q1, Q3, std dev, sum

  • Subtract mean, median, Q1, Q3, std dev, sum

  • Quantile normalization

Note that each task can normalize on either samples or features, not both. If you wish to perform both transformations, run two normalization tasks successively.

To normalize the data, click a method in the left panel, then drag and drop it onto the right panel. Add all the normalization methods you wish to perform. Multiple methods can be added to the right panel, and they are processed in the order in which they are listed. You can change the order by dragging a method up or down. To remove a method from the Normalization order panel, click the minus button to the right of the method. Click Finish when you are done choosing normalization methods.
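To make the difference between the two choices concrete, here is a minimal sketch of Divide by mean applied on samples and then on features. It is plain Python written for illustration only, not Partek Flow's code; the toy matrix and the function names are hypothetical.

```python
from statistics import mean

# Toy matrix (hypothetical values): rows are samples, columns are features.
values = [
    [2.0, 4.0, 6.0],
    [10.0, 20.0, 30.0],
]


def divide_by_mean_on_samples(matrix):
    """Divide every value by the mean of its own sample (row)."""
    return [[x / mean(row) for x in row] for row in matrix]


def divide_by_mean_on_features(matrix):
    """Divide every value by the mean of its own feature (column)."""
    feature_means = [mean(column) for column in zip(*matrix)]
    return [[x / m for x, m in zip(row, feature_means)] for row in matrix]


by_sample = divide_by_mean_on_samples(values)
by_feature = divide_by_mean_on_features(values)
```

On samples, both rows become `[0.5, 1.0, 1.5]` because each row is scaled by its own mean. On features, the first row becomes three equal values of one third, because each column is scaled by the mean across the two samples. The same method therefore gives different results depending on the option selected.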

Recommended Methods

For some data nodes, recommended methods are available:

  • Data nodes resulting from Quantify to annotation model (Partek E/M) or Quantify to reference (Partek E/M) contain raw read counts; the recommendation is Total Count, Add 0.0001

  • Cufflinks quantification data nodes output FPKM-normalized read counts; the recommendation is Add 0.0001

If available, the Recommended button will appear. Clicking the button will populate the right panel (Figure 3).

Normalization Methods

Below is the notation that will be used to explain each method:

| Symbol | Meaning |
| --- | --- |
| $S$ | Sample (or cell, for a single cell data node) |
| $F$ | Feature |
| $X_{sf}$ | Value of sample S for feature F (if normalization is performed on a quantification data node, this is the raw read count) |
| $TX_{sf}$ | Transformed value of $X_{sf}$ |
| $C$ | Constant value |
| $b$ | Base of log |

  • Absolute value $TX_{sf} = |X_{sf}|$

  • Add $TX_{sf} = X_{sf} + C$ A constant value C needs to be specified

  • Antilog $TX_{sf} = b^{X_{sf}}$ A log base value b needs to be specified from the drop-down list; any positive number can be specified when Custom value is chosen

  • Arcsinh $TX_{sf} = \operatorname{arcsinh}(X_{sf})$ The hyperbolic arcsine (arcsinh) transformation is often used on flow cytometry data

  • CLR (centered log ratio) $TX_{sf} = \ln\left(\frac{X_{sf} + 1}{\operatorname{geom}(X_{sf} + 1)} + 1\right)$ where geom is the geometric mean over either the observation or the feature. This method can be applied to protein expression data.

  • CPM (counts per million) $TX_{sf} = (10^6 \times X_{sf}) / TMR_s$ where $X_{sf}$ is the raw read count of sample S on feature F, and $TMR_s$ is the total mapped reads of sample S. If quantification is performed on an aligned reads data node, total mapped reads is the number of aligned reads. If quantification is generated from an imported read count text file, total mapped reads is the sum of all feature reads in the sample.

  • Divided by When mean, median, Q1, Q3, std dev, or sum is selected, the corresponding statistic is calculated according to the transform on samples or features option.

    Example: If transform on Samples is selected, Divide by mean is calculated as $TX_{sf} = X_{sf} / M_s$, where $M_s$ is the mean of the sample.

    Example: If transform on Features is selected, Divide by mean is calculated as $TX_{sf} = X_{sf} / M_f$, where $M_f$ is the mean of the feature.

  • Log $TX_{sf} = \log_b X_{sf}$ A log base value b needs to be specified from the drop-down list; any positive number can be specified when Custom value is chosen

  • Logit $TX_{sf} = \log_b\left(X_{sf} / (1 - X_{sf})\right)$ A log base value b needs to be specified from the drop-down list; any positive number can be specified when Custom value is chosen

  • Lower bound A constant value C needs to be specified. If $X_{sf}$ is smaller than C, then $TX_{sf} = C$; otherwise, $TX_{sf} = X_{sf}$

  • **Median ratio (DESeq2 only), Median ratio (edgeR)** These approaches are slightly different implementations of the method proposed by Anders and Huber (2010). The idea is as follows: for each feature, its expression is divided by the feature's geometric mean expression across the samples. Then, for a given sample, the median of these ratios across the features gives a sample-specific size factor. The normalized expression is equal to the raw expression divided by the size factor.

    Median ratio (DESeq2 only) is present in the R package DESeq2 under the name "ratio". Select this method if DESeq2 differential analysis will be used downstream. Because it is not on a per million scale, it is not recommended for any differential analysis method other than DESeq2.

    Median ratio (edgeR) is present in the R package edgeR under the name “RLE”. It is very similar to the Median ratio (DESeq2 only) method, but it uses a per million scale.

  • Multiply by $TX_{sf} = X_{sf} \times C$ A constant value C needs to be specified

  • Quantile normalization A rank-based normalization method. For instance, if the transformation is performed on samples, it first ranks all the features in each sample. Let vector Vs be the feature values of sample S sorted in ascending order. The method calculates a vector Vm that is the average of the sorted vectors across all samples, then replaces each value in Vs with the value in Vm at the same rank. Detailed information can be found in [1].

  • Rank This transformation replaces each value with its rank in the list of sorted values. The smallest value is replaced by 1 and the largest value is replaced by the total number of non-missing values, N. If there are no tied values, this results in a perfectly uniform distribution. In the case of ties, all tied values receive the mean rank.

  • Rlog Regularized log transformation is the method implemented in the DESeq2 package under the name rlog. It applies a transformation that removes the dependence of the variance on the mean. It should not be applied to zero-inflated data such as single cell RNA-seq raw count data. The output of this task should not be used for differential expression analysis, but rather for data exploration, such as clustering.

  • Round Round the value to the nearest integer.

  • RPKM (Reads per kilobase of transcript per million mapped reads [2]) $TX_{sf} = (10^9 \times X_{sf}) / (TMR_s \times L_f)$ where $X_{sf}$ is the raw read count of sample S on feature F, $TMR_s$ is the total mapped reads of sample S, and $L_f$ is the length of feature F.

    If quantification is performed on an aligned reads data node, total mapped reads is the number of aligned reads. If quantification is generated from an imported read count text file, total mapped reads is the sum of all feature reads in the sample. If the feature is a transcript, transcript length $L_f$ is the sum of the lengths of all the exons. If the feature is a gene, gene length is the distance between the start position of the most downstream exon and the stop position of the most upstream exon. See Bullard et al. for additional comparisons with other normalization packages [3].

    For paired reads, the normalization option will show up as FPKM (Fragments per kilobase per million mapped reads) rather than RPKM. However, the calculations are the same.

  • Subtract When mean, median, Q1, Q3, std dev, or sum is selected, the corresponding statistic is calculated according to the transform on samples or features option.

    Example: If transform on Samples is selected, Subtract mean is calculated as $TX_{sf} = X_{sf} - M_s$, where $M_s$ is the mean of the sample.

    Example: If transform on Features is selected, Subtract mean is calculated as $TX_{sf} = X_{sf} - M_f$, where $M_f$ is the mean of the feature.

  • TMM (Trimmed mean of M-values) The scaling factors are produced according to the algorithm described in Robinson and Oshlack [4]. The paper by Dillies et al. [5] contains evidence that TMM has an edge over other normalization methods. The reference sample is randomly selected. When trimming is performed, the upper 30% and lower 30% of M values (fold change) are removed, and the upper 5% and lower 5% of A values (absolute expression) are removed.

  • TPM (Transcripts per million, as described in Wagner et al. [6]) The following steps are performed:

    1. Normalize the reads by the feature length. Here length is measured in kilobases, but the final TPM values do not depend on the length unit. $RPK_{sf} = X_{sf} / L_f$

    2. Obtain a scaling factor for sample s as $K_s = 10^{-6} \sum_{f=1}^{F} RPK_{sf}$

    3. Divide raw reads by the length and the scaling factor to get TPM: $TX_{sf} = X_{sf} / L_f / K_s$

  • Upper quartile The method is the same as the one in the LIMMA package [7]. The calculation can be summarized as follows:

    1. Remove all the features that have 0 reads in all samples.

    2. Calculate the effective library size per sample: $\text{effective library size} = \text{raw library size (in millions)} \times \frac{\text{upper quartile of the sample}}{\text{geometric mean of the upper quartiles of all samples}}$

    3. Get the normalized counts by dividing the raw counts per feature by the effective library size (for the respective sample)
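As a rough illustration of the CPM, RPKM, and TPM formulas above, the following sketch applies them to a toy count matrix in plain Python. It is not Partek Flow's implementation: the matrix, the feature lengths, and the function names are made up for the example, and total mapped reads is taken to be the sum of feature reads in each sample (the imported count matrix case).

```python
# Toy data (hypothetical): rows are samples, columns are features.
counts = [
    [10, 20, 70],
    [30, 60, 210],
]
lengths_bp = [1000, 2000, 7000]  # assumed feature lengths, in bases


def cpm(matrix):
    """Counts per million: 1e6 * X_sf / TMR_s."""
    return [[1e6 * x / sum(row) for x in row] for row in matrix]


def rpkm(matrix, lengths):
    """Reads per kilobase per million: 1e9 * X_sf / (TMR_s * L_f)."""
    return [
        [1e9 * x / (sum(row) * length) for x, length in zip(row, lengths)]
        for row in matrix
    ]


def tpm(matrix, lengths):
    """Transcripts per million, following the three TPM steps."""
    result = []
    for row in matrix:
        rpk = [x / length for x, length in zip(row, lengths)]  # step 1
        k = 1e-6 * sum(rpk)                                    # step 2
        result.append([value / k for value in rpk])            # step 3
    return result


cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this toy matrix the second sample has exactly three times the depth of the first, so each method returns identical rows for both samples, and each TPM row sums to one million regardless of the length unit.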
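The Median ratio and Quantile normalization descriptions above can also be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the code of Partek Flow, DESeq2, or edgeR: the function names and toy matrices are hypothetical, features containing a zero count are skipped when computing size factors (as DESeq2 does), the per million rescaling of the edgeR variant is omitted, and ties in quantile normalization are broken by position rather than averaged.

```python
import math
from statistics import mean, median


def median_ratio_size_factors(matrix):
    """Return one size factor per sample (row) from the median of ratios."""
    # Geometric mean of each feature (column) across samples. Features
    # with any zero count are skipped (an assumption, see the lead-in).
    geo_means = {}
    for f, column in enumerate(zip(*matrix)):
        if min(column) > 0:
            geo_means[f] = math.exp(mean(math.log(x) for x in column))
    return [
        median([row[f] / g for f, g in geo_means.items()])
        for row in matrix
    ]


def quantile_normalize_on_samples(matrix):
    """Replace each value with the cross-sample mean at the same rank."""
    sorted_rows = [sorted(row) for row in matrix]
    rank_means = [mean(column) for column in zip(*sorted_rows)]  # vector Vm
    result = []
    for row in matrix:
        # Feature indices ordered by ascending value within this sample.
        order = sorted(range(len(row)), key=lambda f: row[f])
        normalized = [0.0] * len(row)
        for rank, f in enumerate(order):
            normalized[f] = rank_means[rank]
        result.append(normalized)
    return result


counts = [[10, 20, 70], [30, 60, 210]]  # rows are samples
size_factors = median_ratio_size_factors(counts)
normalized = [[x / sf for x in row] for row, sf in zip(counts, size_factors)]

quantile_normalized = quantile_normalize_on_samples([[5, 2, 3], [4, 1, 9]])
```

Because the second sample is exactly three times the first, the two size factors differ by a factor of three and both samples normalize to the same values. In the quantile example the averaged sorted vector is `[1.5, 3.5, 7]`, so each sample ends up with those three values placed according to the original ranks of its features.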

Normalization Report

The Normalization report includes the Normalization methods used, a Feature distribution table, Box-whisker plots of the Expression signal before and after normalization, and Sample histogram charts before and after normalization. Note that all visualizations are disabled for results with more than 30 samples.

Normalization methods

A summary of the normalization methods performed, listed in the order in which they were performed.

Feature distribution table

A table that presents descriptive statistics for each sample. The last row contains the grand statistics across all samples (Figure 4).

Expression signal

These box-whisker plots show the expression signal distribution for each sample before and after normalization. When you mouse over a box in the plot, a balloon shows detailed percentile information (Figure 5).

Sample histogram

A histogram is displayed for the data before and after normalization. Each line is a sample, where the X-axis is the range of the data in the node and the Y-axis is the frequency of values within the range. When you mouse over a circle, which represents the center of an interval, detailed information appears in a balloon (Figure 6). It includes:

  • The sample name.

  • The range of the interval, where “[” denotes an inclusive bound and “)” denotes an exclusive bound.

  • The frequency value within the interval.

References

  1. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics. 2003; 19(2): 185-193.

  2. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008; 5(7): 621–628.

  3. Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010; 11: 94.

  4. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010; 11: R25.

  5. Dillies MA, Rau A, Aubert J et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013; 14(6): 671-83.

  6. Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abundance using RNA-seq data. Theory Biosci. 2012; 131(4): 281-5.

  7. Ritchie ME, Phipson B, Wu D et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43(15):e97.

Additional Assistance

Poscounts (DESeq2 only) A DESeq2 size factor estimation option. Compared with Median ratio, the poscounts method can be used when every gene has a zero count in at least one sample. It calculates a modified geometric mean by taking the nth root of the product of the non-zero counts. It is not on a per million scale.

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Figure 1. When a data node containing quantified data is selected, Normalization becomes available on the context sensitive menu
Figure 2. Transformation can be done on samples or on features
Figure 3. Normalization using Partek’s recommended method
Figure 4. Feature distribution statistic information on each sample and across all the samples
Figure 5. Box-whisker plot displays expression signal distribution for each sample
Figure 6. Sample histogram. Mousing over shows detailed information about the interval. This includes sample name, range and frequency of the selected sample.