Partek
  • Overview
  • Partek Flow
    • Frequently Asked Questions
      • General
      • Visualization
      • Statistics
      • Biological Interpretation
      • How to cite Partek software
    • Quick Start Guide
    • Installation Guide
      • Minimum System Requirements
      • Single Cell Toolkit System Requirements
      • Single Node Installation
      • Single Node Amazon Web Services Deployment
      • Multi-Node Cluster Installation
      • Creating Restricted User Folders within the Partek Flow server
      • Updating Partek Flow
      • Uninstalling Partek Flow
      • Dependencies
      • Docker and Docker-compose
      • Java KeyStore and Certificates
      • Kubernetes
    • Live Training Event Recordings
      • Bulk RNA-Seq Analysis Training
      • Basic scRNA-Seq Analysis & Visualization Training
      • Advanced scRNA-Seq Data Analysis Training
      • Bulk RNA-Seq and ATAC-Seq Integration Training
      • Spatial Transcriptomics Data Analysis Training
      • scRNA and scATAC Data Integration Training
    • Tutorials
      • Creating and Analyzing a Project
        • Creating a New Project
        • The Metadata Tab
        • The Analyses Tab
        • The Log Tab
        • The Project Settings Tab
        • The Attachments Tab
        • Project Management
        • Importing a GEO / ENA project
      • Bulk RNA-Seq
        • Importing the tutorial data set
        • Adding sample attributes
        • Running pre-alignment QA/QC
        • Trimming bases and filtering reads
        • Aligning to a reference genome
        • Running post-alignment QA/QC
        • Quantifying to an annotation model
        • Filtering features
        • Normalizing counts
        • Exploring the data set with PCA
        • Performing differential expression analysis with DESeq2
        • Viewing DESeq2 results and creating a gene list
        • Viewing a dot plot for a gene
        • Visualizing gene expression in Chromosome view
        • Generating a hierarchical clustering heatmap
        • Performing biological interpretation
        • Saving and running a pipeline
      • Analyzing Single Cell RNA-Seq Data
      • Analyzing CITE-Seq Data
        • Importing Feature Barcoding Data
        • Data Processing
        • Dimensionality Reduction and Clustering
        • Classifying Cells
        • Differentially Expressed Proteins and Genes
      • 10x Genomics Visium Spatial Data Analysis
        • Start with pre-processed Space Ranger output files
        • Start with 10x Genomics Visium fastq files
        • Spatial data analysis steps
        • View tissue images
      • 10x Genomics Xenium Data Analysis
        • Import 10x Genomics Xenium Analyzer output
        • Process Xenium data
        • Perform Exploratory analysis
        • Make comparisons using Compute biomarkers and Biological interpretation
      • Single Cell RNA-Seq Analysis (Multiple Samples)
        • Getting started with the tutorial data set
        • Classify cells from multiple samples using t-SNE
        • Compare expression between cell types with multiple samples
      • Analyzing Single Cell ATAC-Seq data
      • Analyzing Illumina Infinium Methylation array data
      • NanoString CosMx Tutorial
        • Importing CosMx data
        • QA/QC, data processing, and dimension reduction
        • Cell typing
        • Classify subpopulations & differential expression analysis
    • User Manual
      • Interface
      • Importing Data
        • SFTP File Transfer Instructions
        • Import single cell data
        • Importing 10x Genomics Matrix Files
        • Importing and Demultiplexing Illumina BCL Files
        • Partek Flow Uploader for Ion Torrent
        • Importing 10x Genomics .bcl Files
        • Import a GEO / ENA project
      • Task Menu
        • Task actions
        • Data summary report
        • QA/QC
          • Pre-alignment QA/QC
          • ERCC Assessment
          • Post-alignment QA/QC
          • Coverage Report
          • Validate Variants
          • Feature distribution
          • Single-cell QA/QC
          • Cell barcode QA/QC
        • Pre-alignment tools
          • Trim bases
          • Trim adapters
          • Filter reads
          • Trim tags
        • Post-alignment tools
          • Filter alignments
          • Convert alignments to unaligned reads
          • Combine alignments
          • Deduplicate UMIs
          • Downscale alignments
        • Annotation/Metadata
          • Annotate cells
          • Annotation report
          • Publish cell attributes to project
          • Attribute report
          • Annotate Visium image
        • Pre-analysis tools
          • Generate group cell counts
          • Pool cells
          • Split matrix
          • Hashtag demultiplexing
          • Merge matrices
          • Descriptive statistics
          • Spot clean
        • Aligners
        • Quantification
          • Quantify to annotation model (Partek E/M)
          • Quantify to transcriptome (Cufflinks)
          • Quantify to reference (Partek E/M)
          • Quantify regions
          • HTSeq
          • Count feature barcodes
          • Salmon
        • Filtering
          • Filter features
          • Filter groups (samples or cells)
          • Filter barcodes
          • Split by attribute
          • Downsample Cells
        • Normalization and scaling
          • Impute low expression
          • Impute missing values
          • Normalization
          • Normalize to baseline
          • Normalize to housekeeping genes
          • Scran deconvolution
          • SCTransform
          • TF-IDF normalization
        • Batch removal
          • General linear model
          • Harmony
          • Seurat3 integration
        • Differential Analysis
          • GSA
          • ANOVA/LIMMA-trend/LIMMA-voom
          • Kruskal-Wallis
          • Detect alt-splicing (ANOVA)
          • DESeq2(R) vs DESeq2
          • Hurdle model
          • Compute biomarkers
          • Transcript Expression Analysis - Cuffdiff
          • Troubleshooting
        • Survival Analysis with Cox regression and Kaplan-Meier analysis - Partek Flow
        • Exploratory Analysis
          • Graph-based Clustering
          • K-means Clustering
          • Compare Clusters
          • PCA
          • t-SNE
          • UMAP
          • Hierarchical Clustering
          • AUCell
          • Find multimodal neighbors
          • SVD
          • CellPhoneDB
        • Trajectory Analysis
          • Trajectory Analysis (Monocle 2)
          • Trajectory Analysis (Monocle 3)
        • Variant Callers
          • SAMtools
          • FreeBayes
          • LoFreq
        • Variant Analysis
          • Fusion Gene Detection
          • Annotate Variants
          • Annotate Variants (SnpEff)
          • Annotate Variants (VEP)
          • Filter Variants
          • Summarize Cohort Mutations
          • Combine Variants
        • Copy Number Analysis (CNVkit)
        • Peak Callers (MACS2)
        • Peak analysis
          • Annotate Peaks
          • Filter peaks
          • Promoter sum matrix
        • Motif Detection
        • Metagenomics
          • Kraken
          • Alpha & beta diversity
          • Choose taxonomic level
        • 10x Genomics
          • Cell Ranger - Gene Expression
          • Cell Ranger - ATAC
          • Space Ranger
          • STARsolo
        • V(D)J Analysis
        • Biological Interpretation
          • Gene Set Enrichment
          • GSEA
        • Correlation
          • Correlation analysis
          • Sample Correlation
          • Similarity matrix
        • Export
        • Classification
        • Feature linkage analysis
      • Data Viewer
      • Visualizations
        • Chromosome View
          • Launching the Chromosome View
          • Navigating Through the View
          • Selecting Data Tracks for Visualization
          • Visualizing the Results Using Data Tracks
          • Annotating the Results
          • Customizing the View
        • Dot Plot
        • Volcano Plot
        • List Generator (Venn Diagram)
        • Sankey Plot
        • Transcription Start Site (TSS) Plot
        • Sources of variation plot
        • Interaction Plots
        • Correlation Plot
        • Pie Chart
        • Histograms
        • Heatmaps
        • PCA, UMAP and tSNE scatter plots
        • Stacked Violin Plot
      • Pipelines
        • Making a Pipeline
        • Running a Pipeline
        • Downloading and Sharing a Pipeline
        • Previewing a Pipeline
        • Deleting a Pipeline
        • Importing a Pipeline
      • Large File Viewer
      • Settings
        • Personal
          • My Profile
          • My Preferences
          • Forgot Password
        • System
          • System Information
          • System Preferences
          • LDAP Configuration
        • Components
          • Filter Management
          • Library File Management
            • Library File Management Settings
            • Library File Management Page
            • Selecting an Assembly
            • Library Files
            • Update Library Index
            • Creating an Assembly on the Library File Management Page
            • Adding Library Files on the Library File Management Page
            • Adding a Reference Sequence
            • Adding a Cytoband
            • Adding Reference Aligner Indexes
            • Adding a Gene Set
            • Adding a Variant Annotation Database
            • Adding a SnpEff Variant Database
            • Adding a Variant Effect Predictor (VEP) Database
            • Adding an Annotation Model
            • Adding Aligner Indexes Based on an Annotation Model
            • Adding Library Files from Within a Project
            • Microarray Library Files
            • Adding Prep kit
            • Removing Library Files
          • Option Set Management
          • Task Management
          • Pipeline managment
          • Lists
        • Access
          • User Management
          • Group Management
          • Licensing
          • Directory Permissions
          • Access Control Log
          • Failed Logins
          • Orphaned files
        • Usage
          • System Queue
          • System Resources
          • Usage Report
      • Server Management
        • Backing Up the Database
        • System Administrator Guide (Linux)
        • Diagnosing Issues
        • Moving Data
        • Partek Flow Worker Allocator
      • Enterprise Features and Toolkits
        • REST API
          • REST API Command List
      • Microarray Toolkit
        • Importing Custom Microarrays
      • Glossary
    • Webinars
    • Blog Posts
      • How to select the best single cell quality control thresholds
      • Cellular Differentiation Using Trajectory Analysis & Single Cell RNA-Seq Data
      • Spatial transcriptomics—what’s the big deal and why you should do it
      • Detecting differential gene expression in single cell RNA-Seq analysis
      • Batch remover for single cell data
      • How to perform single cell RNA sequencing: exploratory analysis
      • Single Cell Multiomics Analysis: Strategies for Integration
      • Pathway Analysis: ANOVA vs. Enrichment Analysis
      • Studying Immunotherapy with Multiomics: Simultaneous Measurement of Gene and Protein
      • How to Integrate ChIP-Seq and RNA-Seq Data
      • Enjoy Responsibly!
      • To Boldly Go…
      • Get to Know Your Cell
      • Aliens Among Us: How I Analyzed Non-Model Organism Data in Partek Flow
    • White Papers
      • Understanding Reads in RNA-Seq Analysis
      • RNA-Seq Quantification
      • Gene-specific Analysis
      • Gene Set ANOVA
      • Partek Flow Security
      • Single Cell Scaling
      • UMI Deduplication in Partek Flow
      • Mapping error statistics
    • Release Notes
      • Release Notes Archive - Partek Flow 10
  • Partek Genomics Suite
    • Installation Guide
      • Minimum System Requirements
      • Computer Host ID Retrieval
      • Node Locked Installation
        • Windows Installation
        • Macintosh Installation
      • Floating/Locked Floating Installation
        • Linux Installation
          • FlexNet Installation on Linux
        • Installing FlexNet on Windows
        • License Server FAQ's
        • Client Computer Connection to License Server
      • Uninstalling Partek Genomics Suite
      • Updating to Version 7.0
      • License Types
      • Installation FAQs
    • User Manual
      • Lists
        • Importing a text file list
        • Adding annotations to a gene list
        • Tasks available for a gene list
        • Starting with a list of genomic regions
        • Starting with a list of SNPs
        • Importing a BED file
        • Additional options for lists
      • Annotation
      • Hierarchical Clustering Analysis
      • Gene Ontology ANOVA
        • Implementation Details
        • Configuring the GO ANOVA Dialog
        • Performing GO ANOVA
        • GO ANOVA Output
        • GO ANOVA Visualisations
        • Recommended Filters
      • Visualizations
        • Dot Plot
        • Profile Plot
        • XY Plot / Bar Chart
        • Volcano Plot
        • Scatter Plot and MA Plot
        • Sort Rows by Prototype
        • Manhattan Plot
        • Violin Plot
      • Visualizing NGS Data
      • Chromosome View
      • Methylation Workflows
      • Trio/Duo Analysis
      • Association Analysis
      • LOH detection with an allele ratio spreadsheet
      • Import data from Agilent feature extraction software
      • Illumina GenomeStudio Plugin
        • Import gene expression data
        • Import Genotype Data
        • Export CNV data to Illumina GenomeStudio using Partek report plug-in
        • Import data from Illumina GenomeStudio using Partek plug-in
        • Export methylation data to Illumina GenomeStudio using Partek report plug-in
    • Tutorials
      • Gene Expression Analysis
        • Importing Affymetrix CEL files
        • Adding sample information
        • Exploring gene expression data
        • Identifying differentially expressed genes using ANOVA
        • Creating gene lists from ANOVA results
        • Performing hierarchical clustering
        • Adding gene annotations
      • Gene Expression Analysis with Batch Effects
        • Importing the data set
        • Adding an annotation link
        • Exploring the data set with PCA
        • Detect differentially expressed genes with ANOVA
        • Removing batch effects
        • Creating a gene list using the Venn Diagram
        • Hierarchical clustering using a gene list
        • GO enrichment using a gene list
      • Differential Methylation Analysis
        • Import and normalize methylation data
        • Annotate samples
        • Perform data quality analysis and quality control
        • Detect differentially methylated loci
        • Create a marker list
        • Filter loci with the interactive filter
        • Obtain methylation signatures
        • Visualize methylation at each locus
        • Perform gene set and pathway analysis
        • Detect differentially methylated CpG islands
        • Optional: Add UCSC CpG island annotations
        • Optional: Use MethylationEPIC for CNV analysis
        • Optional: Import a Partek Project from Genome Studio
      • Partek Pathway
        • Performing pathway enrichment
        • Analyzing pathway enrichment in Partek Genomics Suite
        • Analyzing pathway enrichment in Partek Pathway
      • Gene Ontology Enrichment
        • Open a zipped project
        • Perform GO enrichment analysis
      • RNA-Seq Analysis
        • Importing aligned reads
        • Adding sample attributes
        • RNA-Seq mRNA quantification
        • Detecting differential expression in RNA-Seq data
        • Creating a gene list with advanced options
        • Visualizing mapped reads with Chromosome View
        • Visualizing differential isoform expression
        • Gene Ontology (GO) Enrichment
        • Analyzing the unexplained regions spreadsheet
      • ChIP-Seq Analysis
        • Importing ChIP-Seq data
        • Quality control for ChIP-Seq samples
        • Detecting peaks and enriched regions in ChIP-Seq data
        • Creating a list of enriched regions
        • Identifying novel and known motifs
        • Finding nearest genomic features
        • Visualizing reads and enriched regions
      • Survival Analysis
        • Kaplan-Meier Survival Analysis
        • Cox Regression Analysis
      • Model Selection Tool
      • Copy Number Analysis
        • Importing Copy Number Data
        • Exploring the data with PCA
        • Creating Copy Number from Allele Intensities
        • Detecting regions with copy number variation
        • Creating a list of regions
        • Finding genes with copy number variation
        • Optional: Additional options for annotating regions
        • Optional: GC wave correction for Affymetrix CEL files
        • Optional: Integrating copy number with LOH and AsCN
      • Loss of Heterozygosity
      • Allele Specific Copy Number
      • Gene Expression - Aging Study
      • miRNA Expression and Integration with Gene Expression
        • Analyze differentially expressed miRNAs
        • Integrate miRNA and Gene Expression data
      • Promoter Tiling Array
      • Human Exon Array
        • Importing Human Exon Array
        • Gene-level Analysis of Exon Array
        • Alt-Splicing Analysis of Exon Array
      • NCBI GEO Importer
    • Webinars
    • White Papers
      • Allele Intensity Import
      • Allele-Specific Copy Number
      • Calculating Genotype Likelihoods
      • ChIP-Seq Peak Detection
      • Detect Regions of Significance
      • Genomic Segmentation
      • Loss of Heterozygosity Analysis
      • Motif Discovery Methods
      • Partek Genomics Suite Security
      • Reads in RNA-Seq
      • RNA-Seq Methods
      • Unpaired Copy Number Estimation
    • Release Notes
    • Version Updates
    • TeamViewer Instructions
  • Getting Help
    • TeamViewer Instructions
Powered by GitBook
On this page
  • Filtering cells
  • Filter features
  • Normalization
  • PCA
  • Graph-based clustering
  • t-SNE
  • Coloring the t-SNE scatter plot
  • Selecting cells on the t-SNE scatter plot
  • Filtering cells on the t-SNE scatter plot
  • Classifying cells
  • Comparing gene expression between cell types
  • Generating a heatmap
  • Performing enrichment analysis
  • Pipeline
  • Additional Assistance
Export as PDF
  1. Partek Flow
  2. Tutorials

Analyzing Single Cell RNA-Seq Data

PreviousSaving and running a pipelineNextAnalyzing CITE-Seq Data

Last updated 3 months ago

This tutorial presents an outline of the basic series of steps for analyzing a single cell RNA-Seq experiment in Partek Flow starting with the count matrix file.

This tutorial includes only one sample, but the same steps will be followed when analyzing multiple samples. For notes on a few aspects specific to a multi-sample analysis, please see our tutorial.

If you are new to Partek Flow, please see for information about data transfer and import and for information about the Partek Flow user interface.

Filtering cells

An important step in analyzing single cell RNA-Seq data is to filter out low quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few reads to be analyzed. You can do this in Partek Flow using the Single cell QA/QC task.

  • Click on the Single cell data node

  • Click on the QA/QC section of the task menu

  • Click on Single cell QA/QC

A task node, Single cell QA/QC, is produced. Initially, the node will be semi-transparent to indicate that it has been queued, but not completed. A progress bar will appear on the Single cell QA/QC task node to indicate that the task is running (Figure 1).

  • Click the Single cell QA/QC node once it finishes running

  • Double-click the Task report in the task menu

The Single cell QA/QC report includes interactive violin plots showing the value of every cell in the project on several quality measures (Figure 2).

There can be four plots: number of read counts per cell, number of detected genes per cell, the percentage of mitochondrial reads per cell, and the percentage of ribosomal counts.

The plots will be shaded to reflect the selection. Cells that are excluded will be shown as dim dots on all plots.

The read counts per cell and number of detected genes per cell are typically used to filter out potential doublets - if a cell as an unusually high number of total counts or detected genes, it may be a doublet. The mitochondrial reads percentage can be used to identify cells damaged during cell isolation - if a cell has a high percentage of mitochondrial counts, it is likely damaged or dying and may need to be excluded.

Filter features

A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes (features). Because there is no gold standard for what makes a gene informative or not and ideal gene filtering criteria depends on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options. The Filter features step can also be performed before normalization or after normalization.

  • Click the data node containing count matrix

  • Click Filtering in the task menu

  • Click Filter features

There are four categories of filter available - noise reduction, statistics based, feature metadata, and feature list.

The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The feature list filter allows you to filter your data set to include or exclude particular genes.

For example, you can use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file.

  • Click the Noise reduction filter check box

  • Set the Noise reduction filter to Exclude features where value <= 0 in at least 99.9% of cells using the drop-down menus and text boxes

  • Click Finish to apply the filter (Figure 3)

This results node, Filtered counts, will be the starting point for the next stage of analysis.

Normalization

Because different cells will have a different number of total counts, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization - 1. CPM (counts per million), 2. Add 1, 3. Log2. This accounts for differences in total UMI counts per cell and log transforms the data, which makes the data easier to visualize.

  • Click the Filtered cells results node produced by the Filtered counts task

  • Click Normalization and scaling in the context-sensitive task menu on the right

  • Click Normalization

This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in descending order.

  • Click Finish to apply the normalization (Figure 4 )

A new Normalized counts data node will be produced. You can choose to change the color of this node by right-clicking on the task node then clicking Change color and/or rename the result node by right-clicking and selecting Rename data node.

In the example below, I have changed the color to dark blue and renamed the results node based on the scheme.

For more information on normalizing data in Partek Flow, please see the Normalize Counts section of the user manual.

PCA

Principal components (PC) analysis (PCA) is an exploratory technique that is used to describe the structure of high dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.

  • Click the Filtered counts node

  • Click Exploratory analysis in the task menu

  • Click PCA from the drop-down list

You can choose Features contribute equally to standardize the genes prior to PCA or allow more variable genes to have a larger effect on the PCA by choosing by variance. By default, we take variance into account and focus on the most variable genes.

If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together by selecting or not selecting the Split by sample option (Figure 5).

  • Click Finish to run

A new PCA task node will be produced.

  • Double-click the PCA task node to open the 3D PCA scatter plot in data viewer (Figure 6)

Beside PCA coordinates of the cells, PCA task report also includes, the Scree plot, the component loadings table, and the PC projections table.

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured in Eigenvalue. The higher the Eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by analyzing additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering, UMAP and t-SNE.

Note that Partek Flow suggests appropriate data for each plot type that is chosen so only PCA results will be available to select from for the Scree plot.

  • Mouse over the Scree plot to identify the point where additional PCs offer little additional information

In this data set, a reasonable cut-off could be set anywhere between 7 and 20 PCs.

Viewing the genes correlated with each PC can be useful when choosing how many PCs to include.

To display PCA projects table, click on the Table drop-down list in the Content icon under Configure and choose PCA projections (Figure 9)

PCA projections table contains each row as an observation (a cell in this case), each column represents one principal component (Figure 10). This table can be downloaded as text file, the same way as the component loading table.

Graph-based clustering

Graph-based clustering identifies groups of similar cells using PC values as the input. By including only the most informative PCs, noise in the data set is excluded, improving the results of clustering.

  • Click the PCA data node

  • Click Exploratory analysis in the task menu

  • Click Graph-based clustering

Clustering can be performed on each sample individually or on all samples together. Here, we are working with a single sample.

  • Check Compute biomarkers to compute features that are highly expressed when comparing each cluster (Figure 11)

  • Click Configure to access the Advanced options and change the Number of nearest neighbors to 50 and Nearest Neighbor Type to K-NN for this example tutorial.

The Number of principal components should be set based on the your examination of the Scree plot and component loadings table. The default value of 100 is likely exhaustive for most data sets, but may introduce noise that reduces the number of clusters that can be distinguished.

  • Click Finish to run the task

A new Graph-based clusters data and Biomarkers data node will be generated along with the task nodes.

  • Double-click the Graph-based clusters node to see the cluster results and statistics (left screenshot on Figure 12)

  • Double-click the Biomarkers node to see the computed biomarkers if you have selected this option (right screenshot on Figure 12)

The Graph-based clustering result lists the Total number of clusters and what proportion of cells fall into each cluster as well as Maximum modularity which is a measurement of the quality of the clustering result where optimal modularity is 1. The Biomarkers node includes the top features for each graph-based cluster. It displays the top-10 genes that distinguish each cluster from the others. Download at the bottom right of the table can be used to view and save more features. These are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are 1.5 fold upregulated, and sorting by ascending p-value. This ensures that the top-10 genes of each cluster are highly and disproportionately expressed in that cluster.

We will use t-SNE to visualize the results of Graph-based clustering.

t-SNE

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensional reduction technique that prioritizes local relationships to build a low-dimensional representation of the high-dimensional data that places objects that are similar in high-dimensional space close together in the low-dimensional representation. This makes t-SNE well suited for analyzing high-dimensional data when the goal is to identify groups of similar objects, such as cell types in single cell RNA-Seq data.

  • Click the Graph-based clusters node

  • Click Exploratory analysis in the task menu

  • Click t-SNE

If you have multiple samples, you can choose to run t-SNE for each sample individually or for all samples together using the Split cells by sample option. Please note that this option will not be present if you are running t-SNE on a clustering result. For clarity, clustering results run with all samples together must be viewed together and clustering results run by sample must be viewed by sample.

Like Graph-based clustering, t-SNE takes PC values as its input and further reduces the data down to two or three dimensions. For consistency, you should use the same number of PCs as the input for t-SNE that you used for Graph-based clustering.

  • Click Apply

  • Click Finish to run (Figure 13)

A new t-SNE task node will be produced.

  • Double-click the t-SNE node to open the t-SNE task report (Figure 14). Use the panel on the left to modify the plot or add more plots to this Data viewer session.

The t-SNE scatter plot is interactive and can be viewed for 2D or 3D. The t-SNE plot is 3D by default. You can rotate the 3D plot by left-clicking and dragging your mouse or using Control under Configure. You can zoom in and out using your mouse wheel. You can pan by right-clicking and dragging your mouse. You can use Style to modify color, shape, size, and labeling (e.g. add a fog effect to improve depth perception on the plot). Add a 2D plot clicking New plot, selecting 2D Scatter plot and selecting t-SNE as the source of the data.

Coloring the t-SNE scatter plot

Click on the plot to ensure that the plot window is selected. Click Style under Configure to color the t-SNE.

  • Color by the options in the drop-down menu under Color. You should be on the normalized counts node which can be seen by hovering over or clicking the circle (node) to the right of the drop-down.

  • Click the text field in the drop-down and start typing CD79A then select the gene by clicking on it (Figure 15)

The cells on the plot will be colored based on their expression level of CD79A (Figure 16). In the example in Figure 16, the Style icon has been dragged to a different location on the screen and the legend has also been resized and moved. Resizing the legend can either be done on the legend itself or using the Description icon under Configure.

Clicking a cell on the plot shows the expression values of the cell in the legend. Hovering over a cell on the plot also shows this information and related details (Figure 18).

If you want to color by more than three genes at time, such as by a list of genes that distinguish a particular cell type, you can use the color by Feature list option.

  • Select Feature List from the Color by drop-down

  • Choose Cytotoxic cells from the List drop-down (use List management in Settings to add lists to Partek Flow which will automatically make them available here)

  • Choose PCA from the Metric drop-down

Coloring by a list, in this way, calculates the first three principal components for the gene list and colors the cells on the plot by their values along those three PCs with green for PC1, red for PC2, and blue for PC3 (Figure 19).

Typically, the expression of a set of marker genes will be highly correlated, allowing the first PC to account for a large percentage of the variance between cells for that gene list. As a result, the group of cells characterized by their expression of the genes on the list will separate from the rest of the cells along PC1 and will be colored green (Figure 16). If the gene list is more complex, for example, including marker genes for multiple cell types, there may be several sets of correlated genes accounting for significant amounts of variance, leading to groups of cells being distinguishable along PC2 and PC3 as well. In that case, there may be green, blue, and red groups of cells on the plot. If the gene list does not distinguish any group of cells, all cells will have similar PC values, leading to similarly colored cells on the plot.

In addition to coloring by gene expression and by gene lists, the points can be colored by any cell or sample attribute. Available attributes are listed as options in the Color by drop-down menu. Note that any available options are dependent upon the selected data node. In the following section we will use the attribute Graph-based to color our cells by the clusters identified in the Graph-based clustering task (Figure 20).

Selecting cells on the t-SNE scatter plot

  • Left-click and hold to draw a lasso around a cluster of cells

  • Release and click the starting circle to close the lasso and select the enclosed cells (Figure 21)

You can also create a lasso with straight lines using Lasso mode by clicking, releasing, and clicking again to draw a shape.

By default, selected cells are shown in bold while unselected cells are dimmed (Figure 22). This can be changed to gray selected cells using the Select & Filter tool in the left panel as shown in Figure 22.

  • Double-click any blank section of the scatter plot to clear the selection

Alternatively, you can select cells using any criteria available for the data node that is selected in the Select & Filter tool. To change the data selection click the circle (node) and select the data.

  • Choose Graph-based from the Criteria drop-down menu in the Select & Filter tool after ensuring you on are on the Graph-based cluster node by hovering on the circle (Figure 23). If you are not on the correct node, you need to click the circle and select the data.

This adds check boxes for each level of the attribute (i.e., clusters). Click a check box to select the cells with that attribute level.

  • Click only 2 and 3

This selects cells from Graph-based clusters 2 and 3 (Figure 24). The number of selected cells is listed in the Legend on the plot.

Cells can also be selected based on their gene expression values in the Select & Filter section.

  • Click the circle and select the Normalized counts node which has gene expression data

  • Type cd3d in the text field of the drop-down

  • Click on CD3D to add it as criteria to select from and use the slider or text field to adjust the selected values. Pin the histogram to visualize the distribution during selection.

Very specific selections can be configured by adding criteria in this way. In the example below, Clusters 2 and 3 and high CD3D expression is selected (Figure 25).

Filtering cells on the t-SNE scatter plot

Once a cell has been selected on the plot, it can be filtered. The filter controls can exclude or include (only) any selected cell. Filtering can be particularly useful when you want to use a gene expression threshold to classify a group of cells, but the gene in question is not exclusively expressed by your cell type of interest.

In this example we can filter to include just cells from the selection we have already made.

The plot will update to show only the included cells as seen in Figure 26.

Cells that are not shown on the plot cannot be selected, allowing you to focus on the visible cells. The number of cells shown on the plot out of the total number of original cells is shown in the Legend. You can adjust the view to focus on only the included cells.

Additional inclusion or exclusion filters can be added to focus on a smaller subset of cells.

  • Click Clear filters to remove applied filters

The plot will update to show all cells and return to the original scaling.

Classifying cells

Classifying cells allows to you assign cells to groups that can be used in downstream analysis and visualizations. Commonly, this is used to describe cell types, such as B cells and T cells, but can be used to describe any group of cells that you want to consider together in your analysis, such as cycling cells or CD14 high expressing cells. Each cell can only belong to one class at a time so you cannot create overlapping classes.

To classify a cell, just select it then click Classify selection in the Classify tool.

For example, we can classify a cluster of cells expressing high levels of CD79A as B cells.

  • Set Color by in the Style configuration to the normalized counts node

  • Type CD79A in the search box and select it. Rotate the 3D plot if you need to see this cluster more clearly.

  • Draw a lasso around the cluster of CD79A-expressing cells (Figure 28)

Because most of these cells express CD79A, a B cell marker, and because they cluster together on the t-SNE, suggesting they have similar overall gene expression, we believe that all these cells are B cells.

  • Click Classify under Tools in the left panel

  • Type B cells for the Name

  • Click Save (Figure 29)

  • Color by New classifications under Style (Figure 30) while you are still working on the classifications

To use the classifications in downstream tasks and visualizations, you must first apply them.

  • Click Apply classifications

  • Name the classification (e.g. Classified Cell Types)

  • Click Run to confirm

Once you have added a classification to the project, you can color the t-SNE plot by the Classification.

Here, I classified a few additional cell types using a combination of known marker genes and the clustering results then applied the classification (Figure 31).

Summarize Classifications with the number and percentage of cells from each sample that belong to each classification using an Attribute table under New plot. This is particularly useful when you are classifying cells from multiple samples.

  • Click New plot

  • Select Attribute table and the source of data (Figure 32) which in this case is called Classify result

  • Click on the Normalized counts" node

  • Navigate to the Compute biomarkers task under Statistics in the task menu

  • Follow the task dialogue and click Finish (Figure 33)

  • Double click the Biomarkers node to view the Biomarkers results

Comparing gene expression between cell types

A common goal in single cell analysis is to identify genes that distinguish a cell type. To do this, you can use the differential analysis tools in Partek Flow. I will show how to use the ANOVA test in Partek Flow, a statistical test shown to be highly effective for differential analysis of single cell RNA-Seq data.

  • Click the Normalized counts results node

  • Click Statistics in the toolbox

  • Click Differential Analysis

  • Select ANOVA as the M_ethod to use for differential analysis_

The first page of the configuration dialog asks what attributes you want to include in the statistical test. Here, we only want to consider the Classifications, but in a more complex experiment, you could also include experimental conditions or other sample attributes.

  • Click Classified Cell Types

  • Click Next (Figure 34)

We will make a comparison between NK cells and all the other cell types to identify genes that distinguish NK cells. You can also use this tool to identify genes that differ between two cell types or genes that differ in the same cell type between experimental conditions.

  • Drag NK cells to the top panel

The top panel is the numerator for fold-change calculations so the experimental or test groups should be selected in the top panel.

  • Click all the other classifications in the bottom panel

The bottom panel is the denominator for fold-change calculations so the control group should be selected in the bottom panel.

  • Click Add comparison

This adds the comparison to the statistical test.

  • Click Finish to run the ANOVA task (Figure 35)

  • Double-click the newly generated data node to open the ANOVA task report

The Feature plot viewer will open showing a dot plot for CCL4 which can be modified to summarize the data in different ways (Figure 37). In the image below, the red boxes highlight the changes that were made to configure the plot. This includes overlaying the violins (density plots with the width corresponding to frequency) on the dot plot represented by the Classified Cell Types.

You can switch the grouping of cells. To do this, show the X axis labels then click and drag the labels to reposition the cell types on the plot.

  • Click ANOVA report to return to the table

The table lists all of genes in the data set; using the filter control panel on the left, we can filter to just the genes that are significantly different for the comparison.

  • Click FDR step up and click the arrow next to it

  • Set to 1e-8

Here, we are using a very stringent cutoff to focus only on genes that are specific to NK cells, but other applications may require a less stringent cutoff.

  • Click Fold change and click the arrow next to it

  • Set to -2 to 2

The number of genes at the top of the filter control panel updates to indicate how many genes are left after the filters are applied.

The ANOVA report will close and a new task, the Differential analysis filter, will run and generate a filtered Feature list data node.

Generating a heatmap

Once we have filtered to a list of significantly different genes, we can visualize these genes by generating a heatmap.

  • Click the Filtered feature list data node produced by the Differential analysis filter

  • Click Exploratory analysis in the toolbox

  • Click Hierarchical clustering / heatmap

The hierarchical clustering task will generate the heatmap; choose Heatmap as the plot type. You can choose to Cluster features (genes) and cells (samples) under Feature order and Cell order in the Ordering section. You will almost always want to cluster features as this generates the clear blocks of color that make heatmaps comprehensible. For single cell data sets, you may choose to forgo clustering the cells in favor of ordering them by the attribute of interest. Here, we will not filter the cells, but instead order them by their classification.

  • Click Assign order under Cell order

You can filter samples using the Filtering section of the configuration dialog. Here, we will not filter out any samples or cells.

  • Choose Classification from the Ordering drop-down menu

  • Drag NK cells to the top of the Sample order

  • Click Finish to run (Figure 38)

  • Double-click the Hierarchical cluster task node to open the task report

It may initially be hard to distinguish striking differences in the heatmap. This is common in single cell RNA-Seq data because outlier cells will skew the high and low ends. We can adjust the minimum and maximum of the color scheme to improve the appearance of the heatmap.

  • Click Heatmap

  • Toggle on the Range Min and set to -2

  • Toggle on the Range Max and set to 2

Distinct blocks of red and blue are now more pronounced on the plot. Cells are on rows and genes are on columns. Because of the limited number of pixels on the screen, genes are grouped. You can zoom in using the zoom controls or your mouse wheel if you want to view individual gene rows. We can annotate the plot with cell attributes.

  • Choose Classified Cell Types from the Annotations drop-down menu

  • Change the Annotation font size under Style in the Annotations section

The plot now includes blocks of color along the left edge indicating the classification of the cells. We can transpose the plot to give the cell labels a bit more space.

  • Toggle off the Row labels under Axes to remove the sample labels

Performing enrichment analysis

While a long list of significantly different genes is important information about a cell type, it can be difficult to identify what the biological consequences of these changes might be just by looking at the genes one at a time. Using enrichment analysis, you can identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.

  • Click the Feature list data node produced by the Differential analysis filter

  • Click Biological interpretation

  • Click Gene set enrichment

We distribute the gene sets from the Gene Ontology Consortium, but Gene set enrichment can work with any custom or public gene set database.

  • Choose the latest assembly available from the Gene set drop-down

  • Click Finish

  • Double-click the Gene set enrichment task node to open the task report

The Gene set enrichment task report lists gene sets on rows with an enrichment score and p-value for each. It also lists how many genes in the gene set were in the input gene list and how many were not (Figure 40). Clicking the Gene set ID links to the geneontology.org page for the gene set.

In Partek Flow, you can also check for enrichment of KEGG pathways using the Pathway enrichment task. The task is quite similar to the Gene set enrichment task, but uses KEGG pathways as the gene sets.

The task report is similar to the Gene set enrichment task report with enrichment scores, p-values, and the number of genes in and not in the list (Figure 41).

Clicking the KEGG pathway ID in the Pathway enrichment task report opens a KEGG pathway map (Figure 42). The KEGG pathway maps have fold-change and p-value information from the input gene list overlaid on the map, adding a layer of additional information about whether the pathway was upregulated or downregulated in the comparison.

Color are customizable using the control panel on the left and the plot is interactive. Mousing over gene boxes gives the genes accounted for by the box, with genes present in the input list shown in bold, and the coloring gene shown in red (Figure 43).

Clicking a pathway box opens the map of that pathway, providing an easy way to explore related gene networks.

Pipeline

Additional Assistance

Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric. Cells can be filtered either with the plot controls by using the selection tools on the right of the plot (rectangle mode , ellipse mode , or lasso mode ) and selecting a region on one of the plots or by setting thresholds using the Select & Filter tool. Here, we will apply a filter for the number of read counts.

Open the Select & Filter icon in the left panel. The histograms can be pinned while fine tuning the selections (Figure 2). Set the filters to represent the majority of the population (violin width)

Click the filter icon and Apply observation filter then select the Single cell counts data node to run the Filter cells task on the first Single cell counts data node, it generates a Filtered counts task node that generates a Filtered cells results node

Use Save as to give this Data Viewer session a new name (e.g. QA/QC filter) so you can return to this filter at any time and see the exact criteria that has been selected and filtered.

Click to add the recommended normalization scheme

To draw a Scree plot, in Data viewer, choose Scree plot icon available in New plot under Setup on the left panel , choose the PCA data node (Figure 7)

Click the Table option in the New plot icon under Setup and select the PCA data node to open the Component loadings table (Figure 8)

This table lists genes on rows and PCs on columns, the value in this table is correlation coefficient r. The table can be downloaded as a text file by clicking on the Export table data icon on the upper-right corner of the plot.

Coloring by one gene uses the two-color numeric palette, which can be customized by clicking . To color by more than one gene use the Numeric triad option in the drop-down. If you color by more than one gene, the color palette switches to a Green-Red-Blue color scheme with the balance between the three color channels determined by the values of the three genes. For example, a cell that expresses all three genes would be white, a cell that expresses the first two genes would be yellow, and a cell that expresses none of the genes would be black (Figure 17).

The most basic way to select a point on the scatter plot is to click it with the mouse while in pointer mode. To select multiple cells, you can hold Ctrl on your keyboard and click the cells. To select larger groups of cells, you can switch to Lasso mode by clicking in the plot controls on the right hand side. The lasso lets you freely draw a shape to select a cluster of cells.

Click to activate Lasso mode

Click (filter include) to filter to just the selected cells (Figure 26).

Click on the plot controls or toggle on Fit visible in the Axes configuration to rescale the axes to the filtered points

To revert to the original scaling, click the button again or turn off Fit visible with the toggle.

Alternatively, to exclude selected cells, click (filter exclude) (Figure 27)

Click to activate Lasso mode

You can edit the name of a classification or delete it. In this project we use the hosted feature lists for "NK cells", "T cells" and "Monocytes" to classify these cell types by coloring the cells in the t-SNE plot and selecting the cells expressing those genes as shown above. See the documentation for more information on how to add these lists. The classifications you have made are saved as a working draft so if you close the plot and return to it, the classifications will still be there and can be visualized on the plot as "New classification". However, classifications are not available for downstream tasks until you apply them. Continue classifying the clusters and save the Data viewer session until you are ready to apply the classification to the data project.

The ANOVA task report lists genes on rows and the results of the statistical test (p-value, fold change, etc.) on columns (Figure 36). For more information, please see our documentation page on the .

Genes are listed in ascending order by the p-value of the first comparison so the most significant gene is listed first. To view a volcano plot for any comparison, click . To view a violin plot for a gene, click next to the Gene ID.

Click for CCL4

Click to generate a filtered version of the table for downstream analysis

For more information about the ANOVA task, please see the section of our user manual.

Click Transposed under Axes or use the transpose button on the plot to flip the axes

As with any visualization in Partek Flow, the image can be saved as a publication-quality image to your local machine by clicking or sent to a page in the project notebook by clicking . For more information about Hierarchical clustering, please see the section of the user manual.

For information about automating steps in this analysis workflow, please see our documentation page on .

If you need additional assistance, please visit to submit a help ticket or find phone numbers for regional support.

list management
ANOVA task report
Differential Gene Expression - ANOVA
Making a Pipeline
our support page
Single Cell RNA-Seq Analysis (Multiple Samples)
Getting Started with Your Partek Flow Hosted Trial
Creating and Analyzing a Project
Filtering cells
Filter features
Normalization
PCA
Graph-based clustering
t-SNE
Coloring the t-SNE scatter plot
Selecting cells on the t-SNE scatter plot
Filtering cells on the t-SNE scatter plot
Classifying cells
Comparing gene expression between cell types
Generating a heatmap
Performing enrichment analysis
Pipeline
Hierarchical Clustering
Figure 1. Analyses tab
Figure 2. Single cell QA/QC plot
Figure 3. Filter features
Figure 4. Normalization settings
Figure 5. Configuring PCA
Figure 6. PCA scatter plot, each dot is a cell
Figure 7. PCA Scree plot
Figure 8. Component loadings
Figure 9. PC projection configuration dialog
Figure 10. PCA project table
Figure 11. Configure Graph-based clustering
Figure 12. Graph-based clustering results
Figure 13. t-SNE configuration
Figure 14. t-SNE plot
Figure 15. Coloring by a gene
Figure 16. Coloring by CD79A expression
Figure 17. Coloring by three genes
Figure 18. Viewing expression values of a cell
Figure 19. Coloring by a list
Figure 20. Coloring by Graph-based attribute
Figure 21. Lassoing cells
Figure 22. Selected cells
Figure 23. Picking an attribute
Figure 24. Selecting by attribute
Figure 25. Selecting by gene expression level
Figure 26. Activating the filter
Figure 27. Filtered t-SNE scatter plot
Figure 28. Selecting a cluster of CD79A-expressing cells
Figure 29. Classifying cells
Figure 30. Color by New classification
Figure 31. Color by Applied classification
Figure 32. Attribute table
Figure 33. Compute biomarkers
Figure 34. Choosing attributes to include in the statistical test
Figure 35. Configuring comparisons in the GSA task
Figure 36. Viewing GSA results
Figure 37. Gene expression plot
Figure 38. Configuring hierarchical clustering
Figure 39. Configurable heat map
Figure 40. Gene set enrichment report
Figure 41. Pathway enrichment report
Figure 42. KEGG pathway map
Figure 43. Viewing pathway map details
Figure 44. Described pipeline shown in the Analyses tab