
Quality control for ChIP-Seq samples

We can check the quality of the samples using Partek Genomics Suite before analyzing the data.

Strand cross-correlation

In ChIP-Seq, genomic DNA is fragmented and target-protein-bound DNA fragments are purified by immunoprecipitation. These purified fragments are between 100 and 500 base pairs long, depending on the protocol; however, because ChIP-Seq uses short-read sequencing (25 to 35 base pair reads) to maximize sequencing depth, only the ends of each fragment will be sequenced. Consequently, with single-end sequencing, the forward and reverse strand reads for each fragment will come from opposite ends of the fragment. At a protein-binding site, there will be two peaks of read enrichment, one from enrichment of forward strand reads and another from enrichment of reverse strand reads. The average distance between these peaks is termed the effective fragment length. Because the forward and reverse strand peaks are generated from a common set of fragments, the peaks should be roughly symmetrical. By phase shifting the data to the mid-point between the two peaks, a common read density plot can be created that shows single peaks at binding sites.

Strand Cross-Correlation allows us to use the symmetrical distribution of forward and reverse strand fragments to calculate the effective fragment length (Kharchenko et al., 2008). The Pearson correlation coefficient between the read densities of the forward and reverse strands is calculated after phase shifts of between 0 and 500 base pairs. This is visualized with the phase shift range on the x-axis and the corresponding Pearson correlation coefficients between forward and reverse strand read densities on the y-axis (Figure 1). High-quality ChIP-Seq data will give a strong peak on the Strand Cross-Correlation plot at the effective fragment length. When calling peaks, the forward and reverse (or paired end) reads are each phase-shifted by the effective fragment length to create a combined read density profile.
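The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the implementation used by Partek Genomics Suite: the function names, the per-base count lists, and the toy data are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a flat profile
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(forward, reverse, max_shift):
    """Return {shift: correlation} for phase shifts 0..max_shift.

    forward[i] and reverse[i] are hypothetical per-base read counts.
    For each shift, forward position i is compared with reverse
    position i + shift.
    """
    profile = {}
    for shift in range(max_shift + 1):
        fwd = forward[: len(forward) - shift]
        rev = reverse[shift:]
        profile[shift] = pearson(fwd, rev)
    return profile


# Toy data (not real reads): a forward-strand peak near position 20
# and an identical reverse-strand peak 11 bp downstream.
forward = [0] * 60
reverse = [0] * 60
for pos, count in ((19, 2), (20, 5), (21, 2)):
    forward[pos] = count
    reverse[pos + 11] = count

profile = strand_cross_correlation(forward, reverse, max_shift=30)
effective_fragment_length = max(profile, key=profile.get)
print(effective_fragment_length)
```

In this toy example the shift with the highest correlation is the offset between the two peaks, which plays the role of the effective fragment length.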

For paired-end sequencing, Strand Cross-Correlation is calculated from the distribution of distances between the paired reads from the ends of each fragment.

We will perform Strand Cross-Correlation to identify the effective fragment length we can use when calling read enrichment peaks.

  • Select Strand Cross-Correlation from the QA/QC section of the ChIP-Seq workflow

If you have not run this step before, you will be asked if you would like to create a new QA/QC child spreadsheet.

  • If prompted, select Yes to create a new child spreadsheet for QA/QC

After running Strand Cross-Correlation, the Strand Separation of Samples viewer will open as a new tab (Figure 1).

Figure 1. Strand Cross-Correlation profile plot showing possible effective fragment lengths on the x-axis and resulting Pearson correlation coefficients on the y-axis.

For the ChIP sample (blue), we can see the peak at 111 base pairs, corresponding to an effective fragment length of 111 base pairs. This number can be determined by examining the values in the strand_correlation spreadsheet (Figure 2), by moving the cursor over the peak in the graph, or by sorting the data in the spreadsheet. The Strand Separation of Samples graph is also useful as a quality control measure. In lower quality ChIP-Seq data, we would also observe a peak at the read length. The ratio between the Pearson correlation coefficient of the effective fragment length peak and that of the read length peak, each normalized with the minimum correlation coefficient, [cc(fragment length) - min(cc)] / [cc(read length) - min(cc)], should be greater than 0.8 to meet the minimum quality standards recommended by the ENCODE project (Landt et al., 2012).
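As a rough illustration of the ratio above, the following Python sketch evaluates it from a table of correlation coefficients keyed by strand shift. The function name and the correlation values are hypothetical and chosen only for the example; they are not taken from the tutorial data.

```python
def quality_ratio(profile, fragment_length, read_length):
    """[cc(fragment length) - min(cc)] / [cc(read length) - min(cc)]."""
    min_cc = min(profile.values())
    denominator = profile[read_length] - min_cc
    if denominator == 0:
        raise ValueError("read-length correlation equals the minimum")
    return (profile[fragment_length] - min_cc) / denominator


# Hypothetical correlation coefficients keyed by strand shift in bp.
profile = {0: 0.10, 26: 0.15, 60: 0.14, 111: 0.30, 200: 0.12}

ratio = quality_ratio(profile, fragment_length=111, read_length=26)
passes = ratio > 0.8  # threshold quoted in the text above
print(round(ratio, 2), passes)
```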

The mock sample (red) does not have an effective fragment length peak because it does not have read density peaks to phase shift. It does have a small peak at the sequencing read length of 26 base pairs.

Figure 2. The strand correlation spreadsheet shows the Pearson correlation coefficients for each relative strand shift value (effective fragment length)

Checking the distribution of reads

BAM files can contain both aligned and unaligned reads. The spreadsheet created during import shows the number of reads that were aligned to the reference genome. A large number of unaligned reads may be the result of poor quality sequencing data or alignment problems. It may also be useful to know how many reads map to more than one location in the genome if the options used during alignment supported multiple-mapped reads.

  • Select Alignments per read from the QA/QC section of the ChIP-Seq workflow

A new spreadsheet named Alignment_Counts will be generated (Figure 3).

Figure 3. Unaligned reads have been removed from these BAM files and the alignment options did not permit mapping to more than one location

The titles of columns 2. 0 Single End Alignments Per Read and 3. 1 Single End Alignment Per Read indicate that this is single end data. Column 2 shows the number of unaligned reads, while column 3 shows the number of reads that aligned exactly once. If the BAM files used in this tutorial included reads that mapped to more than one location in the genome, there would be additional columns.
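The idea behind the alignment count columns can be sketched as a simple tally. The record layout below, a list of (read name, aligned flag) pairs, is an assumption made for illustration and is not how Partek Genomics Suite reads BAM files.

```python
from collections import Counter


def alignments_per_read(alignment_records):
    """Tally how many reads have 0, 1, 2, ... alignments.

    alignment_records is a hypothetical iterable of
    (read_name, is_aligned) pairs, one pair per BAM record.
    Returns {number_of_alignments: number_of_reads}.
    """
    per_read = Counter()
    for read_name, is_aligned in alignment_records:
        # Unaligned records still register the read, with zero alignments.
        per_read[read_name] += 1 if is_aligned else 0
    return dict(Counter(per_read.values()))


records = [
    ("read1", True),   # aligned exactly once
    ("read2", False),  # unaligned
    ("read3", True),   # aligned twice (multi-mapped)
    ("read3", True),
]
counts = alignments_per_read(records)
print(counts)
```

Each key of the result corresponds to one column of the spreadsheet: 0 alignments per read, 1 alignment per read, and so on.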

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Creating Copy Number from Allele Intensities

The first step in analyzing Affymetrix intensity data is to estimate the number of copies of each marker (allele).

  • Select Create Copy Number (from Allele Intensities Only)

This launches the Copy Number Creation dialog (Figure 1).

Figure 1. Choosing paired samples or unpaired samples

Choosing Paired samples assumes that each sample has its own reference sample with a common sample ID and generates a copy number spreadsheet. Choosing Unpaired samples uses a common reference, either a single sample or a group of samples, to create both a copy number spreadsheet and an allele ratio spreadsheet.

Create Copy Number from Pairs

In this tutorial, we have paired tumor-normal samples and thus can use the Paired samples option.

  • Select Paired samples

  • Select OK

The next dialog, Create Copy Number from Pairs, asks you to choose the column shared by each pair and the column that identifies the baseline category (Figure 2).

Figure 2. Creating copy number from pairs

  • Select 3. Tumor for Column

  • Select N for Baseline category

  • Select 4. SubjectID for the Column to match sample pairs

  • Select OK

This will pair samples based on 4. SubjectID, and set the baseline sample as the sample in the pair with a value of N in the 3. Tumor column. The spreadsheet produced (Figure 3) has a row for each tumor sample. In this tutorial, columns 7+ include copy number estimates for each marker. Columns 1-6 are identical to the source spreadsheet.
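The pairing logic can be sketched as follows. The tutorial does not state the formula used to turn intensities into copy numbers, so the estimate below (two copies multiplied by the tumor-to-baseline intensity ratio) is an assumption, as are the function name, the dictionary layout, the "T" label, and the intensity values.

```python
def paired_copy_number(samples, pair_column, status_column, baseline):
    """Estimate copy number for each non-baseline sample from its pair.

    ASSUMPTION: copy number = 2 * sample intensity / baseline intensity.
    """
    baselines = {
        s[pair_column]: s for s in samples if s[status_column] == baseline
    }
    result = {}
    for s in samples:
        if s[status_column] == baseline:
            continue  # baseline samples get no row of their own
        ref = baselines[s[pair_column]]
        result[s["sample_id"]] = [
            2.0 * t / n for t, n in zip(s["intensities"], ref["intensities"])
        ]
    return result


# Hypothetical two-marker intensities for one tumor-normal pair.
samples = [
    {"sample_id": "S1_T", "SubjectID": "S1", "Tumor": "T",
     "intensities": [300.0, 100.0]},
    {"sample_id": "S1_N", "SubjectID": "S1", "Tumor": "N",
     "intensities": [200.0, 200.0]},
]
copy_numbers = paired_copy_number(samples, "SubjectID", "Tumor", baseline="N")
print(copy_numbers)
```

As in the spreadsheet described above, the result has one entry per tumor sample and none for the baseline samples.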

Figure 3. Viewing the paired copy number spreadsheet

Create Copy Number from Reference Baseline

Alternatively, if paired samples are not available or appropriate, the Unpaired samples option can be selected in the Copy Number Creation dialog (Figure 1). Selecting this option opens the Unpaired Copy Number dialog (Figure 4).

Figure 4. Viewing the unpaired copy number dialog

There are several options for creating a reference baseline. First, you can use an existing reference file. These may be distributed by the manufacturer of your array, such as Affymetrix or Illumina, or previously created using Partek Genomics Suite from a set of normal samples. Second, you can use the reference file distributed by Partek. Third, you can choose all the samples from a separately imported spreadsheet. Fourth, you can choose a subset of the samples within the current spreadsheet to pool to create a reference.

In each case, every sample in your spreadsheet will be compared to the reference and two spreadsheets will be generated: a copy number spreadsheet and an allele ratio spreadsheet.
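A pooled reference can be pictured as a per-marker average over the chosen reference samples, against which every sample is then compared. The sketch below assumes a simple arithmetic mean and the same two-copy ratio as a stand-in for the actual estimation method, which is described in the white paper rather than here; all names and values are hypothetical.

```python
def pooled_reference(reference_samples):
    """Average the intensities of the reference samples, marker by marker."""
    n = len(reference_samples)
    return [sum(marker) / n for marker in zip(*reference_samples)]


def unpaired_copy_number(sample, reference):
    """ASSUMPTION: copy number = 2 * sample intensity / reference intensity."""
    return [2.0 * s / r for s, r in zip(sample, reference)]


# Hypothetical intensities for two markers in two normal samples.
normals = [[100.0, 200.0], [120.0, 180.0]]
reference = pooled_reference(normals)

# One hypothetical test sample compared with the pooled reference.
estimate = unpaired_copy_number([165.0, 190.0], reference)
print(reference, estimate)
```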

For more information about using unpaired samples in copy number calculations, please consult our Unpaired Copy Number Estimation white paper.


Detecting regions with copy number variation

Starting with copy number estimates for each marker (either taken directly from the vendor’s input file or calculated previously), the next step is to create a list of regions where adjacent markers share the same copy number.

Choosing a method for copy number detection

There are two algorithms available for copy number region detection: Genomic Segmentation and Hidden Markov Model (HMM). Both algorithms look for trends across multiple adjacent markers. The genomic segmentation algorithm identifies breakpoints - changes in copy number between two neighboring regions. The HMM algorithm looks for discrete changes of whole number copy number states (e.g., 0, 1, 2 … with no upper limit) and will find regions with those numbers of copies. Therefore, the HMM model performs better in cases of homogeneous samples such as clinical syndromes with underlying copy number aberrations. Genomic segmentation is preferable for heterogeneous samples such as cancer because tumor biopsies often contain “contaminating” healthy tissue and a tumor can have cells with different genomic aberrations.

Detecting amplifications and deletions with Genomic Segmentation

The number of copies of each marker created in the previous step will be used to detect the genomic regions with copy number variation, i.e., to identify amplifications and deletions across the genome.

  • Select the IC_IntensitiesSNP6pairedcopynumber spreadsheet in the Analysis tab

  • Select Detect Amplifications and Deletions from the Copy Number Analysis section of the workflow (Figure 1)

Figure 1. Invoking Detect Amplifications and Deletions

The Detect Amplifications and Deletions dialog will give you the option to choose Genomic Segmentation or HMM Region Detection (Figure 2).

Figure 2. Select a method for detecting amplifications and deletions

  • Select Genomic Segmentation

  • Select OK

The Genomic Copy Number Segmentation dialog gives options for setting segmentation parameters and configuring the region report (Figure 3).

Figure 3. Configuring the Genomic Copy Number Segmentation dialog

  • Set Minimum genomic markers to 50

  • Leave the rest of the parameters set to default values as shown (Figure 3)

  • Select OK

The Genomic Segmentation task is divided into two steps. In the first step, each region is compared to an adjacent region to determine whether both have the same average copy number and whether a breakpoint can be inserted. This is determined by first using a two-sided t-test to compare the average intensities of adjacent regions and then checking whether the resulting p-value is below the specified P-value threshold. The minimum genomic size of a region is set by the number of genomic markers in the region, Minimum genomic markers, while the magnitude of the significant difference between two regions is controlled by Signal to noise, which can be thought of as the difference in copy numbers between the regions. If the t-test is significant, the copy number of the region differs significantly from its nearest neighbors. However, a second step is needed to determine whether the difference is due to amplification or deletion. In this second step, two one-sided t-tests are used to compare the mean copy number in the region with the expected diploid copy number. For a detailed explanation of the genomic segmentation procedure, please consult our Genomic Segmentation white paper. For more detailed information about fine-tuning the parameters of your copy number analysis, please consult our guide, Optimizing Copy Number Segmentation.
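The two steps can be illustrated with a short Python sketch. It is not the Partek algorithm: the p-values use a normal approximation to the t-distribution (reasonable only for regions with many markers), the diploid copy number is taken as a single value of 2, and the function names and toy data are assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def breakpoint_p_value(left, right):
    """Step 1: two-sided test for a difference in mean between neighbors.

    Uses a Welch-style statistic with a normal approximation.
    """
    se = sqrt(variance(left) / len(left) + variance(right) / len(right))
    t = (mean(left) - mean(right)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def classify_region(values, diploid=2.0):
    """Step 2: two one-sided tests against the expected diploid value.

    Returns the direction with the smaller p-value and that p-value.
    """
    se = sqrt(variance(values) / len(values))
    t = (mean(values) - diploid) / se
    p_amplified = 1.0 - NormalDist().cdf(t)  # alternative: mean > diploid
    p_deleted = NormalDist().cdf(t)          # alternative: mean < diploid
    if p_amplified < p_deleted:
        return "amplification", p_amplified
    return "deletion", p_deleted


# Toy copy number estimates for two adjacent 50-marker regions.
offsets = [0.1 * ((i % 5) - 2) for i in range(50)]
left = [2.0 + o for o in offsets]
right = [3.0 + o for o in offsets]

p_break = breakpoint_p_value(left, right)
label, p_region = classify_region(right)
print(p_break < 0.001, label)
```

The smaller of the two one-sided p-values is the kind of value reported per region in the segmentation spreadsheet.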

The resulting spreadsheet, segmentation, shows one row per genomic region per sample (Figure 4). The columns provide the following information:

1-4. Genomic location of the region

5. Sample ID

6. Description of the copy number change

7. The length of the region (in base pairs)

8. The number of markers in the region

9. Marker density in the region (region length in base pairs divided by the number of markers)

10. Geometric mean of the copy number of all the markers in the region

11. Minimum p-value of the one-sided t-tests of the difference of the copy number in column 10 vs. the diploid range
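Columns 7 to 10 are simple functions of the region coordinates and marker values. The sketch below shows one way to compute them; the function name, the coordinates, the copy numbers, and the choice of stop minus start for the length are assumptions for illustration.

```python
from statistics import geometric_mean


def summarize_region(start, stop, marker_copy_numbers):
    """Compute length, marker count, marker density, and geometric mean."""
    length = stop - start  # ASSUMPTION: length taken as stop minus start
    n_markers = len(marker_copy_numbers)
    return {
        "length_bp": length,
        "markers": n_markers,
        "bp_per_marker": length / n_markers,
        "geometric_mean": geometric_mean(marker_copy_numbers),
    }


# Hypothetical region with four markers.
summary = summarize_region(1000, 6000, [1.0, 4.0, 2.0, 2.0])
print(summary)
```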

Figure 4. Viewing the segmentation spreadsheet

If desired, you can use Merge Adjacent Regions under Tools in the main toolbar to combine similar regions.

Visualizing regions of interest

Individual regions of interest can be visualized using Chromosome View.

  • Right-click a row header in the segmentation spreadsheet

  • Select Browse to location from the pop-up menu

Alternatively, you can visualize results at the whole chromosome level.

  • Select the segmentation spreadsheet

  • Select Chromosome View from the QA/QC section of the workflow

The Genomic Segmentation track displays the segmentation results (Figure 5). Each line in the track represents a sample. Amplified, deleted, and unchanged regions are shown in red, blue, and white, respectively. The Profile track now also includes information from the segmentation spreadsheet for the selected sample.

Figure 5. Segmentation results shown as regions of amplification and deletion in each sample

Analyzing shared regions of copy number variation

Now that amplified and deleted regions have been detected in each sample, we can compare the regions across samples to detect copy number changes that are shared by multiple samples.

  • Select Analyze detected segments from the Copy Number Analysis section of the workflow

The Analyze Segments task (Figure 6) can test for associations between copy number variations and sample categories using the χ2 test. In this tutorial, all pairs share the same phenotype, so we will not test for associations.
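Although the test is not run in this tutorial, a χ2 test of association for a single region can be sketched as below. The 2x2 layout, the counts, and the function name are hypothetical, and no continuity correction is applied, so results may differ from those reported by the software.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for the table [[a, b], [c, d]].

    Rows are sample categories; columns are "aberration present" and
    "aberration absent". Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With one degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 8 of 10 samples in one category carry the
# amplification, compared with 1 of 10 in the other category.
chi2, p_value = chi_square_2x2(8, 2, 1, 9)
print(round(chi2, 3), p_value < 0.01)
```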


Figure 6. Viewing the Analyze segments dialog

  • Leave all boxes unchecked

  • Select OK to run the Analyze Segments task

The task generates a new spreadsheet, summary (segment-analysis) (Figure 7), with one region per row. The columns provide the following information:

1-4. Genomic locations of the regions

5. Total number of samples

6-7. Number of samples with amplifications and the average amplified copy number, respectively

8-9. Number of samples with deletions and the average deleted copy number, respectively

10. Total number of samples with copy number aberrations

11-12. Number of samples with no change in copy number and the average copy number in those samples, respectively

13. Number of markers in the region

14. Length of the region (in base pairs)

15+. Two columns per sample - the average copy number in each sample as well as the copy number change status of the sample (e.g., amplified, deleted, unchanged, depending on the copy number and the threshold for unchanged defined in the Genomic Segmentation dialog)

A "?" indicates that a region with the particular characteristic does not exist or cannot be computed. For example, if a region is not amplified in any of the samples, the average amplified copy number will be shown as "?". This list may be filtered to contain only regions that meet user-specified criteria as discussed in the next section of the tutorial.
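The per-region counts and averages, including the "?" placeholder, can be sketched as follows. The input layout (one status and one mean copy number per sample) and all names and values are assumptions for illustration.

```python
def summarize_shared_region(calls):
    """Summarize one region across samples.

    calls maps a sample name to (status, mean_copy_number), where status
    is "amplified", "deleted", or "unchanged".
    """
    def count_and_average(status):
        values = [cn for s, cn in calls.values() if s == status]
        # "?" marks an average that cannot be computed (no such samples).
        average = sum(values) / len(values) if values else "?"
        return len(values), average

    n_amp, avg_amp = count_and_average("amplified")
    n_del, avg_del = count_and_average("deleted")
    n_same, avg_same = count_and_average("unchanged")
    return {
        "total_samples": len(calls),
        "amplified": n_amp, "avg_amplified": avg_amp,
        "deleted": n_del, "avg_deleted": avg_del,
        "aberrations": n_amp + n_del,
        "unchanged": n_same, "avg_unchanged": avg_same,
    }


# Hypothetical calls for three samples in one region.
calls = {
    "sample_1": ("amplified", 3.0),
    "sample_2": ("amplified", 4.0),
    "sample_3": ("unchanged", 2.0),
}
row = summarize_shared_region(calls)
print(row)
```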

Figure 7. Viewing the results of Analyze Detected Segments

Visualizing shared regions of copy number variation

To get an overview of the common aberrations in the group of samples over the entire genome, we can use View Detected Regions.

  • Select View Detected Regions

The View Detected Regions dialog (Figure 8) allows you to select the spreadsheet with genomic regions and choose between histogram and copy number classification plots.

Figure 8. View Detected Regions dialog

  • Select summary (segment-analysis) from the drop-down menu

  • Select View Histogram

  • Select OK

The plot will open in a new tab titled Karyogram View (Figure 9).

Figure 9. Viewing amplification and deletion histograms using Karyogram View

The Karyogram View shows each chromosome with red and blue histograms on either side corresponding to amplification and deletion, respectively. The histogram height reflects the number of samples that share either amplification or deletion at that particular region. For example, the long arms of chromosomes 3 and 7 are amplified in the majority of samples and most samples share a deletion in the long arm of chromosome 4.

Mousing over the chromosome will give cytoband information; mousing over the histogram will give the number of shared regions at each position and the number of samples sharing the type of variation. Both the menu and display may be used to control which chromosomes are displayed: left-click in the menu to toggle a chromosome on or off, and right-click in the menu or graph to show only that chromosome.

Alternatively, we can use the Copy Number Classification plot to get a more sample-centric view.

  • Select View Detected Regions

  • Select View Copy Number Classification

  • Select OK

The Copy Number Classification plot also uses Karyogram View to provide an overview of all the samples and the copy number of regions on each chromosome (Figure 10).

Figure 10. Viewing the Copy Number Classification plot

Each sample is drawn as a separate column next to the chromosome. Amplified regions are depicted in red, deleted regions in blue, and regions with no copy number change in white. Sample names are given across the top of each column. For greater detail, try viewing fewer chromosomes.


Tutorials

Partek Genomics Suite tutorials provide step-by-step instructions using a supplied data set to teach you how to use the software’s tools. Upon completion of each tutorial, you will be able to apply your knowledge in your own studies.

  • Gene Expression Analysis

  • Gene Expression Analysis with Batch Effects

  • Differential Methylation Analysis

  • Partek Pathway

  • Gene Ontology Enrichment

  • RNA-Seq Analysis

  • ChIP-Seq Analysis

  • Survival Analysis

  • Model Selection Tool

  • Copy Number Analysis

  • Loss of Heterozygosity

  • Allele Specific Copy Number

  • Gene Expression - Aging Study

  • miRNA Expression and Integration with Gene Expression

  • Promoter Tiling Array

  • Human Exon Array

  • NCBI GEO Importer

Loss of Heterozygosity

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Analyzing Loss of Heterozygosity.pdf (PDF, 1MB)


Promoter Tiling Array

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Analysis of a Tiling Regulation Study.pdf (PDF, 1MB)


Allele Specific Copy Number

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Allele Specific Copy Number Analysis.pdf (PDF, 1MB)


NCBI GEO Importer

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Import GEO Experiment.pdf (PDF, 965KB)


Gene Expression - Aging Study

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.

Gene Expression Analysis of an Aging Study Using Illumina Microarray Technology.pdf (PDF, 2MB)


Alt-Splicing Analysis of Exon Array

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


Gene-level Analysis of Exon Array

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


Importing Human Exon Array

This document was developed for Partek Genomics Suite version 6.6 software. Documentation for Partek Genomics Suite version 7.0 software is in development and will replace this document.


Gene Expression Analysis

This tutorial will illustrate:

  • Importing Affymetrix CEL files

  • Adding sample information

  • Exploring gene expression data

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

Down syndrome is caused by an extra copy of all or part of chromosome 21; it is the most common non-lethal trisomy in humans. At the time of the study used in this tutorial, conflicting reports had thrown into doubt whether individuals with Down syndrome have dysregulation of gene expression throughout the genome or primarily in genes from chromosome 21. To address this question, Affymetrix GeneChip™ Human U133A arrays were used to assay 25 samples taken from 10 human subjects, with or without Down syndrome, and 4 different tissues. The data revealed a significant upregulation of chromosome 21 genes at the gene expression level in individuals with Down syndrome; this dysregulation was largely specific to chromosome 21 and not a genome-wide phenomenon.

The raw data is available as experiment number GSE1397 in the Gene Expression Omnibus.

Data and associated files for this tutorial can be downloaded using this link - (right-click the link and choose "Save Link As" to download the tutorial data).


Performing hierarchical clustering

The gene list in spreadsheet Down_Syndrome_vs_Normal (A) can be used for hierarchical clustering to visualize patterns in the data.

  • Under the Visualization section in the Gene Expression workflow, select Cluster Based on Significant Genes

The Cluster Significant Genes dialog asks you to specify the type of clustering you want to perform.

  • Choose Hierarchical Clustering and select OK

  • Choose the Down_Syndrome_vs_Normal (A) spreadsheet under the Spreadsheet with differentially expressed genes

  • Choose the Standardize – shift genes to mean of zero and scale to standard deviation of one under the Expression normalization panel (Figure 1)

This option will adjust all the gene intensities such that the mean is zero and the standard deviation is 1.
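Standardization of a single gene can be sketched in Python as below. The tutorial does not say whether the sample or population standard deviation is used, so the sample standard deviation here is an assumption, as are the function name and the intensity values.

```python
from statistics import mean, stdev


def standardize(values):
    """Shift to a mean of zero and scale to a standard deviation of one.

    ASSUMPTION: uses the sample standard deviation (n - 1 denominator).
    """
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


# Hypothetical log2 intensities of one gene across three samples.
gene = [6.0, 8.0, 10.0]
standardized = standardize(gene)
print(standardized)
```

After this step every gene is on the same scale, so the color of a cell in the heat map reflects how far a sample lies from that gene's mean rather than the gene's absolute intensity.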

Figure 1. Configuring Hierarchical Clustering

  • Select OK to generate a Hierarchical Clustering tab (Figure 2)

Figure 2. Hierarchical Clustering of Down_Syndrome_vs_Normal (A)

The graph (Figure 2) illustrates the standardized gene expression level of each gene in each sample. Each gene is represented in one column, and each sample is represented in one row. Genes with no difference in expression have a value of zero and are colored black. Genes with increased expression in Down syndrome samples have positive values and are colored red. Genes with reduced expression in Down syndrome samples have negative values and are colored green. Down syndrome samples are colored red and normal samples are colored orange. On the left-hand side of the graph, we can see that the Down syndrome samples cluster together.

For more information on the methods used for clustering, you can refer to Chapter 8: Hierarchical & Partitioning Clustering in Help > User’s Manual. For a tutorial on configuring the clustering plot, please refer to


Adding gene annotations

During data importation, the GeneChip annotation file was linked to the imported data. This linked annotation information can be added as new columns to the ANOVA or gene list spreadsheets. For example, we can add additional annotation to the gene list we created from the ANOVA results as follows:

  • In the Down_Syndrome_vs_Normal (A) spreadsheet, right-click on the second column header 2. ProbesetID and select Insert Annotation from the pop-up menu (Figure 1)

Figure 1. Inserting an annotation

  • Select Chromosomal Location under the Column Configuration panel (Figure 2). Leave everything else as default

  • Select OK

Figure 2. Adding Chromosomal Location annotation

Interestingly, of the 23 genes of the Down_Syndrome_vs_Normal (A) spreadsheet, 20 genes are located on chromosome 21. This suggests that the gene expression changes associated with Down syndrome observed in this study are primarily located on chromosome 21, not distributed throughout the genome, an important finding of this study.

To save changes to the spreadsheet, select the Save Active Spreadsheet icon.


Gene Expression Analysis with Batch Effects

This tutorial will illustrate:

  • Importing the data set

  • Adding an annotation link

  • Exploring the data set with PCA

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

The data for this tutorial is taken from an experiment that examined the effects of four treatment conditions at two time points on estrogen receptor-positive breast cancer cell lines in vitro. Each treatment/time combination has two replicates and there are two control samples for a total of eighteen samples. Gene expression analysis was performed using the Affymetrix GeneChip® Human U95A array. Values are transformed to log base 2 scale by f(x) = log2(x+1).
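The transformation f(x) = log2(x+1) can be written in Python as follows; the function name and the input values are hypothetical and serve only to show the effect of the formula.

```python
from math import log2


def log_transform(values):
    """Apply f(x) = log2(x + 1) to each raw value."""
    return [log2(v + 1) for v in values]


# Hypothetical raw intensities, including a zero.
raw = [0, 1, 3, 1023]
transformed = log_transform(raw)
print(transformed)
```

Adding 1 before taking the logarithm keeps a raw value of zero defined, mapping it to zero on the log scale.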


Adding an annotation link

While many types of data sets are automatically linked with appropriate annotation files upon import, if this does not occur, a spreadsheet can be manually linked with an annotation file.

  • Right-click Breast_Cancer.txt in the spreadsheet tree

  • Select Properties (Figure 1)

Figure 1. Selecting file properties for a spreadsheet

Configure the Configure Genomic Properties dialog as shown (Figure 2) with the following steps:

  • Select Gene Expression from the Choose the type of genomic data drop-down menu

  • Select Feature in column label

  • Select Browse...

Figure 2. Configure the genomic properties dialog as shown

There is now an * after the spreadsheet name in the spreadsheet tree. This indicates an unsaved change has been made to the spreadsheet.

  • Select the Save Active Spreadsheet icon to save the changes


GO enrichment using a gene list

Gene Ontology (GO) enrichment analysis compares a gene list to lists of genes associated with biological processes, cellular compartments, and molecular functions to provide biological insights. Once a list of genes has been created, it is possible to see which GO terms the genes are associated with and whether any GO terms are significantly enriched in the gene list.

  • Select the E2 vs. Control spreadsheet from the spreadsheet tree

  • Select Gene Set Analysis from the Biological Interpretation section of the Gene Expression workflow

  • Select Next > to continue with GO Enrichment

  • Select Next > to continue with 1/E2_vs_Control (E2 vs. Control)

  • Select Next > to continue with default parameter settings

  • Select Next > to continue with the default mapping file

A new spreadsheet 1 (GO-Enrichment.txt) will open as a child spreadsheet of E2 vs. Control (Figure 1).

Figure 1. GO Enrichment results spreadsheet

GO terms are shown in rows and are sorted by ascending enrichment p-value.
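To give a sense of what an enrichment p-value measures, the sketch below uses a one-sided hypergeometric test, a common choice for gene list enrichment. The tutorial does not state which test Partek Genomics Suite applies, so the test, the function name, the GO term labels, and the counts are all assumptions.

```python
from math import comb


def enrichment_p_value(list_size, list_hits, background_size, background_hits):
    """Probability of at least list_hits term genes in a random gene list.

    One-sided hypergeometric test (an assumed stand-in for the real method).
    """
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, a 4-gene list, and two terms
# given as (genes in the list with the term, background genes with the term).
terms = {"GO:term_A": (3, 5), "GO:term_B": (1, 10)}
p_values = {
    name: enrichment_p_value(4, hits, 20, in_background)
    for name, (hits, in_background) in terms.items()
}

# Sort terms by ascending p-value, as in the results spreadsheet.
ranked = sorted(p_values, key=p_values.get)
print(ranked)
```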

To visualize the results, we can launch the Gene Ontology Browser.

  • Select View from the main tool bar

  • Select Gene Ontology Browser

The Gene Ontology Browser will open in a new tab (Figure 2).

Figure 2. Viewing GO enrichment results in the Gene Ontology Browser

The bar chart shows the GO terms with the highest enrichment scores for the gene list.

To learn more about GO enrichment and using the Gene Ontology Browser, please consult the Gene Ontology Enrichment tutorial.

Open a zipped project

The zipped project file contains several prepared files used in this analysis as well as the annotation information for the BeadChip. The zipped file also contains a Partek project file (.ppj).

  • After downloading the file, go to File > Import > Zipped project... and browse to GO_Enrichment.zip on your local drive

Partek Genomics Suite will automatically unzip the file, read the .ppj file, open and annotate all spreadsheets (Figure 1). The parent spreadsheet (GSE8479-AVGSignal) contains the original intensity data. The first child spreadsheet (ANOVAResults) contains the results of differential gene expression analysis from a 3-way ANOVA. The second child spreadsheet (Gene_List.txt) is a list of significantly differentially expressed genes. When working with your own data, you will need to detect differentially expressed genes and create a gene list yourself.

Figure 1. Viewing the Gene List spreadsheet

Optional: GC wave correction for Affymetrix CEL files

To normalize for GC content, use the custom import settings during import. Select Customize... and under the Algorithm tab of the Advanced Import dialog, check the Adjust for GC content box (Figure 1).

Figure 1. Adjusting for GC content during CEL file import

Human Exon Array

  • Importing Human Exon Array

  • Gene-level Analysis of Exon Array

  • Alt-Splicing Analysis of Exon Array

Hierarchical clustering using a gene list

Opening a gene list as a child spreadsheet

Gene lists can be visualized, and their ability to distinguish samples evaluated, using a hierarchical clustering heat map. Because of the batch effect in this data set, we will perform hierarchical clustering using batch-corrected intensity values. To do this, we need to open the fourtreatments list of differentially expressed genes as a child spreadsheet of the batch-remove spreadsheet.

Differential Methylation Analysis

Illumina’s MethylationEPIC array interrogates the methylation status of over 850,000 cytosines in the human genome. Because the MethylationEPIC array is closely related to the Infinium HumanMethylation450 BeadChip, the steps presented in this document can be applied to either platform.

This tutorial illustrates how to:

Optional: Add UCSC CpG island annotations

Partek Genomics Suite software can view annotation .BED files as tracks in the Genome Viewer. We can add a CpG islands track to the Genome Viewer using the UCSC Genome Browser CpG islands annotation.

  • Go to the UCSC Genome Browser website

  • Select Table Browser under Tools in the main command bar of the webpage (Figure 1)

Figure 1. Navigating to the Table Browser at the UCSC Genome Browser website

Adding sample information

Twenty-five CEL files (samples) have been imported into Partek Genomics Suite. Sample information must be added to define the grouping and the goals of the experiment.

  • Select Add Sample Attributes in the Import section of the Gene Expression workflow panel

  • Choose the option Add Attributes from an Existing Column

Performing pathway enrichment

Before performing pathway enrichment, we need to create a gene list from the ANOVA results.

Creating a list of significant genes

  • Select Gene Expression from the workflows drop-down menu

Optional: Import a Partek Project from Genome Studio

An Illumina-type project file (.bsc format) can be imported into Illumina’s GenomeStudio® (please note: to process 450K chips, you need GenomeStudio 2010 or newer) and exported using the Partek Methylation Plug-in for GenomeStudio. For more information on the plug-in, please see the plug-in user guide. The plug-in creates six files: a Partek project file (*.ppj), an annotation file (*.annotation.txt), files containing intensity values (*.fmt and *.txt), and files containing β-values (*.fmt and *.txt) (Figure 1).

Figure 1. Output of Partek Methylation Plug-in for GenomeStudio

To load all the files automatically, open the .ppj file as follows.

  • Select Methylation from the Workflows drop-down menu

Gene Ontology Enrichment

Gene Ontology (GO) enrichment analysis has been incorporated into the gene expression, microRNA expression, exon, copy number, tiling, ChIP-Seq, RNA-Seq, miRNA-Seq and methylation workflows. The Gene Ontology Consortium provides an excellent overview for new and experienced users of GO analysis. In brief, the common nomenclature of genes and gene products has been used to group genes into a functional hierarchy. This enables analyses to be compared across all types of genomic data, even data from different species. A broader understanding of experimental results is possible by grouping genes of interest by biological process, cellular component, and molecular function. With the GO enrichment tool in Partek® Genomics Suite® you can take a list of genes (e.g. significantly differentially expressed genes) and see how they group in the functional hierarchy. This is analogous to going from looking at individual trees (genes) to seeing how the whole forest (gene ontology) is organized.

This tutorial illustrates how to:

Partek Pathway

Partek Pathway provides a visualization tool for pathway enrichment spreadsheets utilizing the KEGG database. This tutorial will illustrate:

Perform data quality analysis and quality control

Principal component analysis (PCA) can be performed to visualize clusters in the methylation data, but also serves as a quality control procedure; outliers within a group could suggest poor data quality, batch effects, mislabeled samples, or uninformative groupings.

  • Select PCA Scatter Plot from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Scatter Plot tab

  • Select 2. Cell Type for Color by

Obtain methylation signatures

The significant CpG loci detected in the previous step actually form a methylation signature that differentiates between LCLs and B cells. We can build and visualize this methylation signature using clustering and a heat map.

  • Select the LCLs_vs_Bcells_CpG_Islands spreadsheet in the spreadsheet pane on the left

  • Select Cluster Based on Significant Genes from the Visualization panel of the Illumina BeadArray Methylation workflow

Creating a list of enriched regions

In this section, we will create a list of peaks significantly enriched in the ChIP sample versus the control sample.

  • Select Create a list of enriched regions from the Peak Analysis section of the ChIP-Seq workflow

  • Select Specify New Criteria (Figure 1)

Figure 1. List creator for ChIP-Seq data allows you to create lists using preset or custom criteria

Importing the data set

The original experiment is listed on the Gene Expression Omnibus as GSE848; however, this tutorial uses only a subset of the original experiment, which should be downloaded from the Partek website tutorial page, Gene Expression Analysis with Batch Effects.

  • Download the zipped project folder, Breast_Cancer-GE.zip

  • Unzip the project folder to C:/Partek Training Data/ or a directory of your choosing

This location should be easily accessible. The unzipped Breast_Cancer-GE project folder and a zipped annotation file will be added to the selected directory.

Exploring the data with PCA

Principal component analysis (PCA) is a way to explore the overall similarity between samples, visualize possible groupings within the data set, and detect outliers.

  • Select PCA Scatter Plot from the QA/QC section of the workflow

Figure 1. Principal component analysis showing total allele intensities of normal (blue) and cancer (red) samples. Each dot represents a single sample.

Each dot on the plot corresponds to a single sample and can be thought of as a summary of all normalized marker intensities for the sample. The first categorical column is used to color the plot; here, tumor samples are shown in red and normal samples are shown in blue.

miRNA Expression and Integration with Gene Expression

This tutorial outlines how to analyze miRNA expression data in Partek Genomics Suite and how to integrate it with mRNA expression data from gene expression microarrays.

This tutorial illustrates how to:

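Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.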

Importing Copy Number Data

This tutorial uses a spreadsheet generated after data import, but we will illustrate the steps used to import the data in this section.

  • Select Copy Number from the Workflows drop-down menu

  • Select Import Samples from the Copy Number workflow

The import dialog will open (Figure 1).

  • Perform data quality analysis and quality control

  • Detect differentially methylated loci

  • Create a marker list

  • Filter loci with the interactive filter

  • Obtain methylation signatures

  • Visualize methylation at each locus

  • Perform gene set and pathway analysis

  • Detect differentially methylated CpG islands

  • Optional: Add UCSC CpG island annotations

  • Optional: Use MethylationEPIC for CNV analysis

  • Optional: Import a Partek Project from Genome Studio

Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Description of the Data Set

    The data set accompanying this document consists of sixteen human samples processed by Illumina MethylationEPIC arrays. The data set is taken from a study of DNA methylation in human B cells and B cells infected with Epstein-Barr virus (EBV).

    Infecting B cells with EBV in vitro transforms them, making them capable of indefinite growth in vitro. These immortalized cell lines are referred to as lymphoblastoid cell lines (LCLs). LCLs behave similarly to activated B cells, making them useful for expanding T cells in vitro. Because EBV is a carcinogen and immortalized cell growth is a hallmark of cancer, examining the effects of EBV transformation on B cell DNA methylation might shed light on the roles of DNA methylation in tumor development.

    The data files can be downloaded from Gene Expression Omnibus using accession number GSE93373 or by selecting this link: Differential Methylation Analysis data set. To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program.

    Import and normalize methylation data
    Annotate samples

    Perform GO enrichment analysis

    Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Description of the Data Set

    This tutorial will provide a step-by-step guide to performing GO enrichment analysis. The data set used is based on 51 subjects run on the Illumina Human Ref-8 BeadChip platform. Twenty-six of the subjects were categorized as "Young" with an age range of 18 to 28. The other 25 subjects were categorized as "Old" with an age range of 65 to 84. Skeletal muscle, a type of striated muscle tissue, was obtained via biopsy from each subject. The total RNA was extracted from the skeletal cells, prepared and run on the BeadChips producing the data that is used for this tutorial.

    The paper this data is based on can be found at PLOS.

    Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials on the main menu toolbar within the Partek Genomics Suite software. Download the zipped file and store it on your local disk drive. There is no need to manually unzip the directory.

    Open a zipped project
    Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Description of the Data Set

    The pathway enrichment analysis illustrated in this user guide uses the miRNA Expression and Integration with Gene Expression data set. This data set is also used in our miRNA data analysis tutorial.

    Download and save the zipped project folder in an accessible location on your computer. The project folder for the tutorial will be created in the same location the zipped project folder is stored.

    Importing the Data Set

    Import the project using the zipped project importer in Partek Genomics Suite.

    • Select File from the main toolbar

    • Select Import

    • Select Zipped Project...

    • Choose the zipped project folder, miRNA_tutorial_data

    The project will open with three spreadsheets:

    1. Affy_miR_BrainHeart_intensities,

    2. Affy_HuGeneST_BrainHeart_GeneIntensities,

    3. ANOVAResults gene.

    Performing pathway enrichment
    Analyzing pathway enrichment in Partek Genomics Suite
    Analyzing pathway enrichment in Partek Pathway
  • Select Illumina BeadArray Methylation from the Methylation sub-workflows section

  • Select Import Illumina Methylation Data from the Import section

  • Select Load a project following Illumina GenomeStudio export from the Load Methylation Data dialog

  • Select HG_U95Av2.na36.annot.csv from the microarray libraries folder
  • Select Set Column

  • Select Gene Symbol from the Choose column containing gene symbol/microRNA name dialog

  • Select Homo sapiens and hg19 from the Species and Genome Build drop-down menus

  • Select fourtreatments from the spreadsheet tree
  • Select () to close the spreadsheet

  • Select 1-removeresult (batch-remove) from the spreadsheet tree

  • Select File from the main tool bar

  • Select Open as child...

  • Select fourtreatments using the file browser

  • The fourtreatments spreadsheet will open as a child spreadsheet of batch-remove (Figure 1).

    Figure 1. The fourtreatments spreadsheet is open as a child spreadsheet of batch-remove. Visualizations performed using fourtreatments will pull intensity values from batch-remove.

    Visualizations performed using the fourtreatments spreadsheet will now use intensity values from the batch-remove spreadsheet.

    Hierarchical clustering using a gene list

    To invoke hierarchical clustering, follow the steps below.

    • Select Cluster Based on Significant Genes from the Visualization section of the Gene Expression workflow

    • Select Hierarchical Clustering

    • Select OK

    • Select 1-removeresult/1 (fourtreatments) from the drop-down menu

    • Select Standardize for Expression normalization (Figure 2)

    Figure 2. Configuring the Cluster the significant genes dialog

    • Select OK

    The hierarchical clustering heat map will open in a new tab (Figure 3).

    Figure 3. Hierarchical clustering of genes with significantly different expression across the treatment groups

    Genes without changes in expression are given a value of zero and are colored black. Up-regulated genes have positive values and are displayed in red. Down-regulated genes have negative values and are displayed in green. Each sample is represented in a row while genes are represented as columns. Dendrograms illustrate clustering of samples and genes. To learn more about configuring the hierarchical clustering heat map, see the Hierarchical Clustering Analysis user guide.

    For detailed information about the methods used for clustering, refer to the Partek Manual Chapter 8: Hierarchical & Partitioning Clustering.
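
The Standardize option chosen above can be pictured with a short sketch. It assumes that standardizing means shifting each gene to a mean of zero and scaling it to a standard deviation of one; the function name, the use of the sample standard deviation, and the intensity values are all illustrative assumptions rather than a description of the software's internals.

```python
from statistics import mean, stdev


def standardize(values):
    """Shift values to mean 0 and scale them to standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumed)
    if s == 0:
        # A gene with no variation maps to zero everywhere.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Hypothetical log2 intensities for one gene across six samples
gene = [7.1, 7.3, 6.9, 9.8, 10.1, 9.9]
z = standardize(gene)
print([round(x, 2) for x in z])
```

After this transformation, samples where the gene is above its own average get positive values and samples below the average get negative values, which is what the red and green colors in the heat map encode.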

    • Configure the Table Browser page as shown (Figure 2)

    Figure 2. Configuring the Table Browser to output CpG Islands BED file

    • Set assembly to Feb. 2009 (GRCh37/hg19)

    • Set group to Regulation

    • Set track to CpG Islands

    • Set table to cpgIslandExt

    • Set output format to BED

    • Set output file to cpg.bed

    • Select get output

    The Output cpgIslandExt as BED page will open.

    • Select get BED to download a compressed folder containing the BED file

    • Unzip the file using 7-Zip, WinRAR, or a similar program of your choice to a location you will be able to find

    Next, we can import the BED file into Partek Genomics Suite.

    • Select File > Import > Genomic Database... from the main toolbar in Partek Genomics Suite (Figure 3)

    Figure 3. Importing the CpG Islands map BED file

    • Select the file cpg.bed

    The BED file will open as a new spreadsheet.

    • Change the spreadsheet name to UCSC CpG Island Annotation and save it

    For this region list, you can also calculate the average beta values for the probes in each island per sample and detect differentially methylated CpG islands. Detailed information on how to get the average beta value for each CpG island can be found in the Determining the average values for a region list section of Starting with a list of genomic regions.
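
As a minimal sketch of the idea of averaging probe beta values per island, the code below reads the first three BED columns (chromosome, start, end) and averages the beta values of the probes that fall inside each interval for one sample. The island coordinates, probe records, and function names are hypothetical; it is not the procedure Partek Genomics Suite runs internally.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from one BED line.
    BED coordinates are 0-based and half-open: start <= pos < end."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def mean_beta_per_island(islands, probes):
    """Average the beta values of the probes located inside each island.
    Islands that contain no probes are left out of the result."""
    result = {}
    for chrom, start, end in islands:
        inside = [
            beta
            for p_chrom, pos, beta in probes
            if p_chrom == chrom and start <= pos < end
        ]
        if inside:
            result[(chrom, start, end)] = sum(inside) / len(inside)
    return result


# Hypothetical BED lines and probe records (chrom, 0-based position, beta)
bed_lines = ["chr1\t1000\t2000\tisland_A", "chr1\t5000\t6000\tisland_B"]
probes = [
    ("chr1", 1200, 0.10),
    ("chr1", 1800, 0.30),
    ("chr1", 2000, 0.90),  # position equals the island end, so it is excluded
    ("chr1", 5500, 0.80),
    ("chr2", 1500, 0.50),  # different chromosome
]
islands = [parse_bed_line(line) for line in bed_lines]
averages = mean_beta_per_island(islands, probes)
print(averages)
```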

    • Select OK to open the Sample Information Creation dialog

    In this tutorial, the file name (e.g., Down Syndrome-Astrocyte-748-Male-1-U133A.CEL) contains the information about a sample, with the fields separated by hyphens (-). Choosing to split the file name by delimiters will separate the categories into different columns.
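
The effect of splitting a file name by a delimiter can be sketched as follows. The helper name is hypothetical; the labels are the four used in this tutorial, and any fields beyond the labeled ones are skipped.

```python
from pathlib import PurePath


def split_sample_name(file_name, labels, delimiter="-"):
    """Map the leading delimiter-separated fields of a file name to labels."""
    stem = PurePath(file_name).stem      # drop the .CEL extension
    fields = stem.split(delimiter)
    return dict(zip(labels, fields))     # fields without a label are skipped


labels = ["Type", "Tissue", "Subject", "Gender"]
info = split_sample_name("Down Syndrome-Astrocyte-748-Male-1-U133A.CEL", labels)
print(info)
```

Note that a delimiter that also occurs inside a field value would split that value, so this approach relies on a consistent file naming scheme.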

    • In the Sample Information panel, specify the column labels (Labels 1-4) as Type, Tissue, Subject, and Gender, set each as categorical, and set the other columns as skip (Figure 1). Select OK

    Figure 1. Configuring the Sample Information Creation dialog

    • A dialog window asking if you would like to save the spreadsheet with the new sample attribute will appear. Select Yes

    • Make column 5. (Subject) random by right-clicking on the column header and selecting Properties from the pop-up menu (Figure 2).

    Figure 2. Changing column properties

    • Select the Random Effect check box from the Properties dialog (Figure 3) then select OK.

    Figure 3. Setting column to Random Effect

    The column 5. (Subject) will now be colored red, indicating that it is a random effect.

    • To save changes to the spreadsheet, select the Save Active Spreadsheet icon. Spreadsheets with unsaved changes have an asterisk next to their name in the spreadsheet tree.

    Note: More details on Random vs. Fixed Effects can be found later in this tutorial under the section Identifying differentially expressed genes using ANOVA.

  • Select the ANOVAResults gene spreadsheet

  • Select Create Gene List from the Analysis section of the Gene Expression workflow

  • Select Brain vs. Heart from the List Manager dialog (Figure 1) leaving the other options as defaults

  • Select Create

    Figure 1. Configuring the list manager dialog

    A new list of 420 genes will be created as a child spreadsheet of 1 (ANOVAResults gene).

    • Select Close to exit the List Manager dialog

    Performing pathway enrichment analysis

    • Select the new gene list, Brain vs. Heart

    • Select Pathway Analysis from the Biological Interpretation section of the Gene Expression workflow

    • Select Next > to continue with Pathway Enrichment

    Pathway Enrichment is the only option available for a gene list. To learn more about the other option, Pathway ANOVA, see the Gene Ontology ANOVA tutorial, which follows the same procedure as Pathway ANOVA.

    • Select Next > to continue with the Brain vs. Heart spreadsheet

    • Select Next > to continue with default settings for Fisher's Exact test

    • Select Next > to continue with Homo sapiens and 4. Gene Symbol as parameters

    Partek Pathway will now launch. If this is your first time using Partek Pathway with the selected species, it will automatically download the KEGG information needed for the analysis. Once the pathway enrichment calculation has been performed, a new spreadsheet, Pathway-Enrichment.txt, will be added as a child spreadsheet of Brain vs. Heart and the results will be displayed in Partek Pathway (Figure 2).

    Figure 2. Partek Pathway displaying the most significantly enriched pathway from the gene list

    The pathway currently displayed has the highest enrichment score. Both Partek Genomics Suite and Partek Pathway offer options for analyzing the results of pathway enrichment analysis. The next two sections of the user guide will show the options for analyzing the results of pathway enrichment in each program.

  • Select 3. Gender for Size by

  • Select () to enable Rotate Mode

  • Left click and drag to rotate the plot and view different angles (Figure 1)

    Each dot of the plot is a single sample and represents the average methylation status across all CpG loci. Two of the LCLs samples do not cluster with the others, but we will not exclude them for this tutorial.

    Figure 1. Principal components analysis (PCA) showing methylation profiles of the study samples. Each sample is represented by a dot, the axes are the first three PCs, and the numbers in parentheses indicate the fraction of variance explained by each PC. The number at the top is the variance explained by the first three PCs. The samples are colored by levels of 2. Cell Type

    Next, the distribution of beta values across the samples can be inspected using a box-and-whiskers plot.

    • Select Sample Box and Whiskers Chart from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Box and Whiskers tab

    Each box-and-whisker is a sample and the y-axis shows beta-value ranges. Samples in this data set seem reasonably uniform (Figure 2).

    Figure 2. Box and whiskers plot showing distribution of M-values (y-axis) across the study samples (x-axis). Samples are colored by a categorical attribute (Cell Type). The middle line is the median, box represents the upper and the lower quartile, while the whiskers correspond to the 90th and 10th percentile of the data

    An alternative way to take a look at the distribution of beta-values is a histogram.

    • Select Sample Histogram from the QA/QC section of the Illumina BeadArray Methylation workflow to bring up a Histogram tab

    Again, no sample in the tutorial data set stands out (Figure 3).

    Figure 3. Sample histogram. Each sample is a line, beta values are on the horizontal axis and their frequencies on the vertical axis. Two peaks correspond to two probe types (I and II) present on the MethylationEPIC array. Sample colors correspond to a categorical attribute (Cell Type)

    • Select Hierarchical Clustering for Specify Method (Figure 1)

    Figure 1. Selecting Hierarchical Clustering for clustering method

    • Select OK

    • Verify that LCLs_vs_Bcells_CpG_Islands is selected in the drop-down menu

    • Verify that Standardize is selected for Expression normalization (Figure 2)

    Figure 2. Selecting spreadsheet and normalization method for clustering

    • Select OK

    The heat map will be displayed on the Hierarchical Clustering tab (Figure 3).

    Figure 3. Hierarchical clustering with heat map invoked on a list of significant CpG loci

    The experimental groups are rows, while the CpG loci from the LCLs vs B cells spreadsheet are columns. Methylation levels are compared between the LCLs and B cells groups. CpG loci with higher methylation are colored red; CpG loci with lower methylation are colored green. LCLs samples are colored orange and B cells samples are colored red in the dendrogram on the left-hand side of the heat map.

    Configure the new criteria as shown (Figure 2).

    • Name the criteria p-value filtered

    • Select 1/regions (peaks) from the Spreadsheet drop-down menu

    • Select 11. p-value(Sample ID vs. mock) from the Column drop-down menu

    • Select significant with FDR of from the include p-values drop-down menu with a value of 0.05

    Figure 2. Creating a criteria that includes regions significantly enriched in ChIP vs. mock
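
To illustrate what filtering at an FDR of 0.05 means, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. That Partek Genomics Suite uses this particular procedure for this filter is an assumption made for the example, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking the p-values that are
    significant at the given false discovery rate (step-up procedure)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / n) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * fdr:
            cutoff_rank = rank
    # Every p-value ranked at or below the cutoff is called significant.
    significant = [False] * n
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for six peaks
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
flags = benjamini_hochberg(p_values)
print(flags)
```

Because the threshold depends on the rank of each p-value, a region with p < 0.05 is not necessarily retained once multiple testing is taken into account.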

    • Select OK to add the criteria to the criteria list (Figure 3)

    Figure 3. New criteria are added to the criteria list

    • Select Save

    • Select p-value filtered from the list of criteria (Figure 4)

    Figure 4. Choosing criteria to save as lists

    • Select OK

    The new spreadsheet will open (Figure 5).

    Figure 5. Spreadsheet with regions that are significantly enriched in the ChIP sample vs. control

    Other List Creator operations like the Venn Diagram, Union (Or), and Intersection (And) of the lists could be used to create different lists of enriched peaks. For example, you could filter on the intersection between Strand Separation FDR of 0.05 and Peaks not in mock or filter by scaled fold change or apply a minimum number of reads per million. The choice of what peaks you want to consider for downstream analysis depends on the goals and details of your experimental design.

  • Unzip the included annotation file, HG_U95Av2.na32.annot.rar

  • Move the annotation file, HG_U95Av2.na32.annot, to the microarray libraries folder

    By default, the microarray libraries folder will be located at C:/Microarray Libraries, but the location may vary depending on your operating system and configuration.

    • Open Partek Genomics Suite

    • Select () from the main command bar

    • Navigate to the tutorial folder, Breast_Cancer-GE

    • Select Breast_Cancer.txt

    • Select Open (Figure 1)

    Figure 1. Opening a data file. The red Partek Genomics Suite icon is shown next to the data file (FMT file format)

    The spreadsheet will open as 1 (Breast_Cancer.txt) (Figure 2).

    Figure 2. Breast_Cancer.txt data file

    The summary at the bottom of the spreadsheet shows there are 18 rows and 12,631 columns in the spreadsheet. The first column contains the Filename listing the GEO GSM number. This is also an identifier for the microarray. Treatment, Time, and Batch are in columns 2, 3, and 4, respectively. Column 6 marks the beginning of the probesets. The data is log2 transformed.

    To better view the data, we can rotate the plot.
    • Select the Rotate Mode icon to activate Rotate Mode

    • Click and drag to rotate the plot

    Rotating the plot allows us to look for outliers in the data on each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is listed by its axis label. The chart label shows the sum percentage of the total variation explained by the displayed PCs.
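
The percentages shown on the axis labels and in the chart label can be understood with a small sketch. It assumes the per-component variances (eigenvalues) are already known; the function name and the numbers are hypothetical.

```python
def variance_explained(component_variances, shown=3):
    """Return the percentage of total variation for each of the first
    `shown` principal components, plus their combined percentage."""
    total = sum(component_variances)
    percents = [100.0 * v / total for v in component_variances]
    return percents[:shown], sum(percents[:shown])


# Hypothetical variances (eigenvalues) of five principal components
per_pc, combined = variance_explained([40.0, 25.0, 15.0, 12.0, 8.0])
print(per_pc, combined)
```

Here the three displayed components account for 40%, 25%, and 15% of the total variation, which corresponds to the per-axis values, and their sum of 80% corresponds to the chart label.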

    We can see that the peripheral blood samples (normal) cluster together whereas the cancer tissue samples (tumor) are more dispersed and show considerable variability. This corresponds well with the known genomic variability of cancer cells.

    To view the similarity of paired normal and tumor samples from the same patient, we can connect dots by Subject ID.

    • Select 4. SubjectID from the Connect by drop-down menu in the upper right-hand corner of the plot tab

    Paired tumor and normal samples are now connected by lines, illustrating the range of differences between normal and tumor copy number in the data set (Figure 2).

    Figure 2. Lines connect paired tumor and normal samples

    Description of the data set

    The data set for this tutorial includes miRNA from 3 human brain samples and 3 heart samples quantified using the Affymetrix GeneChip miRNA 1.0 array. The same sample set was also processed on GeneChip Human Gene 1.0 ST arrays for mRNA expression.

    For this tutorial, the gene expression and miRNA expression studies have been analyzed and stored in Partek Genomics Suite project (ppj) format as miRNAmRNA integration. The project contains two Partek format files: Affy_miR_BrainHeart_intensities.fmt with the miRNA data and Affy_HuGeneST_BrainHeart_GeneIntensities.fmt with the analyzed mRNA data. There is also an ANOVA results spreadsheet open as a child spreadsheet of Affy_HuGeneST_BrainHeart_GeneIntensities.fmt.

    • Download the miRNA Expression and Integration with Gene Expression data set and save it in an easily accessible location on your computer

    We can now open the project in Partek Genomics Suite.

    • Select File

    • Select Import

    • Select Zipped Project...

    • Select the miRNA_tutorial_data.zip zipped folder

    The project files will open in the Analysis tab (Figure 1).

    Figure 1. The miRNA tutorial data set

    Figure 1. Viewing the Import Copy Number Samples dialog

    For Affymetrix arrays, Partek Genomics Suite can import CEL files with allele intensity values and calculate copy number estimates from these intensities. For Agilent, Illumina, NimbleGen, or Affymetrix .CHP files, Partek Genomics Suite can import files containing calculated copy numbers or log ratios.

    For this tutorial, we will not be importing CEL files.

    • Select Cancel to close the import dialog

    Later sections of this tutorial will address starting with copy number or log ratios and performing GC wave correction on Affymetrix CEL files.

    We can now open the tutorial data file.

    • Download the zipped tutorial data folder Overlapping Copy Number with LOH

    • Unzip the files to an accessible directory

    • Select File from the main menu

    • Select Open...

    • Select the file IC_Intensities_SNP6.fmt

    The spreadsheet will open in the Analysis tab (Figure 2).

    Figure 2. Viewing the tutorial data set spreadsheet

    This spreadsheet was generated from the import of SNP6 CEL files and shows all 20 samples on rows. Columns 1-6 describe the samples with information such as file names, Subject ID, Gender, etc. The other columns are individual markers from the microarray with the log2 normalized intensities associated with each marker (marker labels are column headers). Opening the IC_Intensities_SNP6.fmt file is equivalent to importing the 20 sample files and adding sample attributes.

    Filter loci with the interactive filter

    The list, LCLs vs B cells, includes differentially methylated loci for locations across the genome; however, in many cases we may want to focus on loci located in particular regions of the genome. To filter our list to include only regions of interest, we can use the annotations provided by Illumina and the interactive filter in Partek Genomics Suite.

    • Select LCLs_Vs_B_cells from the spreadsheet tree

    • Right-click on the Gene Symbol column

    • Select Insert Annotation (Figure 1)

    Figure 1. Adding an annotation column to the ANOVA results

    • Select the Add as categorical option

    • Select Relation_to_UCSC_CpG_Island (Figure 2)

    CpG islands are regions of the genome with an atypically high frequency of CpG sites. CpG islands and their surrounding regions (termed shelf and shore) include many gene promoters and altered methylation in these regions can have a disproportionate effect on gene expression. For example, hyper-methylation of promoter CpG islands is a common mechanism for down-regulating gene expression in cancer.

    Figure 2. Adding chromosome location to ANOVA results

    • Select OK to add Relation_to_UCSC_CpG_Island as a column next to 3. Gene Symbol

    • Select () from the quick action bar to save the ANOVA-2way (ANOVA Results) spreadsheet with the added annotation

    Now, we can filter probes by their relation to CpG islands.

    • Select () from the quick action bar to invoke the interactive filter

    • Select 4. Relation_to_UCSC_CpG_Island for Column

    For categorical columns, the interactive filter displays each category of the selected column as a colored bar. For 4. Relation_to_UCSC_CpG_Island, each bar represents one of the categories of the UCSC annotation. To filter out a category, left-click on its bar. Right-clicking on a bar will include only the selected category. A pop-up balloon will show the category label as you mouse over each bar.

    • Right-click the Island bar to filter out the other categories (Figure 3)

    Figure 3. Using the Interactive Filter tool to filter out probes by annotation. When pointed to a categorical column, the Interactive Filter tool summarises the content of the column by a column chart. Left-click to exclude a category (two columns were excluded, so they are grayed out), right-click to include only the selected category

    The yellow and black bar on the right-hand side of the spreadsheet panel shows the fraction of excluded cells in black and included cells in yellow. Right-clicking this bar brings up an option to clear the filter.

    Now that we have filtered out probes that are not in CpG islands, we will create a spreadsheet containing only these probes.

    • Right-click the LCLs vs. B cells spreadsheet in the spreadsheet tree panel (Figure 4)

    Figure 4. Cloning a filtered spreadsheet creates a new spreadsheet with only the included cells

    • Select Clone

    • Rename the new spreadsheet LCLs_vs_B_cells_CpG_Islands using the Clone Spreadsheet dialog

    • Select mvalues from the Create new spreadsheet as a child spreadsheet: drop-down menu (Figure 5)

    Figure 5. Renaming and configuring filtered spreadsheet

    • Select () from the quick action bar to save the filtered spreadsheet

    • Specify a name for the spreadsheet using the Save File dialog; we chose LCLs_vs_B_cells_CpG_Islands

    • Select Save to save the spreadsheet

    You may want to save the project before proceeding to the next section of the tutorial.

    Analyzing pathway enrichment in Partek Genomics Suite

    Pathway enrichment generates a results spreadsheet, Pathway-Enrichment.txt, visible in both Partek Genomics Suite (Figure 1) and in Partek Pathway.

    Figure 1. The pathway enrichment spreadsheet is visible in both Partek Genomics Suite (shown here) and Partek Pathway

    Contents of the pathway enrichment spreadsheet

    The spreadsheet includes 13 columns with information for each pathway represented in the source gene list.

    1. Pathway Name - the name of the KEGG pathway

    2. Database - the source database for the pathway annotation

    3. Enrichment score - the negative natural log of the enrichment p-value derived from the contingency table (Fisher's Exact test) or the Chi-squared test

    4. Enrichment p-value - the enrichment p-value derived from the contingency table (Fisher's Exact test) or the Chi-squared test

    5. % genes in pathway that are present - the percentage of genes from the pathway that are present in the source gene list

    6. Tissue score, 7. Replicate score, 8. Brain vs. Heart score - for each factor, interaction, and contrast in the ANOVA results spreadsheet, a separate score is calculated. This is derived from the negative log (base 10) of the average p-value for genes within the pathway for each factor. A high score indicates that the genes that fall into the pathway have a low p-value for the given factor.

    9. # genes in list, in pathway - number of genes from the list in the pathway

    10. # genes not in list, in pathway - number of genes from the pathway, not in the list

    11. # genes in list, not in pathway - number of genes in list, not in the pathway

    12. # genes, not in list, not in pathway - number of genes not in the pathway or the list that are included in KEGG database pathways for the species

    13. Pathway ID - KEGG pathway ID
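
    The relationship between the four gene counts (columns 9-12), the enrichment p-value, and the two kinds of score can be sketched in a few lines of Python. This is a minimal illustration, not the Partek Genomics Suite implementation: the counts and p-values are hypothetical, the function name is invented for the example, and the test is assumed to be a one-sided Fisher's Exact test for over-representation (the documentation above does not state whether a one-sided or two-sided test is used).

```python
from math import comb, log, log10


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's Exact p-value, P(X >= a), under the hypergeometric null.

    a -- genes in list, in pathway          b -- genes not in list, in pathway
    c -- genes in list, not in pathway      d -- genes not in list, not in pathway
    """
    in_pathway = a + b
    in_list = a + c
    total = a + b + c + d
    denominator = comb(total, in_list)
    p = 0.0
    for k in range(a, min(in_pathway, in_list) + 1):
        p += comb(in_pathway, k) * comb(total - in_pathway, in_list - k) / denominator
    return p


# Hypothetical contingency table for one pathway.
p_value = fisher_right_tail(a=8, b=2, c=2, d=8)

# Enrichment score: negative natural log of the enrichment p-value.
enrichment_score = -log(p_value)

# Factor score: negative log10 of the average p-value of the pathway genes
# (hypothetical ANOVA p-values for one factor).
gene_p_values = [0.001, 0.01, 0.04]
factor_score = -log10(sum(gene_p_values) / len(gene_p_values))

print(f"p-value: {p_value:.4g}")
print(f"enrichment score: {enrichment_score:.3f}")
print(f"factor score: {factor_score:.3f}")
```

    Because the enrichment score is a negative log, a smaller p-value gives a larger score, so sorting the spreadsheet by descending enrichment score lists the most enriched pathways first.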

    Tasks available in Partek Genomics Suite

    In Partek Genomics Suite, we can view several new options that are available for each pathway (row) in the Pathway-Enrichment.txt spreadsheet.

    • Right-click the row header of any row in the Pathway-Enrichment.txt spreadsheet (Figure 2)

    Figure 2. The Pathway-Enrichment.txt spreadsheet in Partek Genomics Suite

    The new options include:

    Export genes in pathway, which creates a child spreadsheet of Pathway-Enrichment.txt that contains all the genes from the selected pathway(s) (Figure 3). This new spreadsheet includes gene symbols and their pathway.

    Figure 3. Spreadsheet with all genes in pathway. Includes gene symbols and pathway.

    Export genes in list and in pathway, which creates a child spreadsheet of Pathway-Enrichment.txt that contains the genes from your list that are present in the selected pathway(s) (Figure 4). This new spreadsheet includes gene symbols and their pathway.

    Figure 4. Spreadsheet with genes only in list and pathway. Includes gene symbols and pathway.

    Create Gene List, which creates a new child spreadsheet of the ANOVA results spreadsheet that contains the genes from your list that are present in the selected pathway(s) (Figure 5). This new spreadsheet includes all information for each gene from the ANOVA results spreadsheet. However, this list does not indicate the pathway of each gene.

    Figure 5. Spreadsheet with genes in list and pathway. Includes all information from ANOVA results for each gene.

    Show Pathway, which opens the selected pathway map in Partek Pathway.

    RNA-Seq Analysis

    RNA-Seq is a high-throughput sequencing technology used to generate information about a sample’s RNA content. Partek Genomics Suite offers convenient visualization and analysis of the high volumes of data generated by RNA-Seq experiments.

    This tutorial illustrates:

    • Importing aligned reads

    • Adding sample attributes

    Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Description of the Data Set

    In this tutorial, you will analyze an RNA-Seq experiment using the Partek Genomics Suite software RNA-Seq workflow. The data used in this tutorial was generated from mRNA extracted from four diverse human tissues (skeletal muscle, brain, heart, and liver) from different donors and sequenced on the Illumina® Genome Analyzer™. The single-end mRNA-Seq reads were mapped to the human genome (hg19), allowing up to two mismatches, using Partek Flow alignment and the default alignment options. The output files of Partek Flow are BAM files which can be imported directly into Partek Genomics Suite 7.0 software. BAM or SAM files from other alignment programs like ELAND (CASAVA), Bowtie, BWA, or TopHat are also supported. This same workflow will also work for aligned reads from any sequencing platform in the (aligned) BAM or SAM file formats.

    Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials from the Partek Genomics Suite main menu. Once the zipped data directory has been downloaded to your local drive:

    • Unzip the downloaded files to C:\Partek Training Data\RNA-seq or to a directory of your choosing. Be sure to create a directory or folder to hold the contents of the zip file

    Gene Ontology (GO) Enrichment

    With the GO Enrichment feature in Partek Genomics Suite, you can take a list of significantly expressed genes/transcripts and find GO terms that are significantly enriched within the list. For a detailed introduction to GO Enrichment, refer to the GO Enrichment User Guide (Help > On-line Tutorials > User Guides).

    • Select the Diff_Exp_and_Alt_Splice spreadsheet from the spreadsheet tree

    • Select Gene Set Analysis in the Biological Interpretation section of the RNA-Seq workflow (Figure 1)

    Figure 1. Selecting Gene Set Analysis

    • Select GO Enrichment in the Gene Set Analysis dialog (Figure 2)

    • Select Next >

    Figure 2. Selecting the method of analysis

    • Select the spreadsheet 1/Diff_Exp_and_Alt_Splice (Diff Exp and Alt Splice.txt) from the drop-down menu (Figure 3)

    • Select Next >

    Figure 3. Selecting the spreadsheet that contains the genes you want to test

    • Select Use Fisher's Exact test

    • Select Invoke gene ontology browser on the result

    • Set Restrict analysis to functional groups with more than _ genes to 2 (Figure 4)

    Figure 4. GO Enrichment options

    • Select Default mapping file (Figure 5)

    • Select Next >

    Figure 5. Selecting the mapping file

    A GO-Enrichment spreadsheet, as well as a browser (Figure 6), will be generated with the enrichment score shown for each GO term. Browse through the results to find a functional group of interest by examining the enrichment scores. The higher the enrichment score, the more overrepresented this functional group is in the input gene list. Alternatively, you may use the Interactive filter on the GO-Enrichment spreadsheet to identify functional groups that have low p-values and perhaps a higher percentage of genes in the group that are present.

    Figure 6. Viewing the Gene Ontology Browser

    Analyzing the unexplained regions spreadsheet

    During a previous section of this tutorial, a spreadsheet named unexplained_regions was generated. This spreadsheet contains locations where reads map to the genome but are not annotated by the transcript database, in this case, RefSeqGene. The unexplained_regions spreadsheet is potentially very interesting as it may contain novel findings.

    • Right-click column 6. Average Coverage and select Sort Descending from the menu

    • Select Find Overlapping Genes from the Tools option in the command toolbar (Figure 1)

    Figure 1. Selecting Find Overlapping Genes from Tools in the command toolbar

    • Select Add a new column with the gene nearest to the region in the Find Overlapping Genes dialog (Figure 2)

    • Select OK

    Figure 2. Find Overlapping Genes

    • Select RefSeq Transcripts – 2017-05-02 from the Output Overlapping Features dialog (Figure 3)

    Please note that it is recommended that you annotate with the same database used when you performed mRNA quantification.

    • Select OK

    Figure 3. Select the database to search for overlapping features

    The closest overlapping feature and the distance to it are now included as columns 7. Overlapping Features and 8. Nearest Features in the unexplained_regions spreadsheet.

    Right-clicking on a row header and selecting Browse to Location will show the reads mapped to the chromosome. For this tutorial, a couple of genes are selected to show regions that are located after a known gene or in the intron of a gene.

    • Right-click row 39 and select Browse to location from the pop-up menu

    • Select the Chromosome View tab to view a region within an intron of UNC45B. This may be a novel exon (Figure 4)

    Figure 4. A region within an intron of UNC45B that might be a novel exon

    • Right-click row 12576 and select Browse to location to go to a region that starts 1 bp after CD82.

    • Select () several times to zoom out slightly

    This peak may represent an extended exon (Figure 5).

    Figure 5. A region that starts 1 bp after CD82 that might represent an extended exon

    While RefSeq was used to identify overlapping features, the choice of which database to use will depend on the biological context of your experiment. For example, you may wish to utilize promoter or miRNA databases if you are interested in regulation of expression.

    Importing ChIP-Seq data

    Data for this tutorial can be downloaded from the Partek website using this link - ChIP-Seq tutorial data. To follow this tutorial, download the two .bam files and unzip them on your local computer using 7-zip, WinRAR, or a similar program.

    • Store the two .bam files in C:\Partek Training Data\ChIP-Seq or in a directory of your choosing. We recommend creating a dedicated folder for the tutorial on a local drive.

    • Select ChIP-Seq from the Workflows drop-down menu (Figure 1)

    Figure 1. Selecting the ChIP-Seq workflow

    • Select Import and Manage Samples from the Import section of the ChIP-Seq workflow

    • Select Browse... or use the file tree to navigate to the folder where you stored the .bam files

    All .bam files in the folder will be selected by default (Figure 2).

    Figure 2. Importing .bam files using the Sequence Import dialog

    • Verify that chip.bam and mock.bam are selected

    • Select OK

    The Sequence Import dialog will launch (Figure 3). This allows us to choose the output file name and destination for the parent spreadsheet, as well as the species and genome build of the imported samples. By default, the output file destination is the folder where the .bam files are located.

    Figure 3. Setting the output file name, species, and genome build during .bam file import

    • Set Output file to ChIP-Seq

    • Set Species to Homo sapiens using the drop-down menu

    • Set Genome build to hg18 using the drop-down menu

    The Bam Samples Manager dialog will open (Figure 4). This dialog can be used to add samples to the project (Add samples), remove samples (Remove samples), associate multiple files with particular samples (Manage samples), and map the chromosome names from the input files to the association files (Manage sequence names).

    Figure 4. The Bam Sample Manager can be used to add, remove, and manage files and samples

    • Select Close

    The Sort bam files dialog will open. Sorting is necessary for all imported .bam files, but you can choose to hide this hint in the future by selecting Please don't show me this hint again.

    • Select OK

    The imported spreadsheet will open while the .bam files are sorted. Progress in sorting will be displayed on the progress bar in the lower left-hand corner of the Partek Genomics Suite window. Once sorting has completed, the spreadsheet shows one sample per row, with the sample names in column 1. Sample ID and the number of reads mapped to the reference genome for each sample in column 2. Number of alignments (Figure 5).

    Figure 5. Imported .bam files with one sample in each row

    Finding genes with copy number variation

    With a list of amplified or deleted regions in our cohort in hand, one of the more interesting questions to ask is what genes have recurrent amplifications or deletions in the data set. To address this question, we can use the Find overlapping genes function to either add a column to our region list with the genes present in each region or create a new list of genes that overlap the regions.

    Here, we will create a new spreadsheet with genes that overlap the regions in the amplified_or_deleted spreadsheet.

    • Select the amplified_or_deleted spreadsheet in the spreadsheet tree

    • Select Find Overlapping Genes from the Copy Number Analysis section of the workflow

    • Select Create a New Spreadsheet with Genes that Overlap the Regions from the Find Overlapping Genes dialog (Figure 1)

    • Select OK

    Figure 1. Options in Find Overlapping Genes dialog

    To determine what regions in the genome correspond to genes, we need to select an annotation database (Figure 2).

    Figure 2. Viewing the Output Overlapping Features dialog. Database files not present on the computer display Download required in red

    Partek Genomics Suite offers a variety of possibilities including RefSeq, Ensembl, and GENCODE; however, custom annotations can also be used. If the database file has not been downloaded, the message "Download required. Click OK to download the file" will be listed in red beneath the annotation. Selecting OK will automatically download the file and then run the task.

    • Select Ensembl Transcripts release 75

    • Select OK

    A new spreadsheet, gene-list, is created as a child spreadsheet of amplified_or_deleted (Figure 3).

    Figure 3. Viewing the gene-list spreadsheet, a result of overlapping genes with regions of copy number changes. Each row of the table represents one Ensembl transcript

    Each row corresponds to a transcript and the columns are as follows:

    1-3. Genomic coordinates of the transcript

    4. Coding strand

    5. Transcript ID

    6. Gene Symbol

    7. Minimum distance of the region to the transcription start site with positive values indicating downstream and negative values indicating upstream

    8. Percent overlap with gene indicates how much of the transcript sequence overlaps the region

    9. Percent overlap with region indicates how much of the region is overlapped by the transcript

    10+. Correspond to the columns 1+ in the segment-analysis spreadsheet
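
    The difference between the two percent-overlap columns (8 and 9) can be illustrated with a short sketch. The coordinates and function names below are hypothetical, and the intervals are assumed to be half-open (start included, end excluded); the coordinate convention used by Partek Genomics Suite is not stated here.

```python
def overlap_length(start_a, end_a, start_b, end_b):
    """Length of the overlap between two half-open intervals [start, end)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def percent_overlaps(region, transcript):
    """Return (percent overlap with gene, percent overlap with region)."""
    shared = overlap_length(region[0], region[1], transcript[0], transcript[1])
    pct_gene = 100.0 * shared / (transcript[1] - transcript[0])
    pct_region = 100.0 * shared / (region[1] - region[0])
    return pct_gene, pct_region


# Hypothetical (start, end) coordinates on the same chromosome.
region = (1_000, 5_000)      # amplified or deleted region, 4,000 bp long
transcript = (4_000, 6_000)  # annotated transcript, 2,000 bp long

pct_gene, pct_region = percent_overlaps(region, transcript)
print(f"Percent overlap with gene: {pct_gene:.1f}")
print(f"Percent overlap with region: {pct_region:.1f}")
```

    The same shared stretch of sequence yields two different percentages because each one is divided by a different length: the transcript length in one case and the region length in the other.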

    This gene-list spreadsheet is gene-centric and enables genomic integration. For example, GO and Pathway enrichment can be directly invoked on the gene-list spreadsheet to detect functional groups affected by copy number changes. While not detailed in this tutorial, please feel free to explore these options on your own. For more information on enrichment analysis, you can consult the tutorial.

    Survival Analysis

    This tutorial will illustrate:

    • Kaplan-Meier Survival Analysis

    • Cox Regression Analysis

    Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Introduction to Survival Analysis

    Survival analysis is a branch of statistics that deals with modeling of time-to-event. In the context of “survival,” the most common event studied is death; however, any other important biological event could be analyzed in a similar fashion (e.g., spreading of the primary tumor or occurrence/relapse of disease). The significant event should be well-defined and occur at a specific time. As the primary outcome event is typically unfavorable (e.g., death, metastasis, relapse, etc.), the event is called a “hazard.” Survival analysis tries to answer questions such as: What is the proportion of a population who will survive past a certain time (i.e., what is the 5-year survival rate)? What is the rate at which the event occurs? Do particular characteristics have an impact on survival rates (e.g., are certain genes associated with survival)? Is the 5-year survival rate improved in patients treated by a new drug?

    An important feature of survival analysis is the presence of “censored” data. Censored data refers to subjects that have not experienced the event being studied. For example, medical studies often focus on survival of patients after treatment so the survival times are recorded during the study period. At the end of the study period, some patients are dead, some patients are alive, and the status of some patients is unknown because they dropped out of the study. Censored data refers to the latter two groups. The patients who survived until the end of the study or those who dropped out of the study have not experienced the study event "death" and are listed as "censored".
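
    To make the role of censored subjects concrete, the following minimal sketch computes a Kaplan-Meier estimate by hand. The follow-up times, event flags, and function name are hypothetical and serve only as an illustration; they are not drawn from the tutorial data set or from the Partek Genomics Suite implementation.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months; an event flag of 0 marks a censored subject.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} months: S(t) = {s:.3f}")
```

    Censored subjects never lower the curve themselves, but they count toward the number at risk until the time they leave the study, which is how their partial information is used.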

    Tutorial Data Set

    The tutorial data set (236 samples) is a subset of fresh-frozen breast tumor specimens from a population-based cohort of 315 women with breast cancer. The clinicopathological characteristics accompanying each tumor include p53 status (mutant or wild-type), estrogen receptor (ER) status, progesterone receptor (PgR) status, lymph node status, tumor size, and patient age. Gene expression was assessed on Affymetrix® U133A and U133B arrays (Miller LD et al., GSE3494). Please note that Affymetrix data have been chosen for illustration purposes only, and that the same functionality can be used to analyze any data set. The raw data files (.CEL) have already been imported into Partek Genomics Suite; samples with no survival time data, as well as sample attributes irrelevant for the survival analysis, were removed, and the final spreadsheet was saved in Partek Genomics Suite (Survival_Tutorial.fmt and Survival_Tutorial.txt). To go through the tutorial, download the data set, unzip the downloaded folder, and save it in an easily accessible location on your computer.

    After saving the unzipped file, you can open it in Partek Genomics Suite.

    • Select File from the main toolbar

    • Select Open...

    • Browse to the folder containing the tutorial data set and select the file Survival_Tutorial.fmt

    The data spreadsheet will open (Figure 1). Each row represents a tumor sample from a breast cancer patient. Sample attributes are listed in columns 1-8, while columns 9+ are intensity values for the probe sets listed in the column headers.

    Figure 1. Viewing the sample data (one sample per row) for the survival analysis tutorial

    References

    Miller LD, Smeds J, George J, Vega VB et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. PNAS, 2005; 102(38): 13550-5.

    Copy Number Analysis

    This tutorial will illustrate:

    • Importing Copy Number Data

    • Exploring the data with PCA

    • Creating Copy Number from Allele Intensities

    Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Introduction to Copy Number Analysis

    Copy number analysis asks whether there are regions of the genome with altered abundance. Of particular interest are any genes within those regions and how a change in gene abundance might alter phenotype. Partek Genomics Suite software allows these questions to be answered by analyzing a variety of commercially available assays for copy number analysis. SNP-genotyping arrays with closely spaced genomic markers (Affymetrix and Illumina) and comparative genomic hybridization (CGH) arrays (Agilent, NimbleGen, or custom spotted arrays) can be imported into Partek Genomics Suite and analyzed.

    When performing copy number analysis, it is important to remember an inherent limitation of copy number region analysis: the inability to detect copy-neutral events caused by copy-number-neutral loss of heterozygosity (LOH) or copy-number-neutral allelic imbalance. This limitation can be addressed by supplementing copy number analysis with SNP genotyping data. Partek Genomics Suite supports both LOH and allele-specific copy number (AsCN) analysis with dedicated workflows. Tutorials on LOH and AsCN analysis are also available.

    Introduction to the tutorial data set

    The example data set consists of 20 paired samples from an ovarian cancer study in which a fresh-frozen tumor sample and peripheral blood sample were obtained from 10 female patients (Ramakrishna et al. 2010). All 20 samples were analyzed using the Affymetrix Genome Wide Human SNP Array 6.0. To download the data set, select this link - . The data set is also used for the LOH and AsCN tutorials. The spreadsheet used in this tutorial was generated by importing SNP6 CEL files and annotating them with attributes for each sample. The experimental goal is to identify copy number changes present in multiple patient tumors.

    References

    Ramakrishna M, Williams LH, Boyle SE, Bearfoot JL et al. Identification of candidate growth promoting genes in ovarian cancer through integrated copy number and expression analysis. PLoS One 2010;5(4).

    Creating a gene list using the Venn Diagram

    The List Manager can be used to generate lists of genes by applying criteria such as fold change and false discovery rate (FDR) adjusted p-value thresholds.

    • Select the Analysis tab

    • Select ANOVAResults in the spreadsheet tree

    • Select Create Gene List from the Analysis section of the Gene Expression workflow (Figure 1)

    Figure 1. Selecting Create Gene List from the Gene Expression workflow

    • Select E2 vs. Control from the Contrast panel of the ANOVA Streamlined tab in the List Manager dialog

    • Deselect the Include size of the change option

    • Set p-value with FDR < to 0.1 (Figure 2)

    Figure 2. Configuring the List Manager using the ANOVA Streamlined filtering options

    There should be ~545 probe(sets)/genes that meet this threshold.
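
    As background for the p-value with FDR threshold, the sketch below shows one common way to compute FDR-adjusted p-values, the Benjamini-Hochberg step-up procedure. Treating this as the adjustment behind the List Manager option is an assumption, since the method is not named here, and the p-values and function name are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical unadjusted p-values for five probesets.
p_values = [0.001, 0.02, 0.03, 0.2, 0.5]
adjusted = benjamini_hochberg(p_values)

# Keep probesets whose adjusted p-value is below the 0.1 threshold.
passing = [i for i, q in enumerate(adjusted) if q < 0.1]
print("Adjusted p-values:", [round(q, 4) for q in adjusted])
print("Indices passing FDR < 0.1:", passing)
```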

    • Select Create

    A new spreadsheet, E2 vs. Control, will be added as a child spreadsheet of Breast_Cancer.txt.

    • Repeat the steps listed above to create lists for E2+ICI vs. Control (~24 genes), E2+Ral vs. Control (~22 genes), and E2+TOT vs. Control (~177 genes) with the same threshold

    Now we can use the Venn Diagram to create a list of genes that are differentially regulated in all treatment groups.

    • Select the Venn Diagram tab in the List Manager dialog

    The Venn Diagram shows overlap between selected gene lists.

    • Select the four created lists (E-H) in the spreadsheet list in the List Manager dialog by selecting each while holding the Ctrl key on your keyboard

    The Venn Diagram will display the number of overlapping and distinct genes from the four lists (Figure 3).

    Figure 3. Viewing the Venn Diagram with intersections of four lists of significant genes

    The intersection of the four ellipses shows that 14 differentially regulated genes are common to all four treatment schemes.

    • Select the region intersecting all four ellipses

    • Right-click the intersected region

    • Select Create List From Highlighted Regions

    • Select

    The new list will appear in the spreadsheet tree with a temporary file name (ptpm).

    • Select the temporary list in the spreadsheet tree

    • Select () from the command bar

    • Save the list as fourtreatments
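The four-way intersection shown in the Venn Diagram is a set intersection across the gene lists. The following is a minimal Python sketch of that idea, assuming each list has been exported as a collection of gene identifiers. The function name `common_genes` and the gene identifiers are hypothetical placeholders, not tutorial results.

```python
def common_genes(gene_lists):
    """Return the genes present in every list, sorted for stable output."""
    sets = [set(genes) for genes in gene_lists.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical identifiers for illustration only; not taken from the tutorial data
lists = {
    "E2 vs. Control": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "E2+ICI vs. Control": ["GENE_B", "GENE_C"],
    "E2+Ral vs. Control": ["GENE_B", "GENE_C", "GENE_E"],
    "E2+TOT vs. Control": ["GENE_A", "GENE_B", "GENE_C"],
}

shared = common_genes(lists)
print(shared)
```

Genes that appear in only some of the lists correspond to the other regions of the diagram and are dropped by the intersection.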

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Optional: Integrating copy number with LOH and AsCN

    Although copy number analysis is a powerful tool for studying genomic aberrations, it lacks the capability to detect changes that are copy-neutral. For example, loss of heterozygosity (LOH) can involve a change in copy number or be copy-neutral. In the former case, LOH could be caused by a hemizygous deletion in which one allele is lost and the other allele remains present (Figure 1, middle panel). This type of LOH can be recognized by copy number analysis or SNP-genotyping. However, in the latter case, an allele is lost initially, but a subsequent amplification of the remaining copy creates a copy-neutral LOH (Figure 1, right panel). This copy-neutral LOH can only be detected when copy number is studied in combination with SNP genotype.

    Figure 1. Possible mechanisms of LOH and their impact on copy number. Left panel: heterozygous SNP; numbers indicate the number of copies of each allele (“normal” allele = green, “mutant” = red). Middle panel: hemizygous deletion leading to the loss of the normal allele. Right panel: duplication of the “mutant” allele. The case in the middle panel changes the copy number, while the case in the right panel is copy-number neutral.

    Copy-neutral events can be detected by combining the copy number workflow with the LOH workflow or the Allele-Specific Copy Number (AsCN) workflow to detect allelic imbalance (AI) (advantages of AsCN over LOH are discussed below). With these approaches, the copy number data are supplemented with SNP genotyping data (currently available with Affymetrix® and Illumina® arrays) to label the genomic regions as amplification without LOH/AI, amplification with LOH/AI, deletion without LOH/AI, deletion with LOH/AI, or copy-neutral LOH/AI (Figure 2). The last category, copy-neutral LOH/AI, is the added value of the workflow integration.

    An important consideration when choosing between LOH and AsCN analysis is that LOH analysis in the context of cancer has proven complex and difficult because cancer cells frequently deviate from the diploid state and tumor samples often contain many normal cells. As the proportion of tumor cells in a sample decreases and approaches 50% or less, the capacity to detect LOH diminishes (Yamamoto et al., Am J Hum Genet 2007). Additionally, in cases where only one of two alleles is amplified, LOH genotyping algorithms fail to call a heterozygous SNP, resulting in a false-positive LOH call.

    Figure 2. Integration of copy number workflow with loss of heterozygosity (LOH) or allelic imbalance (AI) under allele-specific copy number (AsCN) workflows enables the identification of copy-neutral events

    AsCN analysis, on the other hand, enables reliable detection of allelic imbalance in tumor samples even in the presence of large proportions of normal cells. Unlike LOH, it does not require a large set of normal reference samples. For a heterozygous SNP, a balance is expected between the two alleles (1×A and 1×B, or 1:1 ratio). The AsCN algorithm provides an estimated number of copies of each allele and therefore enables the detection of allelic imbalance even in cases when alleles are amplified or deleted (e.g. 3×A and 1×B). Moreover, LOH can be considered a special case of AI (e.g., 1×A, B allele deleted) (Figure 3). Therefore, AsCN should be the preferred workflow for tumor samples.

    Figure 3. Loss of heterozygosity (LOH) as a special case of allelic imbalance. The situation on the left represents a normal heterozygous SNP, with one copy of each allele
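The region labels described above can be illustrated with a short Python sketch. It assumes integer per-allele copy number estimates for a SNP that is heterozygous in the germline and a normal total of two copies. The function `classify_region` and its thresholds are illustrative assumptions and do not reproduce the Partek AsCN algorithm.

```python
def classify_region(a_copies, b_copies, normal_total=2):
    """Label a region from allele-specific copy numbers (A and B alleles).

    Returns (copy number state, allelic state). Assumes integer estimates
    for a germline-heterozygous SNP; this is a simplification.
    """
    total = a_copies + b_copies
    if total > normal_total:
        cn_state = "amplification"
    elif total < normal_total:
        cn_state = "deletion"
    else:
        cn_state = "copy-neutral"

    imbalanced = a_copies != b_copies
    # LOH is treated as the special case of imbalance where one allele is absent
    if imbalanced and min(a_copies, b_copies) == 0:
        allelic = "LOH"
    elif imbalanced:
        allelic = "AI"
    else:
        allelic = "no LOH/AI"
    return cn_state, allelic


for a, b in [(1, 1), (1, 0), (2, 0), (3, 1)]:
    print(a, b, classify_region(a, b))
```

The (2, 0) case is the copy-neutral LOH that copy number analysis alone cannot detect, because the total copy number is unchanged.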

    References

    Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, Bucan M, Maris JM, Wang K. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008 Nov;36(19):e126.

    Ramakrishna M, Williams LH, Boyle SE, Bearfoot JL, Sridhar A, Speed TP, Gorringe KL, Campbell IG. Identification of candidate growth promoting genes in ovarian cancer through integrated copy number and expression analysis. PLoS One. 2010 Apr 8;5(4):e9983.

    Yamamoto G, Nannya Y, Kato M, Sanada M, Levine RL, Kawamata N, Hangaishi A, Kurokawa M, Chiba S, Gilliland DG, Koeffler HP, Ogawa S. Highly sensitive method for genomewide detection of allelic composition in nonpaired, primary tumor specimens by use of Affymetrix single-nucleotide-polymorphism genotyping microarrays. Am J Hum Genet. 2007 Jul;81(1):114-26.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Optional: Additional options for annotating regions

    In addition to annotating regions with overlapping genes, other annotations can be used to characterize the regions showing copy number variation.

    For example, Overlap with known SNPs in the Copy Number workflow gives the option of annotating regions with SNPs from dbSNP or a custom SNP database (Figure 1).

    Figure 1. Annotate regions with SNPs from dbSNP

    This task adds two columns to the region list spreadsheet: the list of SNPs described in each region and the total number of SNPs in the region. If the list of SNPs is very long, you can output a separate list by right-clicking on the row header and selecting Create list of dbSNP from the pop-up menu.

    Another option in the workflow is Test for known abnormalities. Selecting this option compares the regions listed in the region list with a database of genomic abnormalities characteristic of particular diseases or syndromes to find possible matches. Annotation options include a Partek-distributed database of 60 syndromes or a custom database (Figure 2). Please note that the included table of known abnormalities is distributed for research use only.

    Figure 2. Test for known abnormalities in your copy number data

    If you would like to add a custom database, organize the following information by column: the name of the abnormality, chromosome number, start location, and stop location. The input for the task should be a list of aberrations for every sample; do not include unchanged regions in the input, or every syndrome will be shown as positive.
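To make the four-column layout concrete, here is a minimal Python sketch that matches sample aberrations against a custom database by genomic overlap. The rule that any overlap on the same chromosome counts as a match is an assumption for illustration, and the `Region` type, function names, and all entries are hypothetical; the actual matching criteria used by Partek Genomics Suite are not described here.

```python
from collections import namedtuple

# Same four columns as the custom database: name, chromosome, start, stop
Region = namedtuple("Region", ["name", "chrom", "start", "stop"])


def overlaps(a, b):
    """True if two regions share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.stop and b.start <= a.stop


def match_abnormalities(aberrations, database):
    """Return (aberration name, database entry name) pairs that overlap."""
    return [
        (region.name, entry.name)
        for region in aberrations
        for entry in database
        if overlaps(region, entry)
    ]


# Hypothetical database entries and sample aberrations
database = [
    Region("Hypothetical syndrome 1", "22", 18_000_000, 21_000_000),
    Region("Hypothetical syndrome 2", "7", 72_000_000, 74_000_000),
]
aberrations = [
    Region("sample1_deletion", "22", 19_500_000, 20_000_000),
    Region("sample1_amplification", "3", 1_000_000, 2_000_000),
]

hits = match_abnormalities(aberrations, database)
print(hits)
```

The sketch also shows why unchanged regions must be left out of the input: a list covering the whole genome would overlap every database entry.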

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Creating gene lists from ANOVA results

    Creating a gene list with the ANOVA Streamlined list manager

    Now that you have obtained statistical results from the microarray experiment, you can create new spreadsheets containing just those genes that pass certain criteria. This will streamline data management by focusing on just those genes with the most significant differential expression or substantial fold change. The List Manager can be used to specify numerous conditions for selecting genes of interest. In this tutorial, we are going to create a list of genes with a fold change greater than 1.3 or less than -1.3 and an unadjusted p-value of < 0.0005.

    Import and normalize methylation data

    To follow this tutorial, download the 32 .idat files (note that two .idat files are generated for each array) and unzip them on your local computer using 7-zip, WinRAR, or a similar program. The .idat files can be downloaded in a zipped folder using this link - .

    • Store the 32 .idat files at C:\Partek Training Data\Methylation or to a directory of your choosing. We recommend creating a dedicated folder for the tutorial

    • Go to the Workflows drop down list, select Methylation (Figure 1)

    Detect differentially methylated CpG islands

    The approach described in previous sections relies on ANOVA to detect differentially methylated CpG sites and takes individual sites as a starting point for interpretation. Since ANOVA compares M values at each site independently, this strategy is robust to type I/type II probe bias.

    An alternative could be to first summarize all the probes belonging to a CpG island region (i.e. island, N-shore, N-shelf, S-shore, S-shelf) and then use ANOVA to compare regions across the groups. Since the summarization will include both type I and type II probes, you may want to split the analysis in two branches and analyze type I and type II probes independently. To do this, we need to annotate each probe as type I or type II.

    • Select the mvalue spreadsheet

    Visualizing differential isoform expression

    Chromosome View in the Partek Genomics Suite software enables visualization of differential expression and alternative splicing results in RNA-Seq data.

    • Select New Track

    • Select Add a track from spreadsheet and select 1/transcripts (RNA-Seq_results.transcripts) from the drop-down menu

    Importing aligned reads

    We will be using the RNA-Seq workflow to analyze RNA-Seq data throughout this tutorial. The commands included in the RNA-Seq workflow are also available from the command toolbar, but may be labeled differently.

    • Select the RNA-Seq workflow by selecting it from the Workflow drop-down menu in the upper right-hand corner of the Partek Genomics Suite window (Figure 1)

    Figure 1. Selecting the RNA-seq workflow

    The Partek Genomics Suite software can import next generation sequencing data that has been aligned to a reference genome. Two standard types of alignment formats can be imported: .BAM and .SAM. It is also possible to convert ELAND .txt files to .BAM files with the converter found in the Tools

    Creating a gene list with advanced options

    The basic method of creating a gene list from ANOVA results based on fold-change and p-value cut-offs is detailed in . Advanced options enable the creation of lists based on more complex criteria. For example, we can use the Create Gene List function to identify transcripts that are both significantly differentially expressed AND alternatively-spliced among the four tissue samples.

    • Select Create Gene List from the Analyze Known Genes panel of the RNA-Seq workflow to invoke the List Manager dialog

    • Select the Advanced tab (Figure 1)

    Exploring the data set with PCA

    Principal Components Analysis (PCA) is an excellent method to visualize similarities and differences between the samples in a data set. PCA can be invoked through a workflow, by selecting () from the main command bar, or by selecting Scatter Plot from the View section of the main toolbar. We will use a workflow.

    • Select Gene Expression from the Workflows drop-down menu

    • Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow

    Finding nearest genomic features

    In this section, you will learn how to find genomic features (genes) that are near the IP-enriched regions of the data. You will also learn how to classify the peak locations by gene section (5’ UTR, 3’ UTR, Promoter, exon, intron).

    Finding the nearest genomic features

    • Select p-value_filtered from the spreadsheet tree

    Detecting differential expression in RNA-Seq data

    During import, you created a categorical attribute called Tissue and assigned the 4 samples to either the muscle or not muscle groups. This step was to create replicates within a group, albeit this grouping is somewhat artificial and is only used in this tutorial because we want to illustrate ANOVA with a small data set. Replicates are a prerequisite for differential expression analysis using ANOVA.

    • Select Differential Expression Analysis from the Analyze Known Genes section of the RNA-Seq workflow

    The Differential Expression Analysis dialog offers the choice of analyzing at Gene-, Transcript-, or Exon-level.

    Detect differentially methylated loci

    To detect differential methylation between CpG loci in different experimental groups, we can perform an ANOVA test. For this tutorial, we will perform a simple two-way ANOVA to compare the methylation states of the two experimental groups.

    • Select Detect Differential Methylation from the Analysis section of the Illumina BeadArray Methylation workflow

    A new child spreadsheet, mvalue, is created when Detect Differential Methylation is selected. M-values are an alternative metric for measuring methylation. β-values can be easily converted to M-values using the following equation: M-value = log2( β / (1 - β)).
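The conversion can be sketched in Python as follows. The inverse function is derived from the same equation by solving for β; the function names are illustrative and are not part of Partek Genomics Suite.

```python
import math


def beta_to_m(beta):
    """Convert a beta-value to an M-value: M = log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        # The logit is undefined at exactly 0 or 1
        raise ValueError("beta must be strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Invert the conversion: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


print(beta_to_m(0.5))  # 0.0, a half-methylated site
print(beta_to_m(0.9))  # positive: mostly methylated
print(beta_to_m(0.1))  # negative: mostly unmethylated
```

Because β-values of exactly 0 or 1 have no finite M-value, the sketch rejects them rather than guessing at an offset.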

    An M-value close to 0 for a CpG site indicates a similar intensity between the methylated and unmethylated probes, which means the CpG site is about half-methylated. Positive M-values mean that more molecules are methylated than unmethylated, while negative M-values mean that more molecules are unmethylated than methylated. As discussed by

    ChIP-Seq Analysis

    Chromatin Immunoprecipitation Sequencing (ChIP-Seq) uses high-throughput DNA sequencing to map protein-DNA interactions across the entire genome. Partek Genomics Suite offers convenient visualization and analysis of ChIP-Seq data.

    In this tutorial, we will use the Partek Genomics Suite ChIP-Seq workflow to analyze aligned data from a ChIP sample versus a control sample in .bam format.

    This tutorial illustrates:

  • Importing ChIP-Seq data

  • Quality control for ChIP-Seq samples

  • Detecting peaks and enriched regions in ChIP-Seq data

  • Creating a list of enriched regions

  • Identifying novel and known motifs

  • Finding nearest genomic features

  • Visualizing reads and enriched regions

    Note: the workflow described below is enabled in Partek Genomics Suite version 7.0 software. Please fill out the form on our support page to request this version or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

    Description of the Data Set

    The data for this tutorial comes from Johnson et al. 2007, which first described the ChIP-Seq technique.

    This study mapped genomic binding sites for neuron-restrictive silencer factor (NRSF) transcription factor across the genome. There are two samples: an NRSF-enriched ChIP sample (chip.bam) and a control sample of input DNA without antibody immunoenrichment (mock.bam). The chip.bam file contains ~1.7 million mapped reads and the mock.bam file contains ~2.3 million mapped reads. These .bam files contain the aligned genomic locations and sequences of mapped reads. This data set contains reads from a single-end (SE) library; the differences in processing paired-end (PE) reads will be discussed when applicable.

    Data for this tutorial can be downloaded from the Partek website using this link - ChIP-Seq tutorial data. To follow this tutorial, download the 2 .bam files and unzip them on your local computer using 7-zip, WinRAR, or a similar program. Because of the large size of the .bam files, we recommend saving them to a local drive instead of trying to access them on a network drive. The first time a .bam file is read by Partek Genomics Suite, the file will be sorted to allow for faster access. Therefore, you must have write permissions for the .bam files after download and on the file folder where they are stored.

    References

    Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    • Invoke the List Manager dialog by selecting Create Gene List in the Analysis section of the Gene Expression workflow

    • Ensure that the 1/ANOVA-3way (ANOVAResults) spreadsheet is selected, as this is the spreadsheet we will be using to create our new gene list (Figure 1)

    • Select the ANOVA Streamlined tab

    • In the Contrast: find genes that change between two categories panel, select Down Syndrome vs. Normal and select Have Any Change from the Setting drop-down menu

    This will find genes with different expression levels in the different types of samples.

    • In the Configuration for “Down Syndrome vs Normal” panel, check that Include size of the change is selected and enter 1.3 into Change > and -1.3 in OR Change <

    • Select Include significance of the change, choose unadjusted p-value from the dropdown menu, and < 0.001 for the cutoff

    The number of genes that pass your cutoff criteria will be shown next to the # Pass field. In this example, 30 genes pass the criteria.

    • Set Save the list as A

    • Select Create to generate the new list A

    • Select Close to view the new gene list spreadsheet

    Figure 1. Creating a gene list from ANOVA results

    The spreadsheet Down_Syndrome_vs_Normal (A) will be created as a child spreadsheet under the Down_Syndrome-GE spreadsheet.

    This gene list spreadsheet can now be used for further analysis such as hierarchical clustering, gene ontology, integration of copy number data, or be exported into other data analysis tools such as pathway analysis.

    You can practice creating new gene list criteria of your own to become familiar with the List Manager tool. For more information, you can always click on the () buttons.
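The filtering logic configured in the steps above can be sketched in Python. This assumes signed fold changes, where down-regulation is reported as a negative value, and treats both cutoffs as parameters. The function name `passes` and the rows are hypothetical and are not tutorial results.

```python
def passes(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.001):
    """True if the change is large enough in either direction and significant."""
    changed = fold_change > fc_cutoff or fold_change < -fc_cutoff
    return changed and p_value < p_cutoff


# Hypothetical rows standing in for an ANOVA results spreadsheet
rows = [
    {"gene": "GENE_A", "fold_change": 1.8, "p_value": 0.0002},
    {"gene": "GENE_B", "fold_change": -1.5, "p_value": 0.0004},
    {"gene": "GENE_C", "fold_change": 1.1, "p_value": 0.00001},  # change too small
    {"gene": "GENE_D", "fold_change": 2.4, "p_value": 0.02},     # not significant
]

selected = [r["gene"] for r in rows if passes(r["fold_change"], r["p_value"])]
print(selected)
```

Both conditions must hold, so a very small p-value does not rescue a gene whose fold change lies between the two cutoffs.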

    Creating a gene list from a volcano plot

    Next, we will generate a list of genes that passed a p-value threshold of 0.05 and fold-changes greater than 1.3 using a volcano plot.

    • Select the 1/ANOVA-3way (ANOVAResults) spreadsheet in the Analysis tab. This is the spreadsheet our gene list will be drawn from

    • Select View > Volcano Plot from the Partek Genomics Suite main menu (Figure 2)

    Figure 2. Generating a Volcano Plot from ANOVA results

    • Set X Axis (Fold-Change) to 12. Fold-Change(Down Syndrome vs. Normal), and the Y axis (p-value) to be 10. p-value(Down Syndrome vs. Normal)

    • Select OK to generate a Volcano Plot tab for genes in the ANOVA spreadsheet (Figure 3)

    Figure 3. Volcano plot generated from ANOVA spreadsheet

    In the plot, each dot represents a gene. The X-axis represents the fold change of the contrast (Down Syndrome vs. Normal), and the Y-axis represents the range of p-values. The genes with increased expression in Down syndrome samples are on the right side of the N/C (no change) line; genes with reduced expression in Down syndrome samples are on the left. The genes become more statistically significant with increasing Y-axis position. The genes that have larger and more significant changes between the Down syndrome and normal groups are in the upper-right and upper-left corners.

    In order to select the genes by fold-change and p-value, we will draw a horizontal line to represent the p-value 0.05 and two vertical lines indicating the –1.3 and 1.3-fold changes (cutoff lines).

    • Select Rendering Properties ()

    • Choose the Axes tab

    • Check Select all points in a section to allow Partek Genomics Suite to automatically select all the points in any given section

    • Select the Set Cutoff Lines button and configure the Set Cutoff Lines dialog as shown (Figure 4)

    Figure 4. Setting cutoff lines for -1.3 to 1.3 fold changes and a p-value of 0.05

    • Select OK to draw the cutoff lines

    • Select OK in the Plot Rendering Properties dialog to close the dialog and view the plot

    The plot will be divided into six sections. By clicking on the upper-right section, all genes in that section will be selected.

    • Right-click on the selected region in the plot and choose Create List to create a list including the genes from the section selected (Figure 5). Note that these p-values are uncorrected

    Figure 5. Creating a gene list from a volcano plot

    Note: If no column is selected in the parent (ANOVA) spreadsheet, all of the columns will be included in the gene list; if some columns are selected, only the selected columns will be included in the list.

    • Specify a name for the gene list (example: volcano plot list) and write a brief description about the list.

    The description is shown when you right-click on the spreadsheet > Info > Comments. Here, I have named the list "volcano plot list" and described it as "Genes with >1.3 fold change and p-value <0.05" (Figure 6). The list can be saved as a text file (File > Save As Text File) for use in reports or by downstream analysis software.

    Figure 6. Saving a list created from a volcano plot
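The six sections created by the cutoff lines can be expressed as a small classification function. This is a minimal Python sketch assuming signed fold changes and the cutoffs used in this section (1.3 and 0.05); the function name and section labels are illustrative.

```python
def volcano_section(fold_change, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    """Return which of the six volcano plot sections a gene falls into."""
    if fold_change > fc_cutoff:
        column = "right"
    elif fold_change < -fc_cutoff:
        column = "left"
    else:
        column = "middle"
    # Smaller p-values plot higher on the Y-axis
    row = "upper" if p_value < p_cutoff else "lower"
    return f"{row}-{column}"


print(volcano_section(2.0, 0.01))
print(volcano_section(-1.6, 0.2))
print(volcano_section(1.0, 0.001))
```

Selecting the upper-right section in the plot corresponds to keeping the genes for which this function returns "upper-right".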

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Figure 1. Selecting the methylation workflow
    • Select Microarray Loci Methylation from the Methylation sub-workflows panel (Figure 2)

    Figure 2. Selecting the Illumina BeadArray Methylation workflow

    This opens the Illumina BeadArray Methylation workflow (Figure 3).

    Figure 3. Illumina BeadArray Methylation workflow

    • Select Import Illumina Methylation Data to bring up the Load Methylation Data dialog

    • Select Import human methylation 450/850 .idat files (Figure 4)

    Figure 4. Selecting human methylation 450/850 .idat file type for import

    • Select OK

    • Select Browse... to navigate to the folder where you stored the .idat files

    All .idat files in the folder will be selected by default (Figure 5).

    Figure 5. Selecting .idat files to import

    • Select Add File(s) > to move the files to the idat Files to Process pane of the Import Illumina iDAT Data dialog (Figure 6)

    Figure 6. Confirming selection of .idat files for import

    • Select Next >

    The following dialog (Figure 7) deals with the manifest file, i.e. the probe annotation file. If a manifest file is not present locally, it will be downloaded to the Microarray libraries directory automatically. The download takes place in the background, with no message on the screen, and may take a few minutes depending on the internet connection. In the future, you may want to reanalyze a data set using the same version of the manifest file used during the initial analysis, rather than downloading an up-to-date version. To facilitate this, the Manual specify option in the Manifest File section allows you to choose a specific version. For this tutorial, we will leave this on the default settings.

    Figure 7. Selecting manifest file and output file

    By default, the output file destination is set to the folder containing your .idat files and the name matches the folder name. The name and location of the output file can be changed using the Output File panel.

    • Select Customize to view advanced options for data normalization

    In the Algorithm tab of the Advanced Import Options dialog (Figure 8), there are two filtering options and five normalization options available. The filters allow you to exclude probes from the X and Y chromosomes or based on detection p-value. In this tutorial, we have male and female samples, so we will apply the X and Y chromosome Filter. We will also filter probes based on detection p-value to exclude low-quality probes.

    • Select Exclude X and Y Chromosomes

    Analysis of differentially methylated loci in humans and mice often excludes probes on the X and Y chromosomes because of the difficulties caused by the inactivation of one X chromosome in female samples.

    • Select Exclude probes using detection p-value and leave the default settings of 0.05 and 1 out of 16 samples.

    We recommend using the default option for normalization; however, advanced users can select their preferred normalization method. Select the () next to each normalization option for details. If you want to import probe intensity, raw probe intensity, probe signals, raw probe signals, or anti-log probe intensity values, they can be added to the data import using the Outputs tab of the Advanced Import Options dialog. Probe intensities and raw probe intensities can be used for advanced troubleshooting purposes and antilog probe intensities can be used for copy number detection. The Outputs tab of the Advanced Import Options dialog also has an option to create NCBI GEO submission spreadsheets from your imported data. For this tutorial, we do not need to import any of these values or create GEO submission spreadsheets.

    Figure 8. Advanced Import Options offers choice of normalization method and additional data outputs

    • Select OK to close the Advanced Import Options dialog

    • Select Import on the Import Illumina iDAT data dialog

    The imported and normalized data will appear as spreadsheet 1 (Methylation Tutorial) (Figure 9).

    Figure 9. Viewing the imported methylation data in a spreadsheet

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    • Select Transform from the main toolbar

    • Select Create Transposed Spreadsheet... from the Transform drop-down menu (Figure 1)

    Figure 1. Creating a transposed spreadsheet

    • Select Sample ID for Column: and numeric for Data Type:

    • Select OK

    A new temporary spreadsheet will be created with a row for each probe and columns for each sample.

    • Right-click on column 1. ID to bring up the pop-up menu

    • Select Insert Annotation

    • Select Add as categorical

    • Select Infinium_Design_Type and UCSC_CpG_Islands_Name from the Column Configuration options (Figure 2)

    Figure 2. Adding Infinium design type and CpG island annotations

    • Select OK to add the Infinium design type and UCSC CpG island name as categorical columns on the spreadsheet

    Now, we can use the interactive filter to create separate spreadsheets for type I and type II probes.

    • Select () to launch the interactive filter

    • Select 2. Infinium_Design_Type from the drop-down menu if not selected by default

    • Left-click the type I column to exclude it

    • Right-click the temporary spreadsheet in the spreadsheet tree to bring up the pop-up dialog

    • Select Clone... (Figure 3)

    Figure 3. Creating a probe list with only Infinium type II probes

    • Name the new spreadsheet female_only_typeII_probes

    • Select OK

    • Save the created spreadsheet; we chose the file name female_only_typeII_probes

    • Repeat process to create a spreadsheet for type I probes

    The temporary spreadsheet is no longer needed so we can close it.

    • Close the temporary spreadsheet by selecting it in the file tree and selecting ()

    We can use these spreadsheets to generate lists of mean M values for each CpG island region.

    • Select spreadsheet female_only_typeII_probes

    • Select Stat from the main toolbar

    • Select Column Statistics... under Descriptive (Figure 4)

    Figure 4. Selecting column statistics

    • Add Mean to the Selected Measure(s) panel

    • Select Group By and set it to 3. UCSC_CpG_Islands_Name (Figure 5)

    Figure 5. Configuring column statistics

    • Select OK

    The new temporary spreadsheet has one CpG island region per row (Figure 6), samples on columns, and the values in the cells represent the mean of M values of all the CpG probes in the region.

    Figure 6. New spreadsheet with average M values for probes at each CpG island; probes not at CpG islands are collected into the first row "- Mean"

    Note the first row, with label “– Mean”. It corresponds to all the probes that map outside of UCSC CpG islands. As it is not needed for the downstream analysis, we will remove it.

    • Right-click on the row header for Mean

    • Select Delete to remove the row

    The final step is to transpose the data back to its original orientation.

    • Select Transform from the main toolbar

    • Select Create Transposed Spreadsheet... from the Transform drop-down menu

    • Select 2. Level for Column: and numeric for Data Type:

    • Select OK

    The layout of the new transposed spreadsheet is as follows: one sample per row with CpG island regions on columns; cell entries correspond to mean methylation status of the region (Figure 7). The column with a blank value for the column header is the average of all probes not associated with CpG island regions. You can delete this column if you like.

    Figure 7. Spreadsheet with average M values of probes in each CpG island for each sample

    • Right-click the transposed spreadsheet, 2_transpose

    • Select Save as... from the pop-up menu

    • Name it mvalues_typeII_probes_CpG_islands

    • Close the source temporary spreadsheet by selecting it in the spreadsheet tree and selecting ()

    The mvalues_typeII_probes_CpG_islands spreadsheet can be used as a starting point for ANOVA and other analyses. You can also repeat the steps above to create an equivalent spreadsheet for type I probes.
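The summarization performed by the Column Statistics steps above amounts to a group-by mean. Below is a minimal Python sketch, assuming each probe carries a CpG island name (empty for probes outside islands) and one M value per sample. The function name, island labels, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_m_by_island(probes):
    """Average M values per CpG island region, sample by sample.

    probes: iterable of (island_name, [M value for each sample]).
    Probes with an empty island name are skipped, which mirrors deleting
    the row that collects probes outside CpG islands.
    """
    grouped = defaultdict(list)
    for island, m_values in probes:
        if island:
            grouped[island].append(m_values)
    return {
        island: [mean(sample_values) for sample_values in zip(*rows)]
        for island, rows in grouped.items()
    }


# Hypothetical probes with M values for two samples
probes = [
    ("chr1:100-200", [1.0, 2.0]),
    ("chr1:100-200", [3.0, 4.0]),
    ("", [0.5, 0.5]),
    ("chr2:500-900", [-1.0, -2.0]),
]

summary = mean_m_by_island(probes)
print(summary)
```

Each key in the result corresponds to one column of the final transposed spreadsheet, with one mean value per sample.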

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    • Select Next > (Figure 1)

    Figure 1. Adding a new track to Chromosome View

    The new track will be added to Chromosome View (Figure 2).

    Figure 2. Viewing isoform proportion track in Chromosome View

    At this point, you may find it useful to alter track properties. Each track can be individually configured. For example, isoform information will be easier to visualize if we remove a few tracks.

    • Select Cytoband (hg19) in the Tracks panel

    • Select Remove Track to remove it from the viewer

    • Repeat for Genomic Label, RefSeq Transcripts - 2017-05-02 (hg19) (-), Legend: Base Colors, and Genome Sequence

    Next, we are going to view a single gene, SLC25A3, with differentially expressed isoforms.

    • Type SLC25A3 in the Plot Position bar at the top of the window and hit Enter. The browser will navigate to the gene

    To further improve our visualization of SLC25A3 isoforms, we can modify the remaining tracks.

    • Select RefSeq Transcripts - 2014-01-03 (hg19) (+) from the Tracks panel

    • Change Track height to 60 using the slider

    • Select Apply to change track height

    • Repeat steps to set each Bam Profile track to a height of 40 to complete our changes

    • Move the Isoform proportion track to below the RefSeq Transcripts track by selecting and dragging it up the list (Figure 3)

    Figure 3. Changing tracks in Chromosome View to facilitate visual analysis of isoform proportions

    The Muscle, Brain, Heart, Liver, and genomic label tracks were described in a previous section. Here, the focus is on the Isoform proportion track, which visualizes differential expression and alternative splicing. The reads that are mapped to a certain sample and the proportion of the transcript expressed in that sample are colored to match the Bam Profile track of that sample. In this screenshot, Brain is yellow, Heart is green, Liver is red, and Muscle is orange.

    SLC25A3 was reported by Wang, et al., (Nature, 2008) to have “mutually exclusive exons (MXEs)”. The reads mapped to the 3 transcripts of this gene in each of the tissue samples are shown in the Genome Viewer in the isoform proportion track. The relative abundances of the individual transcripts of this gene are shown by the height of the color coded bars on each transcript in the isoform proportion track. Note transcript NM_213611 has low expression while transcripts NM_005888 and NM_002635 have higher expression. Also note that NM_005888 is expressed primarily in the heart and muscle, indicated by the primarily green and orange bars, while NM_002635 is expressed primarily in the brain and liver, indicated by the primarily yellow and red bars.
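To make the idea of relative transcript abundance concrete, the sketch below computes, for each sample, the fraction of a gene's reads assigned to each transcript. The read counts are invented for illustration, the `isoform_proportions` helper is hypothetical, and the calculation ignores transcript length normalization; it is not necessarily how the Isoform proportion track is computed.

```python
def isoform_proportions(counts):
    """counts: {transcript: {sample: reads}} -> {sample: {transcript: fraction}}."""
    samples = {s for per_sample in counts.values() for s in per_sample}
    result = {}
    for sample in sorted(samples):
        # Total reads assigned to any transcript of the gene in this sample
        total = sum(per_sample.get(sample, 0) for per_sample in counts.values())
        result[sample] = {
            transcript: (per_sample.get(sample, 0) / total if total else 0.0)
            for transcript, per_sample in counts.items()
        }
    return result


# Hypothetical read counts, not values from the tutorial data set
counts = {
    "NM_005888": {"Heart": 80, "Brain": 10},
    "NM_002635": {"Heart": 15, "Brain": 85},
    "NM_213611": {"Heart": 5, "Brain": 5},
}
props = isoform_proportions(counts)
print(props["Heart"])
```

With these made-up counts, NM_005888 dominates in Heart and NM_002635 dominates in Brain, which mirrors the kind of pattern described above.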

    For additional tips on using the Chromosome View, refer to Visualizing mapped reads with Chromosome View.

    References

    Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., & Burge, C.B. Alternative isoform regulation in human tissue transcriptomes. Nature, 2008; 456: 470-6.


    menu in the main command bar. The data used in this tutorial was aligned using the Partek® Flow® software and saved as .BAM files.
    • To import the .BAM files, select Import and Manage Samples from the Import section of the RNA-Seq workflow. The Sequence Import dialog box will open (Figure 2)

    Figure 2. Importing .BAM files

    • Select BAM Files (*.bam) from the Files of type drop-down menu if not set by default

    • Use the file browser panel on the left-hand side of the Sequence Import dialog or select Browse... to navigate to the folder where you stored the tutorial .BAM files

    • Files with checked boxes next to the file name will be imported. For this tutorial, select brain_fa, heart_fa, liver_fa, and muscle_fa

    • Select OK to confirm the file selection and open the next dialog (Figure 3)

    Figure 3. Viewing the Sequence Import wizard; specify Output file (and directory using Browse), Species, and Genome

    • Configure the dialog as shown (Figure 3)

    Output file provides a name for the top-level spreadsheet. Browse can be used to change the output directory.

    • Select Homo sapiens from the Species drop-down menu

    This will allow us to select a human reference genome assembly.

    • Select hg19 for Genome/Transcriptome reference used to align the reads

    This is the reference genome our tutorial data was aligned to using Partek Flow.

    • Select OK to open the BAM Sample Manager dialog (Figure 4)

    Figure 4. Bam Sample Manager dialog

    The Bam Sample Manager dialog allows additional samples to be added or removed after the initial sample import. To remove a sample, select a sample from the list and then select Remove selected samples. This dialog also allows us to modify samples.

    • Select Manage samples to open the Assign files to samples dialog

    Sample ID is by default set to the file name, which may be too long or uninformative, so the Assign files to samples dialog can be used to give informative names to samples.

    • Change the sample names to Brain, Heart, Liver, and Muscle as shown (Figure 5)

    The Assign files to samples dialog also allows multiple .BAM files to be merged into one sample. This is useful if reads from one sample are split into multiple .BAM files.

    Figure 5. Changing sample names using the Assign files to samples dialog

    • Select OK to close the Assign files to samples dialog

    • Select Close to exit the Bam Sample Manager dialog and view the imported data (Figure 6)

    Figure 6. Viewing the imported data in a spreadsheet

    Additional files can be added to this spreadsheet using the Bam Sample Manager dialog. The Bam Sample Manager dialog can also be used to add imported samples to a separate spreadsheet by selecting a new option in the dialog, Add new experiment.


    Figure 1. Creating a gene list using advanced options

    • Select Specify New Criteria to invoke the Configure Criteria dialog (Figure 2)

    Figure 2. Configuring criteria for transcripts with a p-value < 0.05

    • In the Configure Criteria dialog box (Figure 2), provide a name for the list (Diff Exp)

    • Select 1/transcripts (RNA-Seq_results.transcripts) from the Spreadsheet drop-down menu

    • Select 8. p-value(DiffExp) from the Column drop-down menu

    • Set Include p-values to significant with FDR with a value of 0.05

    A list of 38,285 transcripts that pass these criteria will be generated, as indicated by the # pass value on the right-hand side of the dialog. If the settings are changed, this number will automatically update.

    • Select OK

    • Repeat the same steps to create a list of transcripts that are likely alternatively spliced, named Alt Splice, using the same p-value cutoff and Column set to 10. p-value (AltSplice) (Figure 3)

    Figure 3. Configuring criteria for a list of alternatively spliced genes

    • Select OK to generate Alt Splice

    • Select both lists in the right-hand panel under the Criteria panel while holding the Ctrl key on your keyboard

    • Select Intersection from the left-hand panel of the List Manager dialog (Figure 4)

    Figure 4. Creating a gene list at the intersection of two criteria

    • Enter a name for the criteria (Diff Exp and Alt Splice)

    • Select OK to close the naming dialog and OK again to close the list creation hint dialog

    • Select Save List from the Manage criteria section of the List Manager dialog (Figure 5)

    Figure 5. Saving a created list criteria

    • Select Diff Exp and Alt Splice in the List Creator dialog (Figure 6)

    Figure 6. Selecting list to save in List Creator dialog

    • Select OK to save the list

    • Select Close to exit the List Manager dialog and view the Diff_Exp_and_Alt_Splice spreadsheet (Figure 7)

    Figure 7. A list of the differentially expressed and alternatively spliced genes is now available for downstream analysis

    This list of differentially expressed and alternatively spliced transcripts will be used later in the tutorial.
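The list logic above (filter each p-value column at an FDR of 0.05, then intersect the two lists) can be sketched as follows. This sketch assumes that "significant with FDR" corresponds to a step-up procedure such as Benjamini-Hochberg; the source does not state which procedure the software applies. The transcript IDs, p-values, and the `bh_significant` helper are hypothetical.

```python
def bh_significant(pvalues, fdr=0.05):
    """Return the IDs that pass a Benjamini-Hochberg step-up threshold.

    pvalues: {transcript ID: p-value}. Assumed procedure, for illustration only.
    """
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that meets its threshold
    return {name for name, _ in ranked[:cutoff_rank]}


# Hypothetical p-values for four transcripts
diff_exp_p = {"T1": 0.001, "T2": 0.02, "T3": 0.04, "T4": 0.5}
alt_splice_p = {"T1": 0.3, "T2": 0.01, "T3": 0.02, "T4": 0.03}

diff_exp = bh_significant(diff_exp_p)
alt_splice = bh_significant(alt_splice_p)

# Equivalent of selecting both lists and choosing Intersection
both = diff_exp & alt_splice
print(sorted(both))
```

Only transcripts that pass both filters remain in the intersection, which is the role the Diff Exp and Alt Splice list plays in the tutorial.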


    Creating gene lists from ANOVA results
    The PCA scatter plot will open as a new tab (Figure 1).

    Figure 1. Viewing the PCA scatter plot. Each point is a sample. Samples are colored by treatment.

    In this PCA scatter plot, each point represents a sample in the spreadsheet. Points that are close together in the plot are more similar, while points that are far apart in the plot are more dissimilar.

    To better view the data, we can rotate the plot.

    • Select () to activate Rotate Mode

    • Click and drag to rotate the plot

    Rotating the plot allows us to look for outliers in the data on each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is listed by its axis label. The chart label shows the sum percentage of the total variation explained by the displayed PCs.
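The percentages on the axis labels and the chart label are related in a simple way, sketched below. The eigenvalues (variances along each PC) are hypothetical and `percent_variance` is an illustrative helper name, not a function of the software.

```python
def percent_variance(eigenvalues, shown=3):
    """Return per-PC percentages for the displayed PCs and their sum."""
    total = sum(eigenvalues)
    percents = [100.0 * ev / total for ev in eigenvalues]
    return percents[:shown], sum(percents[:shown])


# Hypothetical variances along four principal components
eigenvalues = [50.0, 25.0, 15.0, 10.0]
per_pc, combined = percent_variance(eigenvalues)
print(per_pc, combined)
```

Here the three displayed PCs would be labeled with 50%, 25%, and 15%, and the chart label would report their sum, 90%.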

    We can change the plot properties to better visualize the effects of different variables.

    • Select () to open the Configure Plot Properties dialog

    • Set Shape to 4. Batch

    • Set Size to 3. Time

    • Set Connect to 5. Treatment Combination

    • Select OK (Figure 2)

    Figure 2. Configuring plot properties to color by treatment, shape by batch, size by time, and connect by treatment combination

    The PCA scatter plot now shows information about treatment, batch, and time for each sample (Figure 3).

    Figure 3. PCA scatter plot showing treatment, batch, and time information for each sample. A batch effect is clearly visible.

    PCA is particularly useful for identifying outliers and batch effects in data sets. We can see a batch effect in this data set because samples separate by batch. To make this clearer, we can add ellipses by Batch.

    • Select () to open the Configure Plot Properties dialog

    • Select Ellipsoids from the tab

    • Select Add Ellipse/Ellipsoid

    • Select Ellipse

    • Select Batch from the Categorical Variable(s) panel and move it to the Group Variable(s) panel

    • Select OK

    • Select OK to close the dialog

    The ellipses help illustrate that the data is separated by batch (Figure 4).

    Figure 4. Ellipses around batch groups show that samples separate by batch

    Ways to address the batch effect in the data set will be detailed later in this tutorial.



    Select Find Nearest Genomic Feature from the Peak Analysis section of the ChIP-Seq workflow

    The Output Overlapping Features dialog will open (Figure 1).

    Figure 1. Selecting a database for overlapping features

    With this dialog, you can specify the reference database.

    • Select RefSeq Transcripts 81 - 2017-08-02 or your preferred annotation database

    The promoter region can also be defined. The default settings are appropriate in this case.

    • Select OK

    The resulting spreadsheet, gene-list, is a child of the p-value_filtered spreadsheet (Figure 2). Each row represents a transcript.

    Figure 2. Viewing genes overlapped by regions

    Column 1. transcript chromosome gives the chromosome location of transcript

    Column 2. transcript start gives the start of transcript (inclusive)

    Column 3. transcript stop gives the end of transcript (exclusive)

    Column 4. strand gives the strand of the transcript

    Column 5. Transcript ID gives the identity of the transcript

    Column 6. Gene Symbol gives the identity of the gene

    Column 7. Distance to TSS gives the distance of each enriched region to the transcription start site in base pairs with positive indicating downstream and negative indicating upstream

    Column 8. Percent overlap with gene gives the percent of the gene that overlaps with the region

    Column 9. Percent overlap with region gives the percent of the region that overlaps with the gene

    Columns 10-23. These columns are detailed in Detecting peaks and enriched regions in ChIP-Seq data

    Percent overlap with gene is likely to be close to 1 in cases where one region covers several genes, as in histone studies, for example. Percent overlap with region is likely to be close to 1 in cases where a region is relatively small and is found completely within a gene, as in transcription factor binding studies, for example. If both columns are close to 1, then the gene and the region have nearly the same start and stop sites. If both columns are close to 0, then the region does not overlap with the gene directly and likely covers only the promoter region.
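The two overlap columns can be reproduced with a short calculation. The sketch below assumes half-open intervals (start inclusive, stop exclusive, as described for the transcript columns) and reports values on a 0 to 1 scale; the coordinates and the `overlap_fractions` helper are hypothetical.

```python
def overlap_fractions(region, gene):
    """Return (fraction of gene covered by region, fraction of region inside gene).

    region, gene: (start, stop) tuples, start inclusive and stop exclusive.
    """
    r_start, r_stop = region
    g_start, g_stop = gene
    overlap = max(0, min(r_stop, g_stop) - max(r_start, g_start))
    return overlap / (g_stop - g_start), overlap / (r_stop - r_start)


gene = (2000, 3000)

# Broad region spanning the whole gene (histone-like case)
print(overlap_fractions((0, 10000), gene))

# Narrow region inside the gene (transcription-factor-like case)
print(overlap_fractions((2400, 2600), gene))

# Region upstream of the gene, for example in a promoter
print(overlap_fractions((1500, 1900), gene))
```

The three calls correspond to the three situations described above: overlap with gene near 1, overlap with region near 1, and both values at 0.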

    Classifying regions by gene section

    Another way to interpret the genomic location of peaks is to use Classify regions by gene section.

    • Select p-value_filtered from the spreadsheet tree

    • Select Classify regions by gene section from the Peak Analysis section of the ChIP-Seq workflow

    The Output Overlapping Features dialog will open.

    • Select RefSeq Transcripts 81 - 2017-08-02 or your preferred annotation database

    The promoter region can also be defined. The default settings are appropriate in this case. The results can be further configured to give one result per detected region or one result per genomic feature. The default setting, one result per detected region, is appropriate in this case.

    • Select OK

    A new spreadsheet, gene-classification, will be generated (Figure 3).

    Figure 3. Classifying regions by gene section

    Columns 1-6 have the same contents we saw in gene-list.

    Column 7. Gene Section gives the section of the gene that overlaps with the region

    Column 8. Distance to TSS gives the distance of each enriched region to the transcription start site in base pairs with positive indicating downstream and negative indicating upstream

    Column 9. Distance to nearest gene gives the distance of each enriched region to the nearest gene in base pairs with positive indicating downstream and negative indicating upstream

    Column 10. Sample ID gives the sample in which the region is enriched


    Select Gene-level

    • Specify the 1/gene_rpkm (RNA-Seq_results.gene.rpkm) spreadsheet from the Spreadsheet drop-down menu (Figure 1)

    Figure 1. Choosing the type of differential expression analysis

    • Select OK to open the ANOVA dialog

    Available factors are listed in the Experimental Factor(s) panel on the left-hand side of the dialog.

    • Select Tissue, then select Add Factor > to move Tissue to the ANOVA Factor(s) panel on the right-hand side of the dialog (Figure 2)

    Figure 2. The ANOVA dialog

    If the ANOVA were now performed (without contrasts), a p-value for differential expression would be calculated, but it would only indicate if there are differences within the factor Tissue; it would not inform you which groups are different or give any information on the magnitude of the difference between groups (fold-change or ratio). To get this more specific information, you need to define linear contrasts.

    • Select Contrasts... to open the Configure dialog

    • For Select Factor/Interaction, Tissue will be the only factor available as it was the only factor included in the ANOVA model in the previous step; if multiple factors were included, they could be selected in the Select Factor/Interaction: drop-down menu. The levels in this factor are listed on the Candidate Level(s) panel on the left side of the dialog

    • For this data set, verify that No is selected for Data is already log transformed?

    • Left click to select muscle from the Candidate Level(s) panel and move it to the Group 1 panel (renamed muscle) by selecting Add Contrast Level > in the top half of the dialog. Label 1 will be changed to the subgroup name automatically, but you can also manually specify the label name

    • Select not muscle from the Candidate Level(s) panel and move it to the Group 2 panel (renamed not muscle)

    • The Add Contrast button can now be selected (Figure 3)

    • Select OK to return to the ANOVA dialog

    Figure 3. Defining linear contrasts

    • Select OK to perform the ANOVA as configured (Figure 4)

    Figure 4. Fully configured ANOVA

    Once the ANOVA has been performed on each gene in the data set, an ANOVA child spreadsheet ANOVA-1way (ANOVAResults) will appear under the gene_rpkm spreadsheet (Figure 5). The format of the ANOVA spreadsheet is similar for all workflows. Mouse over each column title for a description of the column contents.

    Figure 5. Viewing ANOVA results

    In this tutorial, the overall p-value for the factor (column 4) is the same as the p-value for the linear contrast (column 5) as there are only two levels within Tissue. If we had more than two groups, the overall p-value and the linear contrast p-values would most likely differ. You can also see the ? symbol in the ratio/fold-change columns (6 and 7) for several genes that also have a low p-value because there are zero reads in one of the groups, thus making it impossible to calculate ratios and fold-changes between groups.
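Two points from the paragraph above can be checked numerically. First, with only two groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, which is why the overall p-value and the contrast p-value agree. Second, a ratio or fold-change is undefined when one group has zero reads. The sketch below uses made-up values; the helper names are hypothetical, and the signed fold-change convention (negative reciprocal for ratios below 1) is an assumption for illustration rather than a documented detail.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic of a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    ss = sum((v - mean(a)) ** 2 for v in a) + sum((v - mean(b)) ** 2 for v in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5


def fold_change(mean_a, mean_b):
    """Signed fold-change of A vs. B, or None when it cannot be calculated."""
    if mean_a <= 0 or mean_b <= 0:
        return None  # zero reads in a group: shown as ? in the spreadsheet
    ratio = mean_a / mean_b
    return ratio if ratio >= 1 else -1.0 / ratio


a = [1.0, 2.0, 3.0]  # hypothetical expression values, group 1
b = [5.0, 6.0, 7.0]  # hypothetical expression values, group 2
print(one_way_f([a, b]), pooled_t(a, b) ** 2)
print(fold_change(8.0, 2.0), fold_change(2.0, 8.0), fold_change(0.0, 5.0))
```

With more than two levels in the factor, the F statistic summarizes all groups at once and no longer matches any single pairwise contrast.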

    For using ANOVA with more complicated experimental designs, including multiple factors and linear contrasts, please refer to Identifying differentially expressed genes using ANOVA in the Gene Expression Analysis tutorial.


    As demonstrated by Du and colleagues, the β-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels.

    Because we are performing differential methylation analysis, Partek Genomics Suite automatically creates an M-values spreadsheet to use for statistical analysis.
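For reference, the two scales are related by a logit transformation in base 2: M = log2(β / (1 − β)). The sketch below converts between them; it omits any offset or clipping that a particular pipeline might apply, and the function names are illustrative.

```python
import math


def beta_to_m(beta):
    """Convert a beta-value (strictly between 0 and 1) to an M-value."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Convert an M-value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


for beta in (0.1, 0.5, 0.8):
    print(beta, round(beta_to_m(beta), 3))
```

A β-value of 0.5 maps to an M-value of 0, and the M scale stretches out values near 0 and 1, which is the property that makes it better suited to statistical testing.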

    • Select 2. Cell Type and 3. Gender from the Experimental Factor(s) panel

    • Select Add Factor > to move 2. Cell Type and 3. Gender to the ANOVA Factor(s) panel (Figure 1)

    Figure 1. ANOVA setup dialog. Experimental factors listed on the left can be added to the ANOVA model.

    • Select Contrasts...

    • Leave Data is already log transformed? set to No

    • Leave Report comparisons as set to Difference

    For methylation data, fold-change comparisons are not appropriate. Instead, comparisons should be reported as the difference between groups.

    • Select 2. Cell Type from the Select Factor/Interaction drop-down menu

    • Select LCLs

    • Select Add Contrast Level > for the upper group

    • Select B cells

    • Select Add Contrast Level > for the lower group

    • Select Add Contrast (Figure 2)

    Figure 2. Configuring ANOVA contrasts

    • Select OK to close the Configuration dialog

    The Contrasts... button of the ANOVA dialog now reads Contrasts Included

    • Select OK to close the ANOVA dialog and run the ANOVA

    If this is the first time you have analyzed a MethylationEPIC array using the Partek Genomics Suite software, the manifest file may need to be configured. If it needs configuration, the Configure Annotation dialog will appear (Figure 3).

    • Select Chromosome is in one column and the physical location is in another column for Choose the column configuration

    • Select Ilmn ID for Marker ID

    • Select CHR for Chromosome

    • Select MAPINFO for Physical Position

    • Select Close

    This enables Partek Genomics Suite to parse out probe annotations from the manifest file.

    Figure 3. Processing the annotation file. User needs to point to the columns of the annotation file that contain the probe identifier as well as the chromosome and coordinates of the probe.

    The results will appear as ANOVA-2way (ANOVAResults), a child spreadsheet of mvalue. Each row of the spreadsheet represents a single CpG locus (identified by Column ID).

    Figure 4. ANOVA spreadsheet. Each row is a result of an ANOVA at a given CpG locus (identified by the Column ID column). The remaining columns contain annotation and statistical output

    For each contrast, a p-value, Difference, Difference (Description), Beta Difference, and Beta Difference (Description) are generated. The Difference column reports the difference in M-values between the two groups while the Beta Difference column reports the difference in beta values between the two groups.



    Exploring gene expression data

    At this point in analysis, you should explore the data preliminarily. Do the genes you expected to be differentially regulated appear to have larger or smaller intensity values? Do similar samples resemble each other?

    The latter question can be explored using Principal Components Analysis (PCA), an excellent method for reducing and visualizing high-dimensional data.

    • Select PCA Scatter Plot from the QA/QC section of the Gene Expression workflow

    A Scatter Plot tab containing your PCA plot will open (Figure 1).

    Figure 1. PCA Scatter Plot tab

    In the scatter plot, each point represents a chip (sample) and corresponds to a row on the top-level spreadsheet. The color of the dot represents the Type of the sample; red represents a normal sample and blue represents a Down syndrome sample. Points that are close together in the plot have similar intensity values across the probe sets on the whole chip, while points that are far apart in the plot are dissimilar.

    Left-clicking on any point in the scatter plot selects that point. A dash with an identifying row number will appear on the selected PCA plot point. The spreadsheet in the Analysis tab will also jump to the corresponding row.

    While pressing the mouse wheel down, drag the mouse to rotate the plot or select the Rotate Mode icon () on the left side of the Scatter Plot tab. With Rotate Mode selected, press the left mouse button and drag to rotate the plot. Rotating the plot allows you to examine the grouping pattern or outliers of the data on the first three principal components (PCs).

    To zoom in and out, scroll the mouse wheel up or down while the cursor is on the PCA plot, or select the Zoom Mode icon () on the left side of the Scatter Plot tab.

    Selecting the Reset icon () on the left side of the Scatter Plot tab will return the PCA plot to its original orientation and zoom.

    As you can see from rotating the plot, there is no clear separation between Down syndrome and normal samples in this data since the red and blue samples are not separated in space. However, there are other factors that may separate the data.

    • In the Scatter Plot tab, select the Rendering Properties icon () and configure the plot as shown (Figure 2)

    • Color the points by column 4. Tissue and Size the points by column 3. Type

    • Select OK

    Figure 2. Configuring the PCA scatter plot: Color by Tissue, size by Type

    Notice now that the data are clustered by different tissues (Figure 3).

    Figure 3. PCA scatter plot configured with color by Tissue, size by Type

    Another way to see the cluster pattern is to put an ellipse around the Tissue groups.

    • Open the Plot Rendering Properties dialog and select the Ellipsoids tab

    • Select Add Ellipse/Ellipsoid

    • Select Ellipse in the Add Ellipse/Ellipsoid... dialog

    Figure 4. Adding Ellipses to PCA Scatter Plot

    By rotating this PCA plot, you can see that the data is separated by tissues, and within some of the tissues, the Down syndrome samples and normal samples are separated. For example, in the Astrocyte and Heart tissues, the Down syndrome samples (small dots) are on the left, and the normal samples (large dots) are on the right (Figure 5).

    Figure 5. PCA scatter plot with ellipses, rotated to show separation by Type

    PCA is an example of exploratory data analysis and is useful for identifying outliers and major effects in the data. From the scatter plot, you can see that the tissue is the biggest source of variation. There are many genes that express differently between the tissues, but not as many genes that express differently between type (Down syndrome and normal) across the whole chip.

    The next step is to draw a histogram to examine the samples.

    • Select Sample Histogram in the QA/QC section of the Gene Expression workflow to generate the Histogram tab (Figure 6)

    Figure 6. Histogram tab

    The histogram plots one line for each of the samples with the intensity of the probes graphed on the X-axis and the frequency of the probe intensity on the Y-axis. This allows you to view the distribution of the intensities to identify any outliers. In this dataset, all the samples follow the same distribution pattern indicating that there are no obvious outliers in the data. As demonstrated with the PCA plot, if you click on any of the lines in the histogram, the corresponding row will be highlighted in the spreadsheet 1 (Down_Syndrome-GE). You can also change the way the histogram displays the data by clicking on the Plot Properties button. Feel free to explore these options on your own.
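The curve drawn for each sample is a binned count of probe intensities. A minimal sketch of that binning is shown below; the intensity values, the bin width, and the `intensity_histogram` helper are hypothetical, and the software may bin or smooth the data differently.

```python
import math


def intensity_histogram(values, bin_width=1.0):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for value in values:
        bin_start = math.floor(value / bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical log2 probe intensities for one sample
sample_intensities = [2.1, 2.9, 3.5, 7.2]
hist = intensity_histogram(sample_intensities)
print(hist)
```

Computing one such histogram per sample and overlaying them gives the comparison described above: a sample whose curve has a clearly different shape is a candidate outlier.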

    The decision to discard any samples would be based on information from the PCA plot, sample histogram plot, and QC metrics. To discard a sample and renormalize the data (without the effects of the outlier), start over with importing samples and omit the outlier sample(s) during the .CEL file import.


    Perform gene set and pathway analysis

    To perform gene set and pathway analysis, we need to create a list of genes that overlap with differentially methylated CpG loci.

    • Select LCLs_vs_B_cells_CpG_Islands in the spreadsheet tree

    • Select Find Overlapping Genes from the Analysis section of the workflow

    The Output Overlapping Features dialog will open (Figure 1). This dialog allows you to choose the annotation database that will define where genes are located. By default, the promoter region will be defined as 5000 base pairs upstream and 3000 base pairs downstream from the transcription start site.
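The default promoter definition can be written out as a coordinate window around the transcription start site (TSS). The sketch below assumes that upstream and downstream are interpreted relative to the strand of the transcript and that the window is half-open; both are assumptions for illustration, and the function names are hypothetical.

```python
def promoter_window(tss, strand, upstream=5000, downstream=3000):
    """Return (start, stop) of the promoter window, start inclusive, stop exclusive."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        # On the reverse strand, upstream lies at higher coordinates
        return max(0, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")


def in_promoter(position, tss, strand, upstream=5000, downstream=3000):
    """True if a genomic position (for example a CpG locus) lies in the promoter window."""
    start, stop = promoter_window(tss, strand, upstream, downstream)
    return start <= position < stop


print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
print(in_promoter(14000, 10000, "-"), in_promoter(14000, 10000, "+"))
```

The same position can fall inside the promoter of a reverse-strand gene and outside the promoter of a forward-strand gene with the same TSS, which is why the strand matters when widening or narrowing these defaults.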

    Figure 1. Selecting Find Overlapping Genes from the main toolbar

    • Select Ensembl Transcripts release 75 from the Report regions from the specified database options

    • You can select a name for the new list; we have named it gene-list

    • Select OK

    A new spreadsheet will be created as a child spreadsheet (Figure 2).

    Figure 2. Annotating the differentially methylated CpG loci with genes

    Partek Genomics Suite offers several tools to help interpret this list of genes. First, let's look at Gene Set Analysis.

    • Select Gene Set Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow

    • Select GO Enrichment for Select the method of analysis

    • Select Next >

    Figure 3. Configuring the parameters of the test

    • Select Next >

    • Select Default Mapping File for Select the method of mapping genes to gene sets

    • Select Next >

    A new spreadsheet will be created with categories ranked by enrichment score and the Gene Ontology Browser will launch to graphically display the results of the spreadsheet (Figure 4). The results show which gene sets are over represented in the list of genes overlapped by differentially regulated CpG loci between the experimental and control groups.
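Over-representation of a gene set is commonly assessed by asking how likely it is to see at least the observed number of category genes in a list of a given size drawn from the background. The sketch below uses a hypergeometric tail probability for this purpose. It illustrates the general idea only; the source does not state which test or score the software uses, and the function name and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(list_size, hits, category_size, background_size):
    """P(X >= hits) for a gene list of list_size drawn from the background.

    category_size: number of background genes annotated to the GO term.
    """
    upper = min(list_size, category_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(background_size - category_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 3 in the category,
# and a list of 3 genes that are all in the category
p_value = hypergeom_enrichment_p(list_size=3, hits=3, category_size=3, background_size=10)
print(p_value)
```

A small p-value means the category contains more genes from the list than expected by chance, which is what a high rank in the enrichment spreadsheet reflects.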

    Figure 4. GO enrichment browser showing gene groups overrepresented in the list of genes which overlap with differentially methylated CpG loci

    To get a better idea of whether genes associated with these GO terms have increased or decreased methylation, we can view the Forest Plot.

    • Select the Forest Plot tab

    GO terms are listed by the number of significantly up-regulated genes, with the percent up-regulated and down-regulated shown in red to green bars. Here, we see that most GO terms show increased methylation in their associated genes (Figure 5).
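The percentages behind each bar can be thought of as a simple tally of the direction of change among the genes in a GO term. The sketch below assumes each gene carries a methylation difference between the two groups; the values and the `direction_percentages` helper are hypothetical.

```python
def direction_percentages(differences):
    """Return (percent increased, percent decreased) for one GO term.

    differences: methylation differences between groups, one per gene.
    """
    n = len(differences)
    if n == 0:
        return 0.0, 0.0
    up = sum(1 for d in differences if d > 0)
    down = sum(1 for d in differences if d < 0)
    return 100.0 * up / n, 100.0 * down / n


# Hypothetical differences for the genes in one GO term
term_differences = [0.5, 1.2, -0.3, 0.8]
print(direction_percentages(term_differences))
```

In this made-up example, three of four genes show increased methylation, so the term would be drawn as 75% up and 25% down.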

    Figure 5. Gene Ontology Forest Plot

    Next, we can perform Pathway Analysis to see which pathways are over represented in the genes overlapped by differentially regulated CpG loci.

    • Select gene-list from the spreadsheet tree

    • Select Pathway Analysis from the Biological Interpretation section of the Illumina BeadArray Methylation workflow

    • Select Pathway Enrichment for Select the method of analysis

    The Pathway-Enrichment spreadsheet will be added to the spreadsheet tree in Partek Genomics Suite and the Partek® Pathway™ software will open to provide visualization of the most significantly enriched pathway as a pathway diagram (Figure 6). The color of the gene boxes reflects p-values of the associated differentially methylated CpG loci (bright orange is insignificant, blue is highly significant). The Color by option can be changed to another column from the gene-list.txt spreadsheet, such as Difference.

    Figure 6. Partek Pathway illustrating one of the pathways overrepresented in the list of genes overlapping the differentially methylated CpG sites.

    The Pathway-Enrichment spreadsheet can also be viewed in Partek Pathway by switching to the Pathway-Enrichment section of the menu tree on the left-hand side of the window. From the spreadsheet view, you can select a pathway name to visualize that pathway. Alternatively, you can open a pathway visualization in Partek Pathway from the Pathway-Enrichment spreadsheet in Partek Genomics Suite by right-clicking on a row and selecting Show pathway... from the pop-up menu. Please note that if you have closed Partek Pathway and have reopened it, you will need to import a gene list if you want to color the visualization by attributes from the gene list. For more information about using Partek Pathway, refer to the Partek Pathway documentation.


    Importing Affymetrix CEL files

    Download the data from the Partek site to your local disk. The zip file contains both data and annotation files.

    • Unzip the files to C:\Partek Training Data\Down_Syndrome-GE or to a directory of your choosing. Be sure to create a directory or folder to hold the contents of the zip file

    • Copy or move the annotation files (HG-U133A.cdf, HG-U133A.na36.annot, HG-U133A.na36.annot.idx) to C:\Microarray Libraries.

    Copying the annotation files to the default library location is done because newer annotation files that are released after the publication of this tutorial may cause the results to be different than what is shown in the published tutorial. If, however, you prefer to download the latest version, you may omit copying the HG-U133A files to C:\Microarray Libraries.

    • Start Partek® Genomics Suite® and select Gene Expression from the Workflows panel on the right side of the tool bar in the main window (Figure 1)

    Figure 1. Selecting the gene expression workflow

    • Select Import Samples under the Import section of the workflow

    • Select Import from Affymetrix CEL Files and then select OK

    • Select the Browse button and select the C:\Partek Training Data\Down_Syndrome-GE folder. By default, all the files with a .CEL extension are selected (Figure 2)

    Figure 2. Selecting the folder and CEL files for the experiment

    • Select the Add File(s) > button to move all the .CEL files to the right panel. Twenty-five CEL files will be processed

    • Select the Next > button to open the Import Affymetrix CEL Files dialog (Figure 3)

    Figure 3. Configuring import files window

    • Select Customize… to open the Advanced Import Options dialog (Figure 4)

    Figure 4. Configuring the Advanced Import Options dialog

    • Select Library Files… to open the Specify File Locations dialog (Figure 5). This dialog is used to specify the location of the library folder and the annotation files

    Figure 5. Specifying Microarray Library files or changing the default library directory

    Partek Genomics Suite will automatically assign the annotation files according to the chip type stored in the .CEL files. If the annotation files are not available in the library directory, Partek Genomics Suite will automatically download and store them in the Default Library File Folder.

    • The default library location can be modified by selecting Change... in the Default Library File Folder panel. By default, the library directory is at C:\Microarray Libraries. This directory is used to store all the external libraries and annotation files needed for analysis and visualization. The library directory can also be modified from Tools > File Manager in the main Partek Genomics Suite menu

    • Select OK (Figure 5) to close the Specify File Locations dialog

    • Select the Outputs tab from the Advanced Import Options

    Figure 6. Specifying Advanced Import Options to create chip images of and extract the scan date from the CEL files

    • In the Extract Time Stamp and Date from CEL File panel, make sure the Date button is selected to extract the chip scan date. This information can help you detect batch effects caused by processing time

    • In the Quality Assess of Gene Expression panel, leave the QC report button unselected. A user guide for the microarray data quality assessment and quality control features is available in the User’s Manual

    • Select OK to exit the Advanced Import Options dialog

    After the .CEL files have been imported, the result file will open in Partek Genomics Suite as a spreadsheet named 1 (Down_Syndrome-GE). The spreadsheet should contain 25 rows representing the microarray chips (samples) and over 22,000 columns representing the probe sets (genes) (Figure 7).

    Figure 7. Viewing the main or top-level spreadsheet

    For additional information on importing data into Partek Genomics Suite, see Chapter 4 Importing and Exporting Data in the Partek User’s Manual. The User’s Manual is available from the Partek Genomics Suite main menu under Help > User’s Manual. The FAQ (Help > On-line Tutorials > FAQ) may also be helpful. As this tutorial addresses only some topics, you may need to consult the User’s Manual for additional information about other useful features.

    We recommend that you become familiar with Chapter 6 The Pattern Visualization System of the User’s Manual before going through the next section of the tutorial.

    Cox Regression Analysis

    Introducing Cox Regression

    Cox regression (Cox proportional-hazards model) tests the effects of several factors (predictors) on survival time. Predictors that lower the probability of survival at a given time are called risk factors; predictors that increase the probability of survival at a given time are called protective factors. The Cox proportional-hazards model is similar to a multiple logistic regression that considers time-to-event rather than simply whether an event occurred or not.

    In this tutorial, we will use Cox Regression to test the effects of tumor gene expression on survival time while accounting for tumor size.

    Performing Cox Regression Analysis

    To begin, you should have the Survival Tutorial data set open in Partek Genomics Suite.

    • Select Stat from the main toolbar

    • Select Survival Analysis then Cox Regression from the Stat menu (Figure 1)

    Figure 1. Invoking Cox Regression

    The Cox Regression dialog will open. Please note that in this tutorial data set, column 1. Survival (years) indicates the survival time of each patient in years and column 2. Event indicates the event status for each patient, death or censored.

    • Set Time Variable to 1. Survival (years) using the drop-down menu

    • Set Event Variable to 2. Event using the drop-down menu

    Only numeric data are displayed in the Time Variable drop-down list and only categorical data with two categories are displayed in Event Variable.

    • Set Event Status to death using the drop-down menu (Figure 2)

    Event Status should be set to the primary event outcome. All Response Variables will be automatically selected for Predictor. This means that Cox Regression will test every probe set for association with survival (time-to-event).

    Figure 2. Configuring the Cox Regression dialog

    Co-predictors are numeric or categorical factors that will be included in the regression model. To account for tumor size when testing the association between gene expression and survival, we can add tumor size to the co-predictors list.

    • Select 7. tumor size (mm) from the Candidate(s) panel

    • Select Add Factor > to add it to the Co-predictor(s) panel

    Advanced options, such as the inclusion of interactions between predictors and co-predictors, can be accessed by selecting Model... (Figure 3). The Results... button invokes a dialog (Figure 4) with additional output options for the results spreadsheet. We do not need to adjust any of the advanced model or output options for this tutorial.

    Figure 3. Configuring advanced options for Cox Regression

    Figure 4. Configuring output options for Cox Regression

    • Select OK to run Cox Regression (Figure 5)

    Figure 5. Configuring Cox Regression to assess the effect of gene expression and tumor size on survival

    The spreadsheet generated by Cox Regression (Figure 6) includes a row for each probe set; the columns provide the following information:

    1. Column # - Column number of probe set in probe intensities spreadsheet

    2. Probeset ID - ID of probe set in probe intensities spreadsheet

    3. HRatio(gene) - Hazard ratio for the probe set

    4. LowCI(gene) - Lower 95% confidence boundary of the hazard ratio for the probe set

    5. UpCI(gene) - Upper 95% confidence boundary of the hazard ratio for the probe set

    6. p-value(gene) - P-value of the corresponding Chi-squared test. A low value indicates that the predictor (probe set) is significantly associated with survival time; the hazard ratio indicates whether the association is with shortened or prolonged survival

    7. to 10. - Effects of the co-predictor on survival time; for each co-predictor, a similar set of columns is added

    11. modelfit(0) - P-value of the test assessing the overall model fit, i.e., the relationship between survival time, the predictor, and co-predictors in the model. A modelfit value greater than 0.05 indicates a weak association between the predictor and/or co-predictors and survival time.

    Please note that the Cox Regression results spreadsheet is a temporary file. If you would like to be able to view the spreadsheet again after closing Partek Genomics Suite, be sure to save it by selecting the Save Active Spreadsheet icon.

    Figure 6. Cox Regression results spreadsheet

    The hazard ratio is an effect size measure used to assess the direction and magnitude of the effect of a predictor variable on the relative likelihood of the event occurring at any given point in time, controlling for other predictors in the model.

    For continuous predictors, such as gene expression values and tumor size, the hazard ratio is the predicted change in the hazard for a unit increase in the predictor. A hazard ratio greater than 1 indicates that the predictor is associated with shorter time-to-event, a hazard ratio less than 1 indicates that the predictor is associated with longer time-to-event, and a hazard ratio of 1 indicates that the predictor has no effect on time-to-event. For categorical predictors, the hazard ratio is relative to the reference category.
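
    The interpretation of the hazard ratio for a continuous predictor can be illustrated with a minimal Python sketch. It assumes the standard Cox model relationship, in which the hazard ratio is the exponential of the fitted regression coefficient. The function names and the Wald-style confidence interval are illustrative assumptions and are not taken from Partek Genomics Suite.

```python
import math


def hazard_ratio(coef, delta=1.0):
    """Hazard ratio for a `delta`-unit increase in a continuous predictor.

    `coef` is a fitted Cox regression coefficient (the log hazard ratio
    per unit increase). Hypothetical helper for illustration only.
    """
    return math.exp(coef * delta)


def hazard_ratio_ci(coef, se, z=1.96):
    """Approximate 95% Wald interval for the per-unit hazard ratio.

    Assumption: the interval is built on the log scale from the
    coefficient's standard error `se`; the software may differ.
    """
    return math.exp(coef - z * se), math.exp(coef + z * se)


def interpret(hr, tol=1e-9):
    """Translate a hazard ratio into the wording used in this tutorial."""
    if abs(hr - 1.0) <= tol:
        return "no effect on time-to-event"
    if hr > 1.0:
        return "shorter time-to-event (risk factor)"
    return "longer time-to-event (protective factor)"


if __name__ == "__main__":
    hr = hazard_ratio(math.log(2))  # coefficient chosen so that HR is 2
    print(hr, interpret(hr))
```

    Because the effect is multiplicative, a hazard ratio of 2 per unit corresponds to a hazard ratio of 8 for a three-unit increase, which is why small per-unit ratios can still matter for predictors measured on a wide scale such as tumor size in mm.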

    For any probe set, we can view a detailed HTML report.

    • Right-click the row header for row 1

    • Select HTML Report from the pop-up menu (Figure 7)

    Figure 7. Invoking an HTML report for a probe set

    The HTML report (Figure 8) will open in your default web browser.

    Figure 8. Cox Regression HTML report

    Creating a list of regions

    In this tutorial, the experimental goal is to identify regions with copy number changes in multiple patients. To do this, we will create a list containing deleted and amplified regions across the genome shared by 8 or more samples.

    • Select Create Region List from the Copy Number Analysis section of the Copy Number workflow

    • Select Specify New Criteria

    We want to include all the amplified regions across the genome shared by at least 8 samples in our first criteria (Figure 1).

    • Set Name to Amplified

    • Set Spreadsheet to 2/segmentation/summary (segment-analysis)

    • Set Column to 6. Total Amplifications using the drop-down menu

    The # pass should be 86, indicating that 86 regions meet the criteria.

    • Select OK

    Figure 1. Configuring the Amplified criteria

    • Select Save to save the list

    • Select OK to confirm that you would like to save Amplified as a list

    • Select Close to exit the List Creator dialog

    Amplified is now open in the Analysis tab as a child spreadsheet of segmentation. Although this list contains regions amplified in 8 or more samples, some samples may also contain deletions in the same regions. For downstream analysis, we may want to filter out these regions to create a final list with only amplified regions. Here, we will use the interactive filter.

    • Select the Amplified spreadsheet

    • Select the interactive filter icon to open the interactive filter

    • Set the Column drop-down list to 8. Total Deletions

    This will apply a filter excluding any region with deletions (Figure 2).

    Figure 2. Interactive filter excluding regions with deletions.

    The yellow and black bar on the right-hand side of the spreadsheet indicates the proportion of rows that have been filtered. Next, we can save the filtered list.

    • Right-click the Amplified spreadsheet in the spreadsheet tree

    • Select Clone... from the pop-up menu

    • Set the Name of the new spreadsheet to amplified_only

    The new spreadsheet is a temporary file. To keep the spreadsheet, we need to save it.

    • Select amplified_only in the spreadsheet tree

    • Select the Save Active Spreadsheet icon

    • Set the file name as amplified_only

    The amplified_only spreadsheet contains 60 rows and includes regions that were amplified in 8 or more samples and not deleted in any sample.

    To create a list of regions only deleted in 8 or more samples, repeat the above steps for deleted regions. You should create a final list, deleted_only, with 92 regions.

    Next, we can merge the two lists to create a spreadsheet with both deleted and amplified regions.

    • Select File from the main taskbar

    • Select Merge Spreadsheets...

    • Select the Append Rows tab

    Figure 3. Merging amplified and deleted spreadsheets

    • Select the new spreadsheet, amplified_or_deleted in the spreadsheet tree

    • Select the Save Active Spreadsheet icon to save the spreadsheet

    This spreadsheet, amplified_or_deleted, will be used as the basis for the downstream steps in this analysis.
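
    The logic behind these lists can be summarized in a short Python sketch: keep regions amplified in at least 8 samples with no deletions, keep regions deleted in at least 8 samples with no amplifications, and append the two lists. The dictionary keys and example regions are hypothetical stand-ins for the Total Amplifications and Total Deletions columns; this is an illustration of the filtering logic, not Partek Genomics Suite code.

```python
MIN_SAMPLES = 8  # threshold used in this tutorial


def amplified_only(regions, min_samples=MIN_SAMPLES):
    """Regions amplified in >= min_samples samples and deleted in none."""
    return [
        r for r in regions
        if r["total_amplifications"] >= min_samples and r["total_deletions"] == 0
    ]


def deleted_only(regions, min_samples=MIN_SAMPLES):
    """Regions deleted in >= min_samples samples and amplified in none."""
    return [
        r for r in regions
        if r["total_deletions"] >= min_samples and r["total_amplifications"] == 0
    ]


def append_rows(first, second):
    """Equivalent of merging two spreadsheets by appending rows."""
    return first + second


# Made-up example rows; real regions come from the segmentation summary.
regions = [
    {"region": "chr1:100-200", "total_amplifications": 9, "total_deletions": 0},
    {"region": "chr2:300-400", "total_amplifications": 8, "total_deletions": 2},
    {"region": "chr3:500-600", "total_amplifications": 0, "total_deletions": 10},
    {"region": "chr4:700-800", "total_amplifications": 3, "total_deletions": 1},
]

amplified_or_deleted = append_rows(amplified_only(regions), deleted_only(regions))

if __name__ == "__main__":
    for row in amplified_or_deleted:
        print(row["region"])
```

    In this made-up example, the chr2 region is dropped even though it is amplified in 8 samples, because two samples carry deletions in the same region.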

    Optional: Use MethylationEPIC for CNV analysis

    Although the 450K and MethylationEPIC arrays were initially designed to analyze DNA methylation, they are essentially dense SNP arrays and can be used for copy number analysis (Feber et al. 2014). The probe intensity data is easily parsed from the idat files by using the Additional Probe Data Spreadsheet Selection dialog (Figure 1) when importing the raw data. Examining the raw intensity data can also be useful for QA/QC purposes.

    Follow the steps for importing Illumina methylation data detailed in Import and normalize methylation data until you reach the Import Illumina iDAT Data dialog with Manifest File and Output File panels (Figure 1).

    Figure 1. Customizing output during data import

    • Select Customize... to open the Advanced Import Options dialog

    • Choose No normalization in the Normalization section of the Algorithm tab

    • Select the Outputs tab (Figure 2)

    Figure 2. Selecting additional probe data to include during data import

    Information about the different output options can be found by selecting the icon adjacent to each option.

    Detection p-values. This is the confidence score that the signal of a probe was significantly higher than the background defined by negative control probes. Selecting this checkbox produces a spreadsheet ending with '_detectionp' in addition to the spreadsheet containing beta values. Each row of the _detectionp spreadsheet will be a different sample and the sample names will end in '_detectionp'. This spreadsheet can be used to filter out probes that do not show signal above background.

    Probe Intensity. This is the sum of the methylated and unmethylated intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_probe’ in addition to the spreadsheet containing beta values. Each row of the _probe spreadsheet will be a different sample and the file names will also end in ‘_probe.’ The probe intensity values will be log2 transformed by default (note that the beta values are not log2 transformed).

    Probe Signal. This option will become available if Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_probe.’ The methylated and unmethylated intensities are shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_meth’ or ‘_unmeth’ for methylated and unmethylated values, respectively. The probe intensity values will be log2 transformed by default.

    Raw Probe Intensity. This is the sum of the raw red and green signal intensities per probe. Selecting this checkbox produces a spreadsheet ending with ‘_raw’ in addition to the spreadsheet containing beta values. Each row of the spreadsheet will be a different sample and the file names will also end in ‘_raw.’ The raw probe intensity values will be log2 transformed by default.

    Raw Probe Signal. This option will become available if Raw Probe Intensity is selected. Selecting this checkbox produces a spreadsheet ending with ‘_raw.’ The red and green intensities will be shown on separate rows for each sample, in addition to the summed values. The sample names will end in ‘_red’ or ‘_green’ for red and green values, respectively. The raw probe intensity values will be log2 transformed by default.

    Antilog Probe Intensity Values. Selecting this checkbox will show the probe intensity data without log2 transformation.

    Create NCBI GEO Submission Spreadsheets. Generates matrix processed and matrix signal intensities spreadsheets for GEO submission.
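
    Two of the outputs described above lend themselves to a small Python sketch: the summed probe intensity (optionally log2 transformed) and the use of detection p-values to drop probes without signal above background. The function names, the 0.05 cutoff, and the rule that a probe must pass in every sample are assumptions made for illustration; they are not settings documented in this section.

```python
import math


def probe_intensity(methylated, unmethylated, log2=True):
    """Sum of methylated and unmethylated intensities for one probe.

    With log2=False the value is returned untransformed, mirroring the
    Antilog Probe Intensity Values option.
    """
    total = methylated + unmethylated
    return math.log2(total) if log2 else total


def passing_probes(detection_p, threshold=0.05):
    """Probe IDs whose detection p-value is below `threshold` in all samples.

    `detection_p` maps a probe ID to a list of per-sample detection
    p-values. Both the cutoff and the all-samples rule are assumptions.
    """
    return [
        probe for probe, pvals in detection_p.items()
        if all(p < threshold for p in pvals)
    ]


if __name__ == "__main__":
    print(probe_intensity(600, 424))              # log2 of 1024
    print(probe_intensity(600, 424, log2=False))  # untransformed sum
    print(passing_probes({"cg1": [0.001, 0.01], "cg2": [0.2, 0.001]}))
```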

    How you proceed depends on your study design. Here is an example series of steps to prepare the tutorial data set for copy number analysis:

    • Select Probe Intensity and Antilog Probe Intensity Values (Figure 2)

    • Select OK to close the Advanced Import Options dialog

    • Select Import to import the data and perform the selected normalization method

    Configure the Normalize to Baseline 1 dialog as shown (Figure 3)

    • Select Use control set from this spreadsheet

    • Set Control Category to B cells

    • Select Ratio to baseline from the Normalization Method section

    Figure 3. Configuring normalize to baseline

    • Select OK to generate the spreadsheet

    This spreadsheet contains copy number values per probe in log2 space (i.e. diploid = 0). Prior to performing copy number analysis, you can normalize for local GC abundance.
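
    As a rough illustration of what a ratio-to-baseline normalization produces, the following Python sketch divides each sample's probe intensity by a per-probe baseline computed from the control samples and reports the result in log2 space, so that a probe matching the baseline has a value of 0. Using the mean of the control samples as the baseline, and all names in the code, are assumptions; the exact calculation performed by Partek Genomics Suite is not described in this section.

```python
import math
from statistics import mean


def log2_ratio_to_baseline(intensities, groups, control="B cells"):
    """Per-probe log2 ratio of every sample to the control baseline.

    intensities: sample name -> list of untransformed probe intensities
    groups:      sample name -> category (e.g. "B cells", "LCLs")
    Assumption: the baseline is the per-probe mean of the control samples.
    """
    controls = [s for s in intensities if groups[s] == control]
    n_probes = len(next(iter(intensities.values())))
    baseline = [
        mean(intensities[s][i] for s in controls) for i in range(n_probes)
    ]
    return {
        sample: [math.log2(v / b) for v, b in zip(values, baseline)]
        for sample, values in intensities.items()
    }


# Made-up intensities for two probes in three samples.
intensities = {
    "b1": [100.0, 200.0],
    "b2": [100.0, 200.0],
    "lcl1": [200.0, 100.0],
}
groups = {"b1": "B cells", "b2": "B cells", "lcl1": "LCLs"}
log2_ratios = log2_ratio_to_baseline(intensities, groups)

if __name__ == "__main__":
    print(log2_ratios["lcl1"])
```

    In this toy example, the first probe has twice the baseline intensity in the LCL sample (log2 ratio of 1, suggesting a gain) and the second has half (log2 ratio of -1, suggesting a loss).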

    • Select Transform

    • Select Adjust Based on Local GC Content...

    • Click OK to run Local GC Adjustment (Figure 4)

    Figure 4. Adjusting for local GC content

    The GC adjusted spreadsheet is the starting spreadsheet for copy number analysis. You can now switch over to the Copy number workflow, skip the Create copy number step, and begin with the Detect amplifications and deletions step. Consult the user's guide for the copy number workflow for subsequent steps.

    References

    Feber A, Guilhamon P, Lechner M, et al. Using high-density DNA methylation arrays to profile copy number alterations. Genome Biology. 2014;15(2):R30. doi:10.1186/gb-2014-15-2-r30.

    Kaplan-Meier Survival Analysis

    Introduction to Kaplan-Meier

    The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function where time-to-event incidence varies over time in a population. The Kaplan-Meier estimator is displayed as a Kaplan-Meier curve, a series of declining horizontal steps. The Kaplan-Meier curve should approach the true survival curve for the population with a sufficiently large sample size. Kaplan-Meier survival analysis can handle censored data, i.e., data where the event is not observed for some subjects.

    To perform Kaplan-Meier survival analysis, at least two pieces of information (one column each) must be provided for each sample: time-to-event (a numeric factor) and event status (categorical factor with two levels). Event status indicates whether the event occurred or the subject was censored (did not experience the event). Time-to-event indicates the time elapsed between the enrollment of a subject in the study and the occurrence of the event.

    A common example of Kaplan-Meier analysis is estimating the fraction of patients who remain disease-free after cancer remission. In this case, the event would be disease recurrence and patients would be listed as censored if they do not experience recurrence during the study or if they drop out of the study before experiencing recurrence.

    Partek Genomics Suite does not impose any limitation on the labels used for the event and censored categories; in this tutorial, the events are coded as either "death" or "censored". If a subject is still alive at the end of the study, time-to-event indicates the period between enrollment and the end of the study. If a subject dropped out of the study, time-to-event indicates the period between enrollment and the last recorded time point.

    Performing Kaplan-Meier Survival analysis

    To begin, you should have the Survival Tutorial data set open in Partek Genomics Suite.

    • Select Stat from the main toolbar

    • Select Survival Analysis then Kaplan-Meier from the Stat menu (Figure 1)

    Figure 1. Invoking Kaplan-Meier

    The Kaplan-Meier dialog will open. Please note that in this tutorial data set, column 1. Survival (years) indicates the survival time of each patient in years and column 2. Event indicates the event status for each patient, death or censored.

    • Set Time Variable to 1. Survival (years) using the drop-down menu

    • Set Event Variable to 2. Event using the drop-down menu

    Only numeric data are displayed in the Time Variable drop-down list and only categorical data with two categories are displayed in Event Variable.

    • Set Event Status to death using the drop-down menu (Figure 2)

    Event Status should be set to the primary event outcome.

    Figure 2. Configuring the Kaplan-Meier dialog

    • Select 3. p53 status from the Candidate(s) panel

    • Select Add Factor > to add 3. p53 status to the Strata (Categorical) panel

    This will test the difference in survival rates between the p53 mutants (mutant) and samples with wild-type p53 (wt).

    • Select OK to run the test (Figure 3)

    Figure 3. Configuring the Kaplan-Meier dialog to test the difference in survival rates between patients with different p53 status

    The Kaplan-Meier Plot will open in a new tab (Figure 4).

    Figure 4. Kaplan-Meier plot comparing the survival curves between two groups.

    The horizontal axis indicates time-to-event; the vertical axis shows the cumulative percentage of survival. Censoring is shown as a triangle; event occurrence is shown as a step-down in the plot. Partek Genomics Suite performs two statistical tests to compare the survival curves: a log-rank test and the Wilcoxon-Gehan test. Low p-values indicate that the groups have significantly different survival times.
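
    For readers who want to see how a two-group log-rank comparison works, here is a minimal Python sketch of the textbook log-rank test. At each distinct event time it compares the observed number of events in the first group with the number expected if both groups shared the same hazard, then refers the resulting statistic to a chi-squared distribution with one degree of freedom. This is a generic illustration with hypothetical names and made-up data; it is not the Partek Genomics Suite implementation, and the second test reported by the software is not covered here. Events are coded as True (event occurred) or False (censored).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t in each group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    early = [1, 2, 3, 4, 5]
    late = [10, 11, 12, 13, 14]
    print(logrank_test(early, [True] * 5, late, [True] * 5))
```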

    • Select the Analysis tab to switch to the Kaplan-Meier results spreadsheet (Figure 5)

    Figure 5. Kaplan-Meier spreadsheet. Each row represents occurrence of at least one significant event.

    The spreadsheet is organized into two sections: the analysis of the p53 mutant group and the analysis of the p53 wild type group. Each row represents a time point at which at least one event occurred; the columns provide the following information:

    1. Identifies the group membership (according to the strata)

    2. Survival time corresponds to the entries in column 1. of the original (Survival_Tutorial) spreadsheet. At each given time, at least one event, either death or censored, was recorded.

    3. Probability of Survival: cumulative probability of survival at a given time point (also known as KM survival estimate). Cumulative probability is the probability of surviving all of the intervals before this time point. As time increases, the cumulative survival probability decreases as events occur.

    4. Number of group members at risk (i.e., have not experienced the event). The count in each row is calculated by subtracting the number of deaths and censored events in the row above from the number at risk in the row above.

    5. Count of deaths at this time point in the group

    6. Count of censored events at the given time in the group

    7. Total number of deaths in all groups at the given time

    8. Total number of participants at risk in all groups. The count in each row is calculated by subtracting the number of deaths and censored events at the previous time point in both groups from the total number at risk at the previous time point

    9. Natural logarithm of column 3.; also noted as ln(KM)

    10. Natural logarithm of the negative value of column 9., i.e., ln(-ln(KM)). A plot of ln(-ln(KM)) vs. ln(t) is often used to test the proportional hazards assumption. To visualize the risk, select this column and select View > Log Log S Plot (Figure 6).
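
    The per-group columns described above can be reproduced with a short Python sketch of the Kaplan-Meier product-limit estimator. At each recorded time, the cumulative survival probability is multiplied by the fraction of at-risk subjects who did not die, and the at-risk count for the next row is reduced by the deaths and censored subjects in the current row. The function name, dictionary keys, and example data are hypothetical; the sketch handles a single group and is not the software's own code.

```python
import math
from collections import Counter


def kaplan_meier(times, events, event_label="death"):
    """Kaplan-Meier table for one group.

    times:  time-to-event for each subject
    events: event status for each subject ("death" or "censored")
    Returns one row per recorded time with the survival estimate,
    number at risk, deaths, censored count, ln(KM), and ln(-ln(KM)).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == event_label)
    censored = Counter(t for t, e in zip(times, events) if e != event_label)

    at_risk = len(times)
    survival = 1.0
    rows = []
    for t in sorted(set(times)):
        d, c = deaths[t], censored[t]
        survival *= 1.0 - d / at_risk
        ln_km = math.log(survival) if survival > 0 else None
        # ln(-ln(KM)) is undefined while the estimate is still 1 (ln = 0).
        ln_ln_km = math.log(-ln_km) if ln_km is not None and ln_km < 0 else None
        rows.append({
            "time": t, "survival": survival, "at_risk": at_risk,
            "deaths": d, "censored": c, "ln_km": ln_km, "ln_ln_km": ln_ln_km,
        })
        at_risk -= d + c  # subjects leaving the risk set before the next row
    return rows


# Made-up example: five subjects, three deaths, two censored.
km_rows = kaplan_meier(
    [1, 2, 2, 3, 4],
    ["death", "censored", "death", "death", "censored"],
)

if __name__ == "__main__":
    for row in km_rows:
        print(row)
```

    Note that a censored subject does not lower the survival estimate at the time of censoring; it only reduces the number at risk for later time points.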

    Please note that the Kaplan-Meier results spreadsheet is a temporary file. If you would like to be able to view the spreadsheet again after closing Partek Genomics Suite, be sure to save it by selecting the Save Active Spreadsheet icon.

    Figure 6. Log Log S plot of KM data. As the lines are mostly parallel and do not cross, the log-rank test assumptions are valid. The Wilcoxon-Gehan test would have more power if the lines crossed or were not parallel, but it performs less well when many observations are censored.

    Adding sample attributes

    Now that the data has been imported, we need to make a few changes to the data annotation before analysis.

    Modifying sample attributes

    Notice that the Sample ID names in column 1 are gray (Figure 1). This indicates that Sample ID is a text factor. Text factors cannot be used as variables in downstream analysis, so we need to change Sample ID to a categorical factor.

    Figure 1. Viewing the imported data in a spreadsheet

    • Right-click on the column header to invoke the pop-up menu

    • Select Properties (Figure 2)

    Figure 2. Changing column properties

    • Configure the Properties of Column 1 in Spreadsheet 1 dialog as shown (Figure 3) with Type set to categorical and Attribute set to factor

    Figure 3. Changing column 1 properties

    • Select OK

    The sample names in column 1 are now black, indicating that they have been changed to a categorical variable. Next, we will add attributes for grouping the data.

    Adding sample attributes

    • From the RNA-seq workflow panel, select Add Sample Attributes to bring up the Add Sample Attributes dialog (Figure 4)

    Figure 4. Add Sample Attributes dialog

    • Select Add a Categorical Attribute

    • Select OK to bring up the Create categorical attribute dialog

    Creating a categorical sample attribute allows us to group samples. This is useful for designating samples as replicates, as members of an experimental group, or as sharing a phenotype of interest. In this tutorial, we have four different samples from different tissues and different donors, but to illustrate the available statistical analysis options, we need to divide the samples into two groups: muscle (Heart and Muscle) and not muscle (Brain and Liver).

    • Set Attribute name: as Tissue

    • Rename Group 1 to muscle and Group 2 to not muscle

    • Select and drag the samples from the Unassigned panel to the correct group panel (Figure 5)

    Figure 5. Creating a categorical attribute

    • Select OK

    • Select No from the Add another attribute? dialog

    • Select Yes from the Save spreadsheet 1 dialog

    The attribute will now appear as a new column in the RNA-seq spreadsheet with the heading Tissue and the groups muscle and not muscle.

    Choosing Sample ID column

    The next available step in the Import panel of the RNA-seq workflow is Choose Sample ID Column. Verifying that the correct column is designated as the Sample ID becomes particularly important when data from multiple experiments are being combined.

    • Select Choose Sample ID Column from the Import panel of the RNA-Seq workflow

    • Select OK (Figure 6)

    Figure 6. Choosing the correct column as Sample ID

    Visualize methylation at each locus

    Partek Genomics Suite enables you to visualize each probe and compare the methylation between the groups at a single CpG site level.

    • Right-click row 5. SBNO2 in the LCLs_vs_B_Cells_CpG_Islands spreadsheet

    • Select Browse to Location from the pop-up menu

    Figure 1. Browsing to location from spreadsheet with differentially expressed genes

    The Chromosome View tab will open, zoomed in to the selected CpG locus in SBNO2 (Figure 2).

    Figure 2. Viewing location in Genome Viewer

    The Chromosome View visualization is composed of a series of tracks corresponding to annotation files and data files.

    • RefSeq Transcripts 2017-05-02 (hg19) (+): transcripts coded by the positive strand

    • RefSeq Transcripts 2017-05-02 (hg19) (-): transcripts coded by the negative strand

    • Regions: by default, difference in methylation (M-value) between the groups

    To modify a track, select it in the Tracks panel to bring up its configuration options panel below the Tracks panel. Let's modify a few tracks to improve our visualization of the data.

    • Select the Regions track; its configuration panel opens to the Profile tab

    • Select the Color tab

    • Set Color bars by to Difference (LCLs vs. B cells) (Description)

    This will color regions by whether they are up- or down-methylated.

    • Select the Heatmap (1/mvalue) track

    • Select Remove Track

    • Select Bar Chart (Methylation) located directly below the Regions track

    We can now more clearly see the difference in M-values for the region in the Regions track, the heatmap of beta values in the Heatmap track, and the beta value at the locus for the selected sample in the Bar Chart track.

    • Select a sample on the heatmap to view its beta value in the Bar Chart track (Figure 3)

    Figure 3. Modify the tracks of the Genome Viewer to facilitate visual analysis

    The New Track button allows new tracks to be added to the viewer, while the Remove Track button removes the selected track from the viewer. Tracks can be reordered by selecting a track in the Tracks panel and dragging it up or down to move it in the list. In the Chromosome View, separate icons switch between selection mode and navigation mode. In navigation mode, left-click and draw a box on any track to zoom in. All tracks are synced and will zoom together. Zooming can also be controlled using the interface in the lower right-hand corner of the tab. The view can be reset to the whole chromosome level using reset zoom. Searching for a gene or transcript in the position box will also zoom directly to its location.

    The available tracks can be supplemented with a special annotation file that can be built using a UCSC annotation file as the basis. Building and viewing the UCSC annotation file is covered in an optional section of the tutorial.

    Detect differentially expressed genes with ANOVA

    Analysis of variance (ANOVA) is a very powerful technique for identifying differentially expressed genes in a multi-factor experiment. In this data set, ANOVA will be used to generate a list of genes that are significantly differentially regulated by each treatment.

    Adding factors and interactions

    When setting up the ANOVA, the primary factors of interest, Treatment and Time, should be included. We will also include the interaction between Treatment and Time, Treatment * Time, because we are interested in whether different treatments behave differently over time. From our exploratory analysis using PCA, we also know that Batch is a major source of variation and needs to be included. Including

    RNA-Seq mRNA quantification

    We are now ready to measure gene expression in our dataset. To do this, we will use the mRNA quantification task in the Analyze Known Genes section of the RNA-Seq workflow. mRNA quantification creates spreadsheets showing expression at exon, transcript, and gene levels and reports raw and normalized reads for each sample.

    Please note that the normalization method used by Partek Genomics Suite is Reads Per Kilobase per Million mapped reads (RPKM) (Mortazavi et al. 2008). In brief, this normalization method counts total reads in a sample, divides by one million to create a per million scaling factor for each sample; then divides the read counts for the feature (exon, transcript, or gene) by the per million scaling factor to normalize for sequencing depth and give a reads per million value; and finally divides reads per million values by the length of the feature (exon, transcript, or gene) in kilobases to normalize for feature size.
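
    The three steps of the RPKM calculation described above can be written out as a small Python sketch. The function and parameter names are chosen for illustration and the numbers in the example are made up.

```python
def rpkm(feature_reads, feature_length_bp, total_reads):
    """Reads Per Kilobase per Million mapped reads for one feature.

    feature_reads:     reads assigned to the exon, transcript, or gene
    feature_length_bp: length of the feature in base pairs
    total_reads:       total reads in the sample
    """
    per_million = total_reads / 1_000_000             # scaling factor for the sample
    reads_per_million = feature_reads / per_million   # normalize for sequencing depth
    length_kb = feature_length_bp / 1_000
    return reads_per_million / length_kb              # normalize for feature size


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a sample with 10 million reads
    print(rpkm(500, 2000, 10_000_000))
```

    In this example, 500 reads in a sample of 10 million reads give 50 reads per million, and dividing by the 2 kb feature length gives an RPKM of 25.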

    • Select 1 (RNA-Seq) from the spreadsheet tree

    Perform GO enrichment analysis

    One of the main functions of GO enrichment is to find the overrepresentation of functional categories in a gene list. With the Gene_List.txt spreadsheet selected:

    • From the Gene Expression workflow, choose Biological Interpretation followed by Gene Set Analysis

    • Select the GO Enrichment radio button in the Gene Set Analysis dialog (Figure 1) followed by Next
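
    To give a sense of what overrepresentation means statistically, the following Python sketch computes a one-sided hypergeometric p-value: the probability of seeing at least the observed number of genes from a functional category in a gene list of a given size, if the list had been drawn at random from the background. This is one common way to test for enrichment and is shown only as an illustration; this section does not state which test Partek Genomics Suite applies, and all names and numbers in the code are hypothetical.

```python
from math import comb


def enrichment_p_value(list_size, list_hits, background_size, background_hits):
    """P(X >= list_hits) under the hypergeometric distribution.

    list_size:       number of genes in the gene list
    list_hits:       genes in the list that belong to the category
    background_size: number of genes in the background set
    background_hits: background genes that belong to the category
    """
    total = comb(background_size, list_size)
    upper = min(background_hits, list_size)
    favorable = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, list_size - i)
        for i in range(list_hits, upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # 40 of 200 listed genes fall in a category covering 500 of 20,000 genes
    print(enrichment_p_value(200, 40, 20_000, 500))
```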

    Detecting peaks and enriched regions in ChIP-Seq data

    Binding sites for the DNA-binding protein of interest are indicated by peaks of enriched sequencing read density. How are peaks calculated from reads in Partek Genomics Suite?

    Using the effective fragment length calculated by strand cross-correlation, each read is extended in the 3' direction by the effective fragment length and overlapping extended reads are merged into single peaks. For paired-end reads, the distance between paired reads is used as the fragment length and overlapping fragments are merged into peaks. For peak detection, the genome is divided into windows of a user-defined size and the number of fragments whose mid-points fall within each window is counted. A model for expected read density (a zero-truncated negative binomial) is used to determine which peaks are significantly enriched at a user-defined false discovery rate (FDR). See the Partek Genomics Suite documentation for more information on the peak-finding algorithm and tips for setting the Fragment extension and window sizes.
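
    The read extension and window counting steps can be sketched in Python as follows. The sketch assumes single-end reads on one chromosome, treats extension as stretching each read from its 5' end to the effective fragment length, and counts fragment mid-points per fixed-size window. It does not fit the zero-truncated negative binomial model or apply the FDR threshold, and the function names and coordinates are hypothetical rather than taken from Partek Genomics Suite.

```python
from collections import Counter


def extend_read(start, end, strand, fragment_length):
    """Extend a read in its 3' direction to the effective fragment length.

    Forward-strand reads grow to the right from `start`; reverse-strand
    reads grow to the left from `end`. Coordinates are clamped at 0.
    """
    if strand == "+":
        return start, start + fragment_length
    return max(0, end - fragment_length), end


def window_counts(reads, fragment_length, window_size):
    """Count fragments whose mid-point falls in each window.

    reads: iterable of (start, end, strand) tuples on one chromosome.
    Returns a Counter keyed by window index (position // window_size).
    """
    counts = Counter()
    for start, end, strand in reads:
        frag_start, frag_end = extend_read(start, end, strand, fragment_length)
        midpoint = (frag_start + frag_end) // 2
        counts[midpoint // window_size] += 1
    return counts


# Made-up reads: a forward and a reverse read flanking the same site,
# plus an unrelated forward read farther along the chromosome.
reads = [(100, 135, "+"), (265, 300, "-"), (1000, 1035, "+")]
counts = window_counts(reads, fragment_length=200, window_size=100)

if __name__ == "__main__":
    for window, n in sorted(counts.items()):
        print(window * 100, n)
```

    In this toy example the forward and reverse reads, which sit on opposite sides of the same site, are both assigned to the window containing position 200 once they are extended, which is how extension turns the two strand-specific peaks into a single peak.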

    • Select spreadsheet 1 (ChIP-Seq) from the spreadsheet tree