# Hierarchical Clustering Analysis

## What is Hierarchical Clustering?

Hierarchical clustering groups similar objects into clusters. To start, each row and/or column is considered a cluster. The two most similar clusters are then combined, and this process is repeated until all objects are in the same cluster. The resulting hierarchy of clusters is displayed as a tree called a dendrogram. Hierarchical clustering is useful for exploratory analysis because it shows how samples group together based on the similarity of their features.
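
The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Partek Genomics Suite: the distance measure (Euclidean), the linkage rule (average linkage), the function names, and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance between all cross-cluster pairs (assumed linkage)."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [(i,) for i in range(len(points))]  # every object starts alone
    history = []
    while len(clusters) > 1:
        # Find the most similar (closest) pair of current clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        history.append((a, b, average_linkage(a, b, points)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


# Hypothetical toy data: four samples measured on two features.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
merges = agglomerate(samples)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

With n objects the loop always performs n - 1 merges, and the distance recorded for each merge corresponds to the height at which the two branches join in the dendrogram.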

Hierarchical clustering is an unsupervised clustering method. Unsupervised clustering methods do not take the identity or attributes of samples into account when clustering. This means that experimental variables such as treatment, phenotype, tissue, number of expected groups, etc. do not guide or bias cluster building. Supervised clustering methods do consider experimental variables when building clusters.

## Visualizing Hierarchical Clustering

To illustrate the capabilities and customization options of hierarchical clustering in Partek Genomics Suite, we will explore an example of hierarchical clustering drawn from the tutorial [Gene Expression Analysis](https://help.partek.illumina.com/partek-genomics-suite/tutorials/gene-expression-analysis). The data set in this tutorial includes gene expression data from patients with or without Down syndrome. Using this data set, 23 highly differentially expressed genes between Down syndrome and normal patient tissues were identified. These 23 differentially regulated genes were then used to perform hierarchical clustering of the samples. Follow the steps outlined in [Performing hierarchical clustering](https://help.partek.illumina.com/partek-genomics-suite/tutorials/gene-expression-analysis/performing-hierarchical-clustering) to perform hierarchical clustering and launch the *Hierarchical Clustering* tab (Figure 1).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-a3d00de773bf4e9876b233f521cd88bcfaea244d%2F2017-07-11%2016_44_07-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 1. Heatmap showing results of hierarchical clustering

The right-hand section of the *Hierarchical Clustering* tab is a heat map showing relative expression of the genes in the list used to perform clustering. The heat map can be configured using the properties panel on the left-hand side of the tab. In this example, low expression values are colored green, high expression values are colored red, and the midpoint between the minimum and maximum is colored black. The dendrograms on the left-hand side and top of the heat map show clustering of samples as rows and features (probes/genes in this example) as columns. Columns are labeled with the gene symbol if there is enough space for every gene to be annotated. Rows are colored based on the groups of the first categorical sample attribute in the source spreadsheet. The sample legend below the heat map indicates which colors correspond to which attribute group. In this example, Down syndrome patient samples are red and normal patient samples are orange.
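
The green, black, and red coloring can be thought of as a two-segment gradient. The sketch below shows one plausible way to map a value to a color; the linear interpolation, the clamping of out-of-range values, and the names `lerp_color` and `heat_color` are assumptions for illustration and are not taken from Partek Genomics Suite.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def heat_color(value, vmin, vmax, low=(0, 255, 0), mid=(0, 0, 0), high=(255, 0, 0)):
    """Map a value to RGB on a low -> mid -> high gradient.

    The midpoint color sits halfway between vmin and vmax. Values outside
    the range are clamped to the nearest end color (an assumption).
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)  # position in [0, 1]
    if t < 0.5:
        return lerp_color(low, mid, t / 0.5)      # lower half: low -> mid
    return lerp_color(mid, high, (t - 0.5) / 0.5)  # upper half: mid -> high


# Hypothetical data range of -2 to 2.
print(heat_color(-2.0, -2.0, 2.0))  # minimum maps to the low color
print(heat_color(0.0, -2.0, 2.0))   # midpoint maps to the mid color
print(heat_color(2.0, -2.0, 2.0))   # maximum maps to the high color
```

Passing different `low`, `mid`, and `high` colors, or a narrower `vmin` and `vmax`, changes the appearance of the plot without changing the underlying data.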

## Configuring the Hierarchical Clustering Plot

#### Labeling Sample Groups in the Heat Map

* Select the *Rows* tab
* Verify that *Type* appears in the annotation box
* Set *Width (in pixels)* to **25**

This will increase the width of the color box indicating sample *Type*.

* Select **Show Label**
* Set *Text size* to **12**
* Set *Text* *angle* to **90**

This angle is relative to the x-axis. When set to 90, the text will run along the y-axis.

* Select **Apply**

The sample attributes are now labeled with group titles (Figure 2).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-8e427c0ede6ba5bf4d976434df6b34e8ccb073d1%2F2017-07-12%2009_20_20-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 2. Labeling heat map with sample attribute groups

#### Adding a Sample Attribute to the Heat Map

* Select the *Rows* tab
* Select **Tissue** from the *New Annotation* drop-down menu
* Select **Apply**

Color blocks indicating the tissue of each sample have been added to the row labels and sample legend (Figure 3).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-48a90f0aa8fa775be3c4c348b5e00891b5d80685%2F2017-07-12%2009_25_47-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 3. Sample attributes can be added to the heat map as sample labels

#### Changing the Orientation of the Rows and Columns

By default, Partek Genomics Suite displays samples on rows and features on columns. We can transpose the heat map using the *Heat Map* tab in the plot properties panel.

* Select the *Heat Map* tab
* Select **Transpose rows and columns** in the *Orientation* section
* Select **Apply**

The plot has been transposed, with samples on columns and features on rows. The label for the sample groups is now in the vertical orientation because the settings we applied to *Rows* have been applied to *Columns*.

* Select the *Columns* tab
* Select the *Type* track
* Set *Text* *angle* to **0**
* Select **Apply**

The sample group label for *Type* is now visible (Figure 4).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-6b6e87ace9cedcfab94cab0c902b991acd4f02f9%2F2017-07-12%2009_41_30-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 4. Heat map columns and rows can be transposed

#### Flipping Columns or Rows

Except at the bottom level of the dendrogram, each cluster node has two sub-cluster branches (legs). The order of the two legs is arbitrary, so the positions of the two sub-clusters can be flipped within the cluster. This does not change the clustering, only the position of the clusters on the plot.

* Select (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-4ee367238f959a3e3effe464010738e1b7c22757%2F2017-07-12%2009_45_40-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)) from the *Mouse Mode* icon set to activate *Flip Mode*
* Click a line (or draw a bounding box around it using the left mouse button) that represents a sub-cluster branch (dendrogram leg) to flip the selected leg with the other leg of the same parent cluster. In this example, clicking on the bottom line will move it to the top of the heat map (Figure 5).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-9c6d0256959b5d3a42ed78b901a03c8d448fd6ab%2F2017-07-12%2009_54_19-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 5. Rows and columns can be flipped by using Flip Mode to select dendrogram legs
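
The claim that flipping changes only the plotted order, not the clustering, can be checked with a small sketch. The dendrogram is modeled here as nested pairs, which is an assumed representation chosen for illustration; the sample labels and function names are hypothetical.

```python
def leaves(node):
    """Return leaf labels in plotted order; a node is a label or a (left, right) pair."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]


def flip(node):
    """Swap the two legs of a cluster node."""
    left, right = node
    return (right, left)


def clusters(node):
    """Collect the membership of every cluster in the tree, ignoring order."""
    if not isinstance(node, tuple):
        return {frozenset([node])}
    return {frozenset(leaves(node))} | clusters(node[0]) | clusters(node[1])


# Hypothetical dendrogram over four samples.
tree = (("s1", "s2"), ("s3", "s4"))
flipped = flip(tree)

print(leaves(tree))                         # plotted order before the flip
print(leaves(flipped))                      # plotted order after the flip
print(clusters(tree) == clusters(flipped))  # cluster membership is unchanged
```

The leaf order differs between the two trees, while the set of clusters is identical, which is why a flip is purely a display choice.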

#### Changing Heat Map Colors

The minimum, maximum, and midpoint colors of the heat map intensity plot can be customized.

* Select the *Heat Map* tab
* Set *Min* color to (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-9e09551ebb8cd40d3c323cf70ba17fad14482375%2Fimage2017-7-12%2010_6_32.png?alt=media)) using the color picker tool
* Set *Max* color to (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-7ed20838c0d8154a1044f244966637080e814787%2Fimage2017-7-12%2010_7_17.png?alt=media)) using the color picker tool
* Select **Apply**

The heat map and plot intensity legend now show maximum values in yellow and minimum values in light blue with a black midpoint (Figure 6). The data range can also be customized by changing the values of *Min* and *Max*.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-b51a83cd91d49145bd67d71815f8415fd25d8d0e%2F2017-07-12%2010_11_37-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 6. Heat map colors for minimum, maximum, and midpoint intensity can be customized

#### Zooming to Selected Rows/Columns

We can use the hierarchical clustering heat map to examine groups of genes that exhibit similar expression patterns, for example, genes that are up-regulated in Down syndrome samples and down-regulated in normal samples.

* Select (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-81bfb1f26b9264056196d2ca6d8e1ffb9ff950d4%2F2017-07-12%2010_21_27-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)) from the *Mouse Mode* icon set to activate *Selection Mode*
* Select the middle cluster of the rows dendrogram as shown (Figure 7) by clicking on the line or drawing a bounding box around the line

The lines within the selected cluster will be bold and the corresponding columns (or rows) on the spreadsheet in the analysis tab will be highlighted.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-9166e0d41c47a7a9301482b52aea347fcd64f939%2F2017-07-12%2010_24_28-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 7. Selecting a dendrogram cluster using Selection Mode

* Right-click anywhere in the viewer
* Select **Zoom to Fit Selected Rows**

The same steps can be used to zoom in on columns or rows. Here, we have zoomed in on rows but not columns, to show the expression levels of the selected genes for all samples (Figure 8).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-0bbc19443bb1dc97724bad9f4acb1af612ac4b5d%2F2017-07-12%2010_29_57-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)

Figure 8. Viewing only selected genes for all samples

To reset the zoom, select (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-1be8c85cd231509df8429def4945ea3ec1b2eafb%2F2017-07-12%2010_34_47-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)) on the y-axis to show all rows and on the x-axis to show all columns.

* Select (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-1be8c85cd231509df8429def4945ea3ec1b2eafb%2F2017-07-12%2010_34_47-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)) on the y-axis to show all rows
* Left-click anywhere in the hierarchical clustering plot to deselect the dendrogram

#### Exporting a List of Genes From a Selected Cluster

Partek Genomics Suite can export a list of genes from any selected cluster, allowing large gene sets to be filtered based on the results of hierarchical clustering.

* Select (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-81bfb1f26b9264056196d2ca6d8e1ffb9ff950d4%2F2017-07-12%2010_21_27-Partek%20Genomics%20Suite%20-%201%20\(Down_Syndrome-GE\).png?alt=media)) from the *Mouse Mode* icon set to activate *Selection Mode*
* Select the bottom cluster of the rows dendrogram
* Right-click to open the pop-up menu
* Select **Create Row List...** (Figure 9)

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-1b27a6fc974ced9a55de8fd412c75461020e0c77%2F2017-07-12%2012_11_06-Photos.png?alt=media)

Figure 9. Creating gene list from selected cluster

* Name the gene set *down in normal*
* Select **OK**
* Save the list as *down in normal*

In the *Analysis* tab, there is now a spreadsheet *row\_list (down in normal.txt)* containing the 6 genes that were in the selected cluster. The same steps can be used to create a list of samples from the hierarchical clustering by selecting clusters on the sample dendrogram.
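
Conceptually, creating a row list means collecting every leaf beneath the selected dendrogram node. The sketch below illustrates that idea outside the application. The nested-pair tree, the placeholder gene labels, the one-label-per-line file layout, and the function names are all assumptions; the file format that Partek Genomics Suite writes is not described here.

```python
import tempfile
from pathlib import Path


def leaves(node):
    """Return leaf labels under a node; a node is a label or a (left, right) pair."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]


def write_row_list(node, path):
    """Write one label per line for every leaf under the selected node."""
    labels = leaves(node)
    Path(path).write_text("\n".join(labels) + "\n", encoding="utf-8")
    return labels


# Hypothetical feature dendrogram; labels are placeholders, not the tutorial's genes.
feature_tree = (("gene_a", "gene_b"), ("gene_c", ("gene_d", "gene_e")))
selected = feature_tree[1]  # stands in for the cluster picked in the dendrogram

out_path = Path(tempfile.gettempdir()) / "down_in_normal.txt"
exported = write_row_list(selected, out_path)
print(f"wrote {len(exported)} labels to {out_path}")
```

Selecting a node higher in the tree would export a larger list, because every leaf beneath the node is included.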

#### Saving Plot Properties

Once you have created a customized plot, you can save the plot properties as a template for future hierarchical clustering analyses.

* Select the *Save/Load* tab
* Select **Save current...**
* Name the current plot properties template; we selected **Transposed Blue and Yellow**

The new template now appears in the *Save/Load* panel as an option. To load a template, select it in the *Save/Load* panel and select **Load selected**. Note that all properties will be applied, including *Min* and *Max* values and sample groups (which are based on the column number of the attribute in the source spreadsheet), even though these may not be appropriate for a different data set.

#### Exporting the Hierarchical Clustering Plot Image

The hierarchical clustering plot can be exported as a publication-quality image.

* Select the *Hierarchical Clustering* tab
* Select **File** from the main toolbar
* Select **Save Image As...** from the drop-down menu
* Select a destination and name for the file
* Select **PNG** or your preferred image type from the pull-down menu
* Select **Save**

## Additional Assistance

If you need additional assistance, please visit [our support page](http://www.partek.com/support) to submit a help ticket or find phone numbers for regional support.
