# Harmony

Analyzing scRNA-seq data is challenging, particularly when the datasets are assayed with different technologies, because biological and technical differences are interspersed. Harmony\[1] is an algorithm that projects cells into a shared embedding in which cells group by cell type rather than by dataset-specific conditions. Harmony can simultaneously account for multiple experimental and biological factors while integrating different datasets.

Harmony in Flow can be invoked from the Batch removal section only if

1. the data has at least one categorical attribute (only categorical attributes can be included in the model)
2. a PCA data node is selected (Figure 1).

<div align="left"><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-4fece853ea42976f260b19fc806736a484cffdf9%2FScreen%20Shot%202020-12-24%20at%201.28.32%20PM.png?alt=media" alt="Figure 1. Harmony task in Batch removal section in Flow."></div>

To run Harmony,

* Click a **PCA** data node
* Click the **Batch removal** section in the toolbox
* Click **Harmony**

You will be prompted to pick one or more attributes for analysis. The Harmony dialog is similar to that of the General linear model batch removal task. To set up the model, choose which attributes should be considered. For example, when one dataset contains different cell types from multiple batches, the batch may affect each cell type differently. Here, batch is the attribute *Sample name* and cell type is the attribute *Cell type* (Figure 2).

To remove batch effects with default settings,

* Click **Sample name**
* Click **Add factors**
* Click **Finish**

<div align="left"><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-ef7b5e2d89b2106cca29f3bf4ba1663d938dfc65%2FFigure%202%20(2).png?alt=media" alt="Figure 2. Select factors to remove." width="356"></div>

The output of Harmony is a new data node. This data node contains the Harmony-corrected values and can be used as the input for downstream tasks such as Graph-based clustering, UMAP and t-SNE (Figure 3).

![Figure 3. UMAP displays the cells colored by Cell type before (left) and after (right) Harmony integration.](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-f44445bb705235d255cd5a39694b986be24e5b92%2FScreen%20Shot%202020-11-02%20at%202.32.20%20PM.png?alt=media)

Users can click **Configure** to change the default settings in **Advanced options** (Figure 4).

<div align="left"><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-10274d70584bd5efb53010805c8c92a7cc5abdb4%2FScreen%20Shot%202020-10-30%20at%2010.19.05%20AM.png?alt=media" alt="Figure 4. Advanced configure options for Harmony in Flow." width="375"></div>

**Diversity clustering penalty (theta)**: Default theta=2. A higher penalty applies a stronger correction, which results in better mixing. A penalty of zero means no correction. The value can range from 0 to positive infinity.
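As a rough illustration of why a penalty of zero means no correction, the sketch below assumes a simplified penalty of the form `(expected / observed) ** theta`, where `expected` and `observed` are the expected and observed numbers of cells from one batch in a cluster. The function name, the formula as written here, and the counts are illustrative assumptions and do not describe Flow's implementation.

```python
def diversity_weight(expected, observed, theta):
    """Weight applied to a cell's membership in a cluster (assumed form).

    expected: cells from the cell's batch the cluster would hold if batches
              were perfectly mixed (hypothetical count)
    observed: cells from that batch currently in the cluster
    theta:    diversity clustering penalty, 0 or greater
    """
    if theta < 0:
        raise ValueError("theta must be non-negative")
    if expected <= 0 or observed <= 0:
        raise ValueError("expected and observed must be positive")
    # Over-represented batches (observed > expected) get a weight below 1,
    # and the weight shrinks further as theta grows.
    return (expected / observed) ** theta


# Hypothetical cluster in which one batch is over-represented.
for theta in (0, 2, 4):
    print(theta, diversity_weight(expected=50, observed=80, theta=theta))
```

With theta=0 the weight is always 1, so cluster memberships are left unchanged; larger theta values penalize over-represented batches more strongly.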

**Number of clusters (nclust)**: The number of clusters in the model. Set this to the number of distinct cell types. nclust=1 is equivalent to simple linear regression. Use 0 to apply the default setting of Seurat’s RunHarmony().

**Width of soft kmeans clusters (sigma)**: Default sigma=0.1. The value can range from 0 to positive infinity. When sigma is 0, each observation is assigned to exactly one cluster (hard clustering). When sigma is greater than 0, an observation can belong to multiple clusters (soft clustering, or fuzzy clustering). Sigma scales the distance from a cell to the cluster centroids. Larger values of sigma result in observations being assigned to more clusters. Smaller values of sigma make soft kmeans clustering approach hard clustering.
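The effect of sigma can be sketched with a small example. It assumes that membership weights are proportional to `exp(-distance / sigma)` and that sigma=0 falls back to assigning the nearest centroid. The function name and the distances are hypothetical, and the code is an illustration rather than the implementation used in Flow.

```python
import math


def soft_assignments(distances, sigma):
    """Convert cell-to-centroid distances into cluster membership probabilities.

    Assumes weights proportional to exp(-distance / sigma). sigma == 0 is
    treated as hard clustering: all weight goes to the nearest centroid.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if sigma == 0:
        nearest = distances.index(min(distances))
        return [1.0 if k == nearest else 0.0 for k in range(len(distances))]
    # Subtracting the smallest distance keeps exp() numerically stable
    # without changing the normalized probabilities.
    d_min = min(distances)
    weights = [math.exp(-(d - d_min) / sigma) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances from one cell to three cluster centroids.
distances = [0.2, 0.3, 1.0]
for sigma in (0, 0.1, 1.0):
    print(sigma, [round(p, 3) for p in soft_assignments(distances, sigma)])
```

As sigma grows, the probabilities spread more evenly across the clusters; as it shrinks toward 0, nearly all of the weight moves to the closest cluster.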

**Ridge regression penalty (lambda)**: Default lambda=1. Lambda must be strictly positive. Smaller values result in more aggressive correction.
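To see why smaller lambda values correct more aggressively, consider a deliberately simplified one-parameter ridge problem: estimating a single batch offset `b` by minimizing `sum((y - b)^2) + lambda * b^2`, which gives `b = sum(y) / (n + lambda)`. The function name and the values are hypothetical, and Harmony's actual regression has more terms than this sketch.

```python
def ridge_batch_offset(values, lam):
    """Ridge estimate of one batch offset for values centered on the global mean.

    Solves argmin_b sum((y - b)^2) + lam * b^2, i.e. b = sum(y) / (n + lam).
    """
    if lam <= 0:
        raise ValueError("lambda must be strictly positive")
    return sum(values) / (len(values) + lam)


# Hypothetical embedding values for one batch, shifted about 2.0 above
# the global mean of 0.
batch_values = [1.8, 2.1, 2.0, 2.1]
for lam in (0.1, 1.0, 10.0):
    offset = ridge_batch_offset(batch_values, lam)
    corrected = [round(v - offset, 3) for v in batch_values]
    print(lam, round(offset, 3), corrected)
```

A small lambda leaves the estimated offset close to the batch mean, so almost the whole shift is removed. A large lambda shrinks the offset toward zero, so less of the shift is removed.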

**Random seed**: Use the same random seed to reproduce the results.

## References

1. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh P-R, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods. 2019. <https://doi.org/10.1038/s41592-019-0619-0>.

## Additional Assistance

If you need additional assistance, please visit [our support page](http://www.partek.com/support) to submit a help ticket or find phone numbers for regional support.


