> For the complete documentation index, see [llms.txt](https://help.partek.illumina.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://help.partek.illumina.com/partek-flow/tutorials/analyzing-cite-seq-data/classifying-cells.md).

# Classifying Cells

* [Exploratory Analysis Results](#exploratory-analysis-results)
* [T cells](#t-cells)
* [B cells](#b-cells)

We will now examine the results of our exploratory analysis and use a combination of techniques to classify different subsets of T and B cells in the MALT sample.

## Exploratory Analysis Results

* Double click the merged **UMAP** data node
* Under *Configure* on the left, click **Style,** select the **Graph-based cluster** node, and color by the **Graph-based** attribute (Figure 1)

![Figure 1. Color the cells in the UMAP plot by their graph-based cluster assignment](/files/mgLFJQJBvj0Jc0EdtXPY)

The 3D UMAP plot opens in a new data viewer session (Figure 2). Each point is a different cell and they are clustered based on how similar their expression profiles are across proteins and genes. Because a graph-based clustering task was performed upstream, a biomarker table is also displayed under the plot. This table lists the proteins and genes that are most highly expressed in each graph-based cluster. The graph-based clustering found 11 clusters, so there are 11 columns in the biomarker table.

* Click and drag the **2D scatter plot** icon from **New plot** onto the canvas (Figure 2)
* Drop the 2D scatter plot to the **right** of the UMAP plot

![Figure 2. Add a 2D scatter plot and place it to the right of the UMAP plot](/files/z1ZndDxdZSyTGgalK8qC)

* Click **Merged counts** to use as data for the 2D scatter plot (Figure 3)

![Figure 3. Choose Merged counts data to draw the 2D scatter plot](/files/c60vYONwKs3lXHiqUrmF)

A 2D scatter plot has been added to the right of the UMAP plot. The points in the 2D scatter plot are the same cells as in the UMAP, but they are positioned along the x- and y-axes according to their expression level for two protein markers: CD3\_TotalSeqB and CD4\_TotalSeqB, respectively (Figure 4).

![Figure 4. The canvas now has a 2D scatter plot next to the UMAP](/files/zMcb1UGMl2qk4Jq2CnVW)

* In **Select & Filter**, click **Criteria** to change the selection mode
* Click the **blue circle** next to the *Add rule* drop-down menu (Figure 5)

![Figure 5. Click the blue circle to change the data source for the rule selector](/files/ikRViBQWqg2D46DJbcEr)

* Click **Merged counts** to change the data source
* Choose **CD3\_TotalSeqB** from the drop-down list (Figure 6)

![Figure 6. Choose the CD3\_TotalSeqB protein marker as a selection rule](/files/pg4wvZEkkPXFkl5g0SBF)

* Click and drag the **slider** on the CD3D\_TotalSeqB selection rule to include the CD3 positive cells (Figure 7)

![Figure 7. Use the slider to select cells with positive expression for the CD3 protein marker](/files/L3mXHxVgpsoEzutBBHhJ)

As you move the slider up and down, the corresponding points on both plots will dynamically update. The cells with a high expression for the CD3 protein marker (a marker for T cells) are highlighted and the deselected points are dimmed (Figure 8).

![Figure 8. CD3+ cells are selected on both plots](/files/gorAdBSJGvS6WHBnQ9Lj)

* Click **Merged counts** in **Get data** on the left under *Setup*
* Click and drag **CD8a\_TotalSeqB** onto the 2D scatter plot (Figure 9)
* Drop CD8\_TotalSeqB onto the **x-axis** configuration option

![Figure 9. Change the feature plotted on the x-axis to CD8\_TotalSeqB](/files/Dxm7CRJcV3ppxt2EyXvh)

The CD3 positive cells are still selected, but now you can see how they separate into CD4 and CD8 positive populations (Figure 10).

![Figure 10. 2D scatter plot with CD4\_TotalSeqB and CD8\_TotalSeqB features on the axes](/files/LUdSPX2h3tA56pGykUFO)

The simplest way to classifying cell types is to look for the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone as the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. Let's compare the resolution power of the CD4 and CD8A gene expression markers compared to their protein counterparts.

* Click the **duplicate plot** icon above the 2D scatter plot (Figure 11)

![Figure 11. Click the duplicate plot icon to make a copy of the 2D scatter plot](/files/U7ugT2fqeOZfYrq8xz2L)

* Click **Merged counts** in the **Get Data** icon under *Setup*
* Search for the **CD4** gene
* Click and drag **CD4** onto the duplicated 2D scatter plot
* Drop the CD4 gene onto the **y-axis** option
* Search for the **CD8A** gene
* Click and drag **CD8A** onto the duplicated 2D scatter plot
* Drop the CD8A gene onto the **x-axis** option

The second 2D scatter plot has the CD8A and CD4 mRNA markers on the x- and y-axis, respectively (Figure 12). The protein expression data has a better dynamic range than the gene expression data, making it easier to identify sub-populations.

![Figure 12. The second 2D scatter plot (bottom) has the CD8 and CD4 genes plotted against each other](/files/qIph511tvWW1HY5E581h)

* On the first 2D scatter plot (with protein markers), click ![](/files/5OX0B6UoQW7mTYErtHSi) in the top right corner
* Manually select the cells with high expression of the CD4\_TotalSeqB protein marker (Figure 13)

More than 2000 cells show positive expression for the CD4 cell surface protein.

![Figure 13. Draw a lasso to manually select CD4+ cells, based on protein expression](/files/zQBwp8L8gPGO2rKPjG1b)

Let's perform the same test on the gene expression data.

* Click ![](/files/sSBYPOuizlIJf7OXvD1R) in the top right of the plot to switch back to pointer mode
* Click on a blank spot on the plot to clear the selection
* On the second 2D scatter plot (with mRNA markers), click ![](/files/5OX0B6UoQW7mTYErtHSi) in the top right corner
* Manually select the cells with high expression of the CD4 gene marker (Figure 14)

![Figure 14. Draw a lasso to manually select CD4+ (mRNA) cells](/files/qBlD5G0kL1HiwhguoJEi)

This time, only 500 cells show positive expression for the CD4 marker gene. This means that the protein data is less sparse (i.e. there fewer zero counts), which further helps to reliably detect sub-populations.

## T cells

Based on the exploratory analysis above, most of the CD3 positive cells are in the group of cells in the right side of the UMAP plot. This is likely to be a group of T cells. We will now examine this group in more detail to identify T cell sub-populations.

* Click ![](/files/ipOd3jnCN34HHiiCK98K) in the top right corner of both 2D scatter plots, to remove them from the canvas
* Click ![](/files/5OX0B6UoQW7mTYErtHSi) in the top right corner of the 3D UMAP plot
* Draw a lasso around the group of putative T cells (Figure 15)

![Figure 15. Select the group of putative T cells](/files/C09ZhbKyCqbFLEd0PIxT)

* Click ![](/files/zdAVcV8BSPFAx45UsZIH) in the **Select & Filter** tool to include the selected points
* Click ![](/files/sSBYPOuizlIJf7OXvD1R) in the top right of the plot to switch back to pointer mode
* Click and drag the plot to rotate it around

![Figure 16. Group of putative T-cells](/files/uRmvB2Y842GVO7O2jWXU)

This group of putative T cells predominantly consists of cells assigned to graph-based clusters 3, 4, and 6, indicated by the colors. Examining the biomarker table for these clusters can help us infer different types of T cell.

* Add the Biomarkers table using the **Table** option in the **New plot** menu, you can drag and reposition the table using the button in the top left corner of the plot ![](/files/y5O9JHnCiFWKaMemy47J).
* Click and drag the bar between the UMAP plot and the biomarker table to resize the biomarker table to see more of it (Figure 17)

If you need to create more space on the canvas, hide the panel words on the left using the arrow ![](/files/30wFobreUYLPYGs4O4QX).

![Figure 17. Resize plots to see more of the biomarker table](/files/V2MlxGxnrl5j8zzTt0xY)

Cluster 6 has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by follicular B helper T cells (Tfh cells). Another biomarker is the PD-1 protein, which is expressed in Tfh cells. This protein promotes self-tolerance and is a target for immunotherapy drugs. The TIGIT protein is also expressed in cluster 6 and is another immunotherapy drug target that promotes self-tolerance.

Cluster 4 expresses several marker genes associated with cytotoxicity (e.g. NKG7 and GZMA) and both CD3 and CD8 proteins. Thus, these are likely to be cytotoxic cells.

We can visually confirm these expression patterns and assess the specificity of these markers by coloring the cells on the UMAP plot based on their expression of these markers.

* Click the **duplicate plot** icon above the UMAP plot

We will color the cells on the duplicate by their expression of marker genes, while keeping the original plot colored by graph-based cluster assignment.

* Click and drag the **CXCL13** gene from the biomarker table onto the duplicate UMAP plot
* Drop the CXCL13 gene onto the **Green (feature)** option (Figure 18)

![Figure 18. Click and drag the gene from the biomarker table onto the plot](/files/sBtJhNSoblNcdBCyaCxl)

* Click and drag the **NKG7** gene from the biomarker table onto the duplicate UMAP plot
* Drop the NKG7 gene onto the **Red (feature)** option

The cells with higher CXCL13 and NKG7 expression are now colored green and red, respectively. By looking at the two UMAP plots side by side, you can see these two marker genes are localized in graph-based clusters 6 and 4, respectively (Figure 19).

![Figure 19. The cells in the UMAP plot on the right are colored by their expression of CXCL13 (green) and NKG7 (red) marker genes. These cells belong to graph-based clusters 6 and 4, respectively, shown in the plot on the left](/files/h4iOEAGqTdKG6e63QJIi)

* In **Select & Filter**, click ![](/files/3ri6MoSm3tqb4QpoRgf7) to remove the CD3\_TotalSeqB filtering rule
* Click the **blue circle** next to the *Add criteria* drop-down list
* Search for **Graph** to search for a data source
* Select **Graph-based clustering** (derived from the Merged counts > PCA data nodes)
* Click the **Add criteria** drop-down list and choose **Graph-based** to add a selection rule (Figure 20)

![Figure 20. Change the data source to Graph-based clustering and choose Graph-based from the drop-down list](/files/NcfQOIkfLF0kvKNEvP4Q)

* In the *Graph-based* filtering rule, click **All** to deselect all cells
* Click cluster **6** to select all cells in cluster 6
* Using the **Classify** tool, click **Classify selection**
* Label the cells as **Tfh** **cells** (Figure 21)
* Click **Save**

![Figure 21. Select all cluster 6 cells and classify them as Tfh cells](/files/keMgfrRKgDcbtO1GQH7C)

* Click ![](/files/PMIRI9cDHX7OlasuG1p4) in **Select & Filter** to exclude the cluster 6/Tfh cells
* Click cluster **4** to select all cells in cluster 4
* In the **Classify** icon, click **Classify selection**
* Label the cells as **Cytotoxic cells**
* Click **Save**
* Click ![](/files/PMIRI9cDHX7OlasuG1p4) in **Select & Filter** to exclude the cluster 4/Cytotoxic cells

We can classify the remaining cells as helper T cells, as they predominantly express the CD4 protein marker.

* Click on the **invert selection** icon in either of the UMAP plots (Figure 22)

![Figure 22. Invert the selection to select all remaining cells](/files/QxcskMpAQc3exUJZ0uRL)

* In **Classify**, click **Classify selection**
* Label the cells as **Helper T cells**
* Click **Save**

Let's look at our progress so far, before we classify subsets of B-cells.

* Click the **Clear filters** link in **Select & Filter**
* Select the duplicate UMAP plot (with the cell colored by marker genes)
* Under *Configure* on the left, open **Style** and color the cells by **New classifications** (Figure 23)

![Figure 23. Color by New classifications (T cell subsets)](/files/tYr4m5vf6tbdSFZ7UqRC)

## B cells

In addition to T-cells, we would expect to see B lymphocytes, at least some of which are malignant, in a MALT tumor sample. We can color the plot by expression of a B cell marker to locate these cells on the UMAP plot.

* In the **Get data** icon on the left, click **Merged counts**
* Scroll down or use the search bar to find the **CD19\_TotalSeqB** protein marker
* Click and drag the **CD19\_TotalSeqB** marker over to the UMAP plot on the right
* Drop the CD19\_TotalSeqB marker over the **Color** configuration option on the plot

The cells in the UMAP plot are now colored from grey to blue according to their expression level for the CD19 protein marker (Figure 24). The CD19 positive cells correspond to several graph-based clusters. We can filter to these cells to examine them more closely,

![Figure 24. Cells in UMAP plot colored by their expression of CD19 protein](/files/J05ck7H4ySXrfaRmilxb)

* Click ![](/files/5OX0B6UoQW7mTYErtHSi) in the top right corner of the UMAP plot
* Lasso around the CD19 positive cells (Figure 25)
* Click ![](/files/zdAVcV8BSPFAx45UsZIH) in **Select & Filter** to include the selected points

![Figure 25. Lasso around CD19 positive cells](/files/b8am3iLNIG9zmGeFSwoV)

The plots will rescale to include the selected points. The CD19 positive cells include cells from graph-based clusters 1, 2 and 7 (Figure 26).

![Figure 26. Filtered CD19 positive cells](/files/Q3fD2JNmAdjeHv5NnuIW)

* Find the **CD3\_TotalSeqB** protein marker in the biomarker table
* Click and drag the **CD3\_TotalSeqB** onto the UMAP plot on the right
* Drop the CD3\_TotalSeqB protein marker onto the **Color** configuration option on the plot (Figure 27)

While these cells express T cell markers, they also group closely with other putative B cells and express B cell markers (CD19). Therefore, these cells are likely to be doublets.

![Figure 27. Some cells within the CD19 positive clusters show signs of expressing T-cells markers](/files/kwGTIXQSg7sk6ftQpFiD)

* Select either of the UMAP plots
* Click on the **Select & Filter**
* Find the **CD3\_TotalSeqB** protein marker in the biomarker table
* Click and drag **CD3\_TotalSeqB** onto the **Add criteria** drop-down list in **Select & Filter** (Figure 28)
* Set the minimum threshold to **3** in the CD3\_TotalSeqB selection (Figure 29)
* Click the **Classify** icon then click **Classify selection**
* Label the cells as **Doublets**
* Click **Save**
* Click ![](/files/PMIRI9cDHX7OlasuG1p4) in **Select & Filter** to exclude the selected points

![Figure 28. Click and drag the CD3 protein marker directly onto the Add criteria drop-down list to create a selection criteria](/files/30fF0KVya6iZOwgyB78P)

![Figure 29. Select the remaining CD3 positive doublet cells](/files/IHxQSqVBBY73BRZ7g79i)

The biomarkers for clusters 1 and 2 also show an interesting pattern. Cluster 1 lists IGHD as its top biomarker, while cluster 2 lists IGHA1 as the fourth most significant. Both IGHD (Immunoglobulin Heavy Constant Delta) and IGHA1 (Immunoglobulin Heavy Constant Alpha 1) encode classes of the immunoglobulin heavy chain constant region. IGHD is part of IgD, which is expressed by mature B cells, and IGHA1 is part of IgA1, which is expressed by activated B cells. We can color the plot by both of these genes to visualize their expression.

* Click, drag and drop **IGHD** from the biomarker table onto the **Green (feature)** configuration option on the UMAP plot on the right
* Click, drag and drop **IGHA1** from the biomarker table onto the **Red (feature)** configuration option on the UMAP plot on the right (Figure 30)

![Figure 30. The B cells colored by IGHD (green) and IGHA1 (red) gene expression](/files/obrmyQN2DLpxtlp3snaA)

We can use the lasso tool to select and classify these populations.

* Click ![](/files/5OX0B6UoQW7mTYErtHSi) in the top right corner of the UMAP plot
* Lasso around the IGHD positive cells (Figure 31)
* In the **Classify** icon on the left, click **Classify selection**
* Label the cells as **Mature B cells**
* Click **Save**

![Figure 31. Lasso around the IGHD positive cells](/files/HwZKkICsCwgBjpdAdvv0)

* Lasso around the IGHA1 positive cells (Figure 32)
* In the **Classify** icon on the left, click **Classify selection**
* Label the cells as **Activated B cells**
* Click **Save**

![Figure 32. Select IGHA1 positive cells](/files/ZqOhQOqnGpJ5lMRHtX36)

We can now visualize our classifications.

* Click the **Clear filters** link in the **Select & Filter** icon on the left
* Select the duplicate UMAP plot (with the cell colored by marker genes)
* Under *Configure* on the left, click the **Style** icon and color the cells by **New classifications** (Figure 33)

![Figure 33. UMAP with cells colored by cell types](/files/YuUdUnJB5fo5DV42xp9R)

* Click **Apply classifications** in the **Classify** icon
* Name the attribute **Cell type**
* Click **Run**
* Click **OK** to close the message about a classification task being enqueued

Optionally, you may wish to save this data viewer session if you need to go back and reclassify cells later. To save the session, click the ![](/files/ZsuYfFNhtj9gY9W3dPui) icon on the left and name the session.

## Additional Assistance

If you need additional assistance, please visit [our support page](http://www.partek.com/support) to submit a help ticket or find phone numbers for regional support.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://help.partek.illumina.com/partek-flow/tutorials/analyzing-cite-seq-data/classifying-cells.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.