# Identifying novel and known motifs

With a list of enriched regions, you can now identify recurring patterns or motifs in these regions. Transcription factors bind sites throughout the genome, but each has a characteristic sequence it binds - a consensus sequence that appears in most of its binding sites. By searching for binding site motifs, you can determine the consensus sequence for a transcription factor and predict potential binding locations throughout the genome that may not have been found in your experiment.

Partek Genomics Suite detects *de novo* motifs using the Gibbs motif sampler (Neuwald et al., 1995) and can search for known transcription factor binding sites using a database such as [*JASPAR*](http://jaspar.genereg.net/).

## Discover *de novo* motifs

* Select **Motif Discovery** from the *Peak Analysis* section of the *ChIP-Seq* workflow
* Select **Discover de novo motifs**
* Select **OK**

The *Detect Motifs* dialog will open to allow you to configure the search (Figure 1).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-9d48aa2fceb2b802db6f4d1d6456400eb4df7cc4%2F2017-07-26%2016_12_22-Detect%20Motifs.png?alt=media)

Figure 1. Configuring search parameters for de novo motifs

* Select **1/p-value\_filtered** from the *Spreadsheet with genomic regions* drop-down menu
* Set *Number of Motifs* to **1**
* Set *Discover motifs of length* to **6** *bp to* **16** *bp*
* Set *Result file* to **Motifs**
* Select **OK**

If you have not previously downloaded the reference genome to your computer, you may be asked whether you would like to download the .2bit reference genome. If prompted, select **Automatically download a .2bit file**, then select **OK**. If Partek Genomics Suite cannot connect to the internet, this option may not be available. In that case, download the .2bit file from the UCSC Genome Browser and import it by selecting **Manually specify a .2bit file** and choosing the downloaded file. The reference genome map is required to determine which genes overlap the enriched peak regions and to display the aligned sequences in the *Genome Viewer*.

A motif visualization tab, *Sequence Logo*, will open and two spreadsheets will be generated. One spreadsheet, *motifs (Motifs)*, contains information about the motif. The other, *instances (Motifs\_instances.txt)*, lists the genomic locations of the motif.

## Description of Motif Detection Output

### *Sequence Logo Window*

The *Sequence Logo* tab (Figure 2) opens after motif detection and displays the most significant motif found in the regions listed in the source spreadsheet.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-54d8ae0d3135f1d354a9c32b336dd7e18d4007d9%2F2017-08-09%2014_24_28-Partek%20Genomics%20Suite%20-%201_p-value_filtered_motifs_instances%20\(Motifs_instances.tx.png?alt=media)

Figure 2. Viewing the binding site for NRSF. Use the blue arrows to cycle through views of all motifs found (if there is more than one). Select Reverse to view the reverse complement sequence.

In this case, the motif finder discovered a motif in the NRSF-enriched regions that is 16 base pairs in length. The height of each position is the relative entropy (in bits) and indicates the importance of a base at a particular location in the binding site.
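As a rough illustration of how a logo height can be computed, the sketch below calculates the relative entropy of a single motif position from its base counts. The function name, the example counts, and the uniform background are assumptions made for illustration; Partek Genomics Suite may use a different background model or apply corrections that are not shown here.

```python
import math

def relative_entropy(counts, background=(0.25, 0.25, 0.25, 0.25)):
    """Relative entropy (in bits) of one motif position.

    counts: occurrences of A, C, G, T at this position (hypothetical input).
    background: assumed background frequencies of A, C, G, T.
    """
    total = sum(counts)
    bits = 0.0
    for count, q in zip(counts, background):
        if count == 0:
            continue  # a base that never occurs contributes nothing
        p = count / total
        bits += p * math.log2(p / q)
    return bits

# A fully conserved position carries 2 bits under a uniform background
print(relative_entropy((0, 20, 0, 0)))  # 2.0
# A position where all four bases are equally common carries no information
print(relative_entropy((5, 5, 5, 5)))   # 0.0
```

Under these assumptions, tall positions in the logo correspond to bases that are nearly invariant across the motif instances, while short positions tolerate several bases.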

The title *CT.TCC..GGT.CTG.* is the consensus sequence for the sequence logo. Dots represent positions that contain more than one significant base across all reads in the motif. The dots can be replaced with characters representing the possible bases at each location by selecting **Show nucleotide codes**. A description of the IUPAC nucleotide codes is available at the [UCSC Genome Browser](http://genome.ucsc.edu/goldenPath/help/iupac.html).

To view the reverse complement of the motif, select **Reverse**.
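The reverse complement can also be worked out by hand. The following is a minimal sketch, not the Partek implementation: it complements each IUPAC nucleotide code, keeps the dot placeholder used in the logo title unchanged, and then reverses the string. The function name is hypothetical.

```python
# Complement of each IUPAC nucleotide code; "." is kept as the logo's placeholder
COMPLEMENT = str.maketrans("ACGTRYKMSWBVDHN.", "TGCAYRMKSWVBHDN.")

def reverse_complement(motif):
    """Return the reverse complement of a motif written in IUPAC codes."""
    return motif.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("CT.TCC..GGT.CTG."))  # .CAG.ACC..GGA.AG
```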

### *Motifs spreadsheet*

The motif information spreadsheet (Figure 3), *Motifs*, lists the information about all motifs discovered during *de novo Motif Detection*. This includes five columns describing each motif.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-1ddc6c8d7bf8376a74b62379d2a8bef0c5a90665%2F2017-08-09%2014_25_21-Partek%20Genomics%20Suite%20-%201_p-value_filtered_motifs%20\(Motifs\).png?alt=media)

Figure 3. Viewing the Motifs spreadsheet

*1. Counts* gives the summed counts for each base call across all occurrences of the motif in the region list as {A, C, G, T}

*2. Consensus Sequence* gives the consensus sequence of the motif in IUPAC nucleotide codes

*3. Motif ID* gives a unique ID to each discovered motif using its row in the *Motifs* spreadsheet

*4. Log Likelihood Ratio* scores the relative likelihood that the pattern did not occur by chance, with larger numbers indicating that it is less likely to have occurred by chance

*5. Background frequency (A,C,G,T)* gives the frequency of each of the bases in all the sequences of that motif
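To show how the *Counts* and *Consensus Sequence* columns relate, the sketch below derives an IUPAC consensus from per-position {A, C, G, T} counts. The cutoff `min_fraction`, the function name, and the example counts are assumptions for illustration; the rule Partek Genomics Suite uses to decide which bases are significant is not documented here and may differ.

```python
# IUPAC code for each set of bases (keys are written in A, C, G, T order)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def iupac_consensus(counts, min_fraction=0.25):
    """Build a consensus string from per-position (A, C, G, T) counts.

    min_fraction is a hypothetical cutoff: a base is kept at a position
    when it accounts for at least this fraction of the counts there.
    """
    consensus = []
    for column in counts:
        total = sum(column)
        kept = "".join(
            base for base, n in zip("ACGT", column) if n / total >= min_fraction
        )
        if not kept:
            # No base reaches the cutoff: fall back to the most frequent base
            kept = "ACGT"[column.index(max(column))]
        consensus.append(IUPAC[kept])
    return "".join(consensus)

# Hypothetical counts for a 4 bp motif
example = [(0, 20, 0, 0), (1, 1, 0, 18), (10, 0, 10, 0), (5, 5, 5, 5)]
print(iupac_consensus(example))  # CTRN
```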

You can bring up the Sequence Logo visualization of a listed motif by right-clicking on the row header and selecting **Logo View** from the pop-up menu.

### *Motif\_instances spreadsheet*

The *instances (Motif\_instances)* spreadsheet (Figure 4) is a child spreadsheet of the *Motifs* spreadsheet. It details all the locations of the motif(s) detected in the enriched regions. Each row lists a putative binding site for a motif. The columns give detailed information about the putative binding sites.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-36fff34f70b718161cfbfd8d39de1b32a7e5fc47%2F2017-08-09%2014_25_52-Partek%20Genomics%20Suite%20-%201_p-value_filtered_motifs_instances%20\(Motifs_instances.tx.png?alt=media)

Figure 4. Viewing the instances spreadsheet

*1-4. chromosome, start, stop, strand* give the position

*5. Motif ID* gives the identity of the motif

*6. instance* gives the sequence of this instance of the motif

*7. score* gives the log ratio of the probability that this sequence was generated by the motif versus the background distribution. A higher number indicates a better chance that the sequence is an instance of the motif.
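The score described above can be illustrated with a small log-odds calculation. This is a sketch under stated assumptions rather than the exact Partek formula: the motif model is built from hypothetical per-position counts, a pseudocount of 1 avoids zero probabilities, the background is uniform, and the natural logarithm is used.

```python
import math

BASES = "ACGT"

def log_odds_score(instance, counts, background, pseudocount=1.0):
    """Log ratio of P(instance | motif) to P(instance | background)."""
    score = 0.0
    for base, column in zip(instance, counts):
        index = BASES.index(base)
        # Probability of this base at this position under the motif model
        p_motif = (column[index] + pseudocount) / (sum(column) + 4 * pseudocount)
        score += math.log(p_motif / background[index])
    return score

# Hypothetical 3 bp motif: per-position counts as (A, C, G, T)
counts = [(18, 1, 0, 1), (0, 19, 1, 0), (2, 2, 14, 2)]
background = (0.25, 0.25, 0.25, 0.25)

good = log_odds_score("ACG", counts, background)
poor = log_odds_score("TTT", counts, background)
print(good > 0 > poor)  # True
```

A sequence that matches the preferred base at each position scores above zero, while a sequence that looks more like background than like the motif scores below zero.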

## Search *JASPAR* for known motifs

* Select **Motif discovery** from the *Peak Analysis* section of the *ChIP-Seq* workflow
* Select **Search for known motifs**
* Select **OK**

*Search for known motifs* will search the JASPAR database for motifs that are over-represented in the list of sequences in the significant regions list. The JASPAR database will download automatically if needed during the *Search for known motifs* step. Downloading the JASPAR database will create a spreadsheet in your experiment named *JASPAR.txt* that contains all of the species-specific motifs in the database. To visualize the motifs, right-click on a row in the *JASPAR.txt* spreadsheet and select *Logo View*.

Before *Search for known motifs* runs, we need to configure the search (Figure 5).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-efb7a64dfe977a8a8143bc5c386f10ced2a64252%2F2017-07-27%2010_48_12-Search%20for%20Motif\(s\)%20in%20Sequences.png?alt=media)

Figure 5. Configuring a search for known motifs in the JASPAR database

* Select **1/p-value\_filtered (p-value filtered.txt)** from the *Choose Region Spreadsheet* drop-down menu
* Select **Search using motifs specified in:** for *Choose Motifs to Search*
* Set *Search using motifs specified in:* to **2 (JASPAR.txt)** using the drop-down menu
* Set *Search for* to **All Motifs** using the drop-down menu
* Set *Sequence Quality >=* to **0.7**
* Name the result file **MotifSearch**
* Select **OK**

Because we are searching for around 1200 motifs, the process will take some time to complete. Progress is displayed in the progress bar in the lower left-hand side of the *Search for Motif(s) in Sequences* dialog (Figure 6).

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-7d9dcdd58f776be213e11a8a25005564a5b523e1%2F2017-07-27%2011_00_11-Partek%20Genomics%20Suite%20-%201_p-value_filtered%20\(p-value%20filtered.txt\).png?alt=media)

Figure 6. Progress in the motif search will display in the progress bar

Two spreadsheets are created, similar to those from *de novo* motif discovery: the *motif\_summary (MotifSearch)* spreadsheet (Figure 7) and the *motif\_instances (MotifSearch.instance)* spreadsheet.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-fc157617658b67ef678e7b543664bb34f7d18d35%2F2017-08-10%2009_28_00-Partek%20Genomics%20Suite%20-%201_p-value_filtered_motif_summary%20\(MotifSearch\).png?alt=media)

Figure 7. Viewing the results of motif search

In the *MotifSearch* spreadsheet, each motif used in the motif search is shown. The columns detail the results of the search for each motif that was found in the reads.

*1. Motif* gives the name or ID of the motif

*2. Probability of Occurrence* gives the probability of detecting a false positive for this motif in a random DNA sequence

*3. Expected Number of Outcomes* gives the Probability of Occurrence multiplied by the summed length of the reads

*4. Actual Number of Occurrences* gives a count of sequences that match the known motif in the reads

*5. p-value* is the uncorrected p-value (binomial test)
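The relationship between these columns can be sketched as follows. The numbers are hypothetical, and the sketch assumes a one-sided binomial test for over-representation in which each sequence position is treated as an independent trial; the exact test configuration used by Partek Genomics Suite may differ.

```python
import math

def expected_occurrences(probability, total_length):
    """Probability of Occurrence multiplied by the summed sequence length."""
    return probability * total_length

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    tail = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        tail += math.exp(log_term)
    return tail

# Hypothetical values: 10,000 bp of sequence, false-positive probability 0.001
total_length = 10_000
probability = 0.001
observed = 25

print("expected:", expected_occurrences(probability, total_length))
print("p-value:", binomial_upper_tail(observed, total_length, probability))
```

The loop is written for clarity rather than speed. For genome-scale totals, a statistics library with a dedicated binomial survival function would be a better choice.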

As you can see, REST, which is another name for NRSF, is near the top of the list as one of the most significantly over-represented motifs (Figure 7). This motif agrees with the motif found in the *de novo* motif detection step. Interestingly, other motifs appear a significant number of times in the ChIP-Seq peaks and may represent possible co-factors or regulators.

The *motif\_instances* spreadsheet contains all instances of the motifs from the *motif\_summary* spreadsheet in a format identical to the *instances* spreadsheet from *de novo* motif detection.

## Generating a list of regions containing a motif

While the *motif\_instances* spreadsheet contains every instance of every motif, it may be useful to create a spreadsheet with just instances of one motif or a select group of motifs. Let's do this for both REST motifs.

* Select the **motif\_instances** spreadsheet in the spreadsheet tree
* Right-click the **5. Motif Name** column
* Select **Find / Replace / Select...** from the pop-up menu (Figure 8)

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-6963da04ef61b99e7df552508ca9921bfbf829f4%2F2017-07-27%2011_45_34-.png?alt=media)

Figure 8. Finding all REST peaks (step 1)

* Set *Find What:* to **REST**
* Select **By Columns** for *Search:*
* Select **Only in column** with **5. Motif Name** selected from the drop-down menu
* Select **Select All** (Figure 9)

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-3ec42b57a2d9bf89b4b03bbd0ee9f6136d603163%2F2017-07-27%2011_46_16-Spreadsheet%201_p-value_filtered_motif_summary_motif_instances%20_%20Find_Replace_Sele.png?alt=media)

Figure 9. Selecting all REST instances in motif\_instances spreadsheet (step 2)

This finds and selects every instance of REST in column *5. Motif Name.*

* Select **Close**

In the *motif\_instances* spreadsheet, the selected rows are highlighted.

* Right-click on the first highlighted row visible; in this example, we see row 13196
* Select **Filter Include** from the pop-up menu (Figure 10)

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-eb204683b8954cc79aa475aaef6c100d465a7968%2F2017-07-27%2011_54_59-.png?alt=media)

Figure 10. Filtering for selected rows

The spreadsheet will now include 2098 rows and a black and yellow bar will appear on the right-hand side of the spreadsheet (Figure 11). The black and yellow bar is a filter indicator showing the fraction of the spreadsheet currently visible as yellow and the filtered fraction as black.

![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-ad86e628bfccff43f9aca46f45581b0469effa9e%2F2017-07-27%2011_58_16-Partek%20Genomics%20Suite%20-%201_p-value_filtered_motif_summary_motif_instances%20\(MotifS.png?alt=media)

Figure 11. Filtered motif\_instances spreadsheet containing 2098 instances of the REST motifs
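The find-and-filter steps above can also be sketched outside Partek Genomics Suite. The example assumes the *motif\_instances* spreadsheet has been exported as tab-delimited text with a header row. The column names, motif names, and coordinates below are hypothetical stand-ins for illustration, and matching is done by substring, which is an assumption about how the search behaves.

```python
import csv
import io

def select_motif_rows(rows, text, column="Motif Name"):
    """Keep rows whose motif name contains the search text."""
    return [row for row in rows if text in row[column]]

# Hypothetical tab-delimited export of the motif_instances spreadsheet
exported = io.StringIO(
    "chromosome\tstart\tstop\tstrand\tMotif Name\n"
    "1\t1000\t1021\t+\tREST\n"
    "1\t5000\t5010\t-\tSP1\n"
    "2\t300\t321\t-\tREST\n"
)

rows = list(csv.DictReader(exported, delimiter="\t"))
rest_rows = select_motif_rows(rows, "REST")
print(len(rest_rows), "of", len(rows), "rows kept")  # 2 of 3 rows kept
```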

To create a spreadsheet that contains only the REST instances, we can clone the *motif\_instances* spreadsheet while the filter is applied.

* Right-click on *motif\_instances* in the spreadsheet navigator
* Select **Clone...** from the pop-up menu
* Set the *Name of resulting* *copy* as **REST**
* Select **1/p-value\_filtered/motif\_summary (MotifSearch)** from the *Create as a child of spreadsheet* drop-down menu
* Select **OK**

This creates a temporary spreadsheet *rest* from the filtered *motif\_instances* spreadsheet. We can now save the new spreadsheet.

* Select **rest** from the spreadsheet tree
* Select (![](https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-795be23ace6ddca21985b239a96af5205d061c7c%2Fimage2017-8-24%209_50_36.png?alt=media)) from the command bar
* Name the file **REST**
* Select **Save**

We can now remove the filter from the source *motif\_instances* spreadsheet.

* Select **motif\_instances** from the spreadsheet tree
* Right-click the filter bar
* Select **Clear Filter**

### References

Neuwald, A. F., Liu, J. S., & Lawrence, C. E. (1995). Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Science, 4(8), 1618-1632.

## Additional Assistance

If you need additional assistance, please visit [our support page](http://www.partek.com/support) to submit a help ticket or find phone numbers for regional support.
