# Transcript Expression Analysis - Cuffdiff

This option is only available when a *Cufflinks quantification* node is selected. Detailed implementation information can be found in the Cuffdiff manual \[6].

When the task is selected, the dialog displays all categorical attributes that have more than one subgroup (Figure 1).

<div align="left"><figure><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-3f209b8bf443fd326958d8ddb7315a76710fc9bc%2Fimage%20(12)%20(1)%20(1)%20(1).png?alt=media" alt=""><figcaption><p>Figure 1. Cuffdiff setup dialog. “Select attributes(s) to groups samples” lists the categorical attributes which have at least two levels (e.g. “Cell type” and “Time”)</p></figcaption></figure></div>

When an attribute is selected, every pair of its levels is compared, and each pairwise comparison is performed independently.
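As a rough illustration of what "pairwise comparisons of all the levels" means, the sketch below enumerates the comparisons for one attribute. The level names are hypothetical, and the order in which the two levels of each pair are reported is an assumption, not something taken from the Cuffdiff manual.

```python
from itertools import combinations


def pairwise_comparisons(levels):
    """Return every unordered pair of distinct levels, keeping input order."""
    unique = list(dict.fromkeys(levels))  # drop duplicates, preserve order
    return list(combinations(unique, 2))


# Hypothetical "Time" attribute with three levels
pairs = pairwise_comparisons(["0h", "6h", "24h"])
for first, second in pairs:
    print(f"{first} vs. {second}")
# prints:
# 0h vs. 6h
# 0h vs. 24h
# 6h vs. 24h
```

An attribute with k levels therefore yields k(k-1)/2 comparisons, so the report grows quickly as levels are added.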

Click the **Configure** button in the *Advanced options* section to set the normalization method and the library type (Figure 2).

<figure><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-9af0a0d8441f0096252f05452e6afdc1b7cba3c6%2Fimage%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1).png?alt=media" alt=""><figcaption><p>Figure 2. Advanced options of Cuffdiff</p></figcaption></figure>

There are three library normalization methods:

* Classic-fpkm: the library size factor is set to 1, so no scaling is applied to FPKM values
* Geometric: FPKMs are scaled via the median of the geometric means of the fragment counts across all libraries \[7]. This is the default option and is identical to the method used by DESeq
* Quartile: FPKMs are scaled via the ratio of each library's upper-quartile (75th percentile) fragment count to the average upper-quartile value across all libraries
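To make the geometric and quartile options more concrete, here is a minimal sketch of how such library size factors can be computed from per-feature fragment counts. It follows the descriptions above and the median-of-ratios idea used by DESeq; it is not Cuffdiff's actual code. The function names and toy counts are invented for illustration, and details such as how zero counts are handled are assumptions.

```python
import math
from statistics import median


def geometric_size_factors(counts):
    """Median-of-ratios size factors (the DESeq-style approach).

    counts: list of libraries, each a list of per-feature fragment counts.
    """
    n_features = len(counts[0])
    # Assumption: use only features with a positive count in every library,
    # so the geometric mean is defined and non-zero.
    usable = [j for j in range(n_features) if all(lib[j] > 0 for lib in counts)]
    geo_means = {
        j: math.exp(sum(math.log(lib[j]) for lib in counts) / len(counts))
        for j in usable
    }
    return [median(lib[j] / geo_means[j] for j in usable) for lib in counts]


def upper_quartile(values):
    """75th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = 0.75 * (len(ordered) - 1)
    low, high = math.floor(pos), math.ceil(pos)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def quartile_size_factors(counts):
    """Each library's upper quartile divided by the average upper quartile."""
    quartiles = [upper_quartile(lib) for lib in counts]
    mean_quartile = sum(quartiles) / len(quartiles)
    return [q / mean_quartile for q in quartiles]


# Toy data: library B is library A sequenced twice as deeply
lib_a = [10, 20, 30, 40, 50]
lib_b = [20, 40, 60, 80, 100]
geo = geometric_size_factors([lib_a, lib_b])
quart = quartile_size_factors([lib_a, lib_b])
print([round(f, 3) for f in geo])    # [0.707, 1.414]
print([round(f, 3) for f in quart])  # [0.667, 1.333]
```

Both methods recover the twofold depth difference between the toy libraries. With classic-fpkm, every factor would simply be 1.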

There are three library type options:

* Fr-unstranded: reads from the left-most end of the fragment in transcript coordinates map to the transcript strand, and the right-most end maps to the opposite strand. E.g. standard Illumina
* Fr-firststrand: reads from the left-most end of the fragment in transcript coordinates map to the transcript strand, and the right-most end maps to the opposite strand. The right-most end of the fragment is the first sequenced, or the only one sequenced for single-end reads. It is assumed that only the strand generated during first strand synthesis is sequenced. E.g. dUTP, NSR, NNSR
* Fr-secondstrand: reads from the left-most end of the fragment in transcript coordinates map to the transcript strand, and the right-most end maps to the opposite strand. The left-most end of the fragment is the first sequenced, or the only one sequenced for single-end reads. It is assumed that only the strand generated during second strand synthesis is sequenced. E.g. directional Illumina, standard SOLiD
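For readers who also run Cuffdiff from the command line, the two advanced settings correspond to its `--library-norm-method` and `--library-type` options. The sketch below assembles such a command as an argument list. The helper name, file names, and group labels are hypothetical, and the command the application actually issues may differ.

```python
VALID_LIBRARY_TYPES = {"fr-unstranded", "fr-firststrand", "fr-secondstrand"}
VALID_NORM_METHODS = {"classic-fpkm", "geometric", "quartile"}


def build_cuffdiff_command(gtf, groups, library_type="fr-unstranded",
                           norm_method="geometric", out_dir="cuffdiff_out"):
    """Build a cuffdiff argument list.

    groups maps a condition label to the alignment files of its replicates.
    """
    if library_type not in VALID_LIBRARY_TYPES:
        raise ValueError(f"unsupported library type: {library_type}")
    if norm_method not in VALID_NORM_METHODS:
        raise ValueError(f"unsupported normalization method: {norm_method}")
    if len(groups) < 2:
        raise ValueError("at least two groups are needed for a comparison")

    cmd = [
        "cuffdiff",
        "-o", out_dir,
        "-L", ",".join(groups),              # one label per condition
        "--library-type", library_type,
        "--library-norm-method", norm_method,
        gtf,
    ]
    # Replicates of one condition are comma-separated; conditions are
    # passed as separate arguments.
    cmd += [",".join(files) for files in groups.values()]
    return cmd


# Hypothetical inputs
cmd = build_cuffdiff_command(
    "transcripts.gtf",
    {"control": ["c1.bam", "c2.bam"], "treated": ["t1.bam", "t2.bam"]},
    library_type="fr-firststrand",
)
print(" ".join(cmd))
```

Building a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.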

The Cuffdiff task report is a table that lists each feature together with its p-value, q-value, and log2 fold change for every comparison (Figure 3).

<figure><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-4c63e5f5fb2909e7af18f77d70dd5f723ff8ce20%2Fimage%20(2)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1)%20(1).png?alt=media" alt=""><figcaption><p>Figure 3. Cuffdiff task report. Each row is a feature; p-value, q-value and log2 fold change columns are displayed for each comparison</p></figcaption></figure>

The p-value column contains either an actual p-value, which means the test was performed successfully, or one of the following flags, which indicate that the test was not successful:

* NOTEST: not enough alignments for testing
* LOWDATA: the locus is too complex or too shallowly sequenced
* HIGHDATA: too many fragments in locus
* FAIL: when an ill-conditioned covariance matrix or other numerical exception prevents testing
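Because the p-value column mixes numbers and status flags, any downstream script has to tell the two apart before filtering or sorting. The following is a minimal sketch of that check. The function name and the sample cell values are made up for illustration, and it assumes the flags appear in the exported text exactly as listed above.

```python
FAILURE_FLAGS = {
    "NOTEST": "not enough alignments for testing",
    "LOWDATA": "too complex or too shallowly sequenced",
    "HIGHDATA": "too many fragments in locus",
    "FAIL": "numerical exception prevented testing",
}


def parse_p_value(cell):
    """Return (p_value, flag); exactly one of the two is None."""
    text = cell.strip()
    if text.upper() in FAILURE_FLAGS:
        return None, text.upper()
    return float(text), None  # raises ValueError on unexpected content


# Hypothetical cells from the p-value column
cells = ["0.0031", "NOTEST", "5e-05", "LOWDATA"]
parsed = [parse_p_value(c) for c in cells]
tested = [p for p, flag in parsed if flag is None]
print(tested)  # [0.0031, 5e-05]
```

Keeping the flag alongside the missing value makes it possible to report why a feature was not tested instead of silently dropping it.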

The table can be downloaded as a text file by clicking the **Download** button in the lower-right corner of the table.

## References

1. Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, JRSS, B, 57, 289-300.
2. Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035.
3. Auer, 2011, A two-stage Poisson model for testing RNA-Seq
4. Burnham, Anderson, 2010, Model selection and multimodel inference
5. Law C, et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29.
6. <http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#cuffdiff-output-files>
7. Anders S, Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.

## Additional Assistance

If you need additional assistance, please visit [our support page](http://www.partek.com/support) to submit a help ticket or find phone numbers for regional support.
