Difference between revisions of "Enrichment analysis"

From BioUML platform

Revision as of 16:27, 4 April 2013

Enrichment analysis (GSEA)

Gene set enrichment analysis (GSEA) is a classification technique that works with a ranked set of genes. A group from the classification is considered over-represented if most of the input genes belonging to that group are top-ranked. The ranking is specified by the user via a numerical column (fold-change values, for example).

Parameters

  • Source data set: input table with Ensembl genes as rows. If your data use different row identifiers, consider running the "Convert table" analysis first.
  • Species: species corresponding to the input table.
  • Weight column: column to rank genes by. A gene is considered top-ranked if its value in this column is the highest.
  • Classification: classification you want to use. List of classifications may differ depending on software version and your subscription.
  • Minimal hits to group (nmin): minimal number of hits a group must have to be included in the result.
  • Only over-represented: if checked, under-represented groups (with negative enrichment score) will be excluded from the result.
  • Number of permutations: number of random permutations used for p-value calculation. Larger values increase p-value precision but make the analysis slower.
  • P-value threshold (Pmax): groups with a higher (worse) p-value will be excluded from the result.
  • Result name: name and path of the output table.
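
The three filtering parameters above can be read as a simple predicate applied to each result row. The sketch below is only an illustration, not BioUML code: the function `keep_group`, the row field names (`hits`, `es`, `p_value`) and the sample rows are all hypothetical.

```python
def keep_group(row, nmin, pmax, only_over_represented):
    """Return True if a result row passes the filters described above.

    `row` is a hypothetical dict with the keys 'hits', 'es' and 'p_value'.
    """
    if row["hits"] < nmin:
        # Minimal hits to group: require n >= nmin
        return False
    if row["p_value"] > pmax:
        # P-value threshold: require P <= Pmax
        return False
    if only_over_represented and row["es"] < 0:
        # A negative enrichment score marks an under-represented group
        return False
    return True


# Made-up rows, for illustration only
rows = [
    {"id": "group A", "hits": 12, "es": 0.61, "p_value": 0.002},
    {"id": "group B", "hits": 2, "es": 0.80, "p_value": 0.001},
    {"id": "group C", "hits": 30, "es": -0.45, "p_value": 0.010},
    {"id": "group D", "hits": 15, "es": 0.30, "p_value": 0.200},
]

kept = [r["id"] for r in rows
        if keep_group(r, nmin=5, pmax=0.05, only_over_represented=True)]
kept_both_signs = [r["id"] for r in rows
                   if keep_group(r, nmin=5, pmax=0.05, only_over_represented=False)]
print(kept, kept_both_signs)
```

With "Only over-represented" switched off, the under-represented "group C" is kept as well, since it still satisfies the hit-count and p-value conditions.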

Result

The result of this analysis is a table in which each row corresponds to a single group. The following columns are always present in the result:

  • ID: Accession number representing the given group.
  • Nominal p-value (P): p-value calculated for the group using random permutations of the ranks, that is, the fraction of random permutations that showed a better ES. Only groups for which P ≤ Pmax are included in the result.
  • ES: Enrichment score (or the most extreme Kolmogorov-Smirnov score).
  • NES: Normalized enrichment score. It is ES divided by the average ES of all random sets that have the same sign.
  • Number of hits (n): Number of genes from the input set matched to the group. Only groups for which n ≥ nmin are included in the result.
  • Plot: Click to see the plot. The plot shows how the Kolmogorov-Smirnov score (KS) depends on gene rank (r). The X axis shows gene ranks, the Y axis shows the KS value. KS can be defined recursively as follows:
    [Formula image: Data-Enrichment-analysis-ks score.png]
    Here N is the total number of genes in the input set. The value of ES is the most extreme (maximal by absolute value) KS. A plot example is shown below.
  • Hits: List of Ensembl IDs from the input set matched to the group (number of IDs is always n).

More columns may be present for specific classifications (e.g. group description). The 'Level' column, if present, gives the minimal number of steps needed to reach the root of the classification hierarchy (thus higher values mean more specific and smaller groups).
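
The descriptions of the nominal p-value and NES columns can be sketched in a few lines of Python. This is one plausible reading of the text above, not the platform's implementation. The function names and the sample scores are hypothetical. It is assumed that a "better" permutation score means one at least as extreme as the observed ES in the same direction, and that NES divides by the mean absolute value of the same-sign permutation scores so that NES keeps the sign of ES.

```python
from statistics import mean


def nominal_p_value(es, null_es):
    """Fraction of permutation scores at least as extreme as `es`.

    Assumption: "better" means further from zero on the same side as `es`;
    ties are counted as "better".
    """
    if es >= 0:
        better = sum(1 for x in null_es if x >= es)
    else:
        better = sum(1 for x in null_es if x <= es)
    return better / len(null_es)


def normalized_es(es, null_es):
    """ES divided by the average size of the same-sign permutation scores.

    Assumption: the divisor is the mean absolute value, so NES keeps the
    sign of ES.
    """
    same_sign = [abs(x) for x in null_es if (x >= 0) == (es >= 0)]
    if not same_sign:
        raise ValueError("no permutation score has the same sign as ES")
    return es / mean(same_sign)


# Made-up ES values from 8 random permutations, for illustration only
null_es = [0.25, -0.5, 0.5, -0.25, 0.75, 0.5, -0.75, 0.5]
observed_es = 0.75

p = nominal_p_value(observed_es, null_es)
nes = normalized_es(observed_es, null_es)
print(p, nes)
```

A real run would use far more permutations than eight. The smallest non-zero p-value this estimate can produce is 1 divided by the number of permutations, which is why larger values of "Number of permutations" give finer p-value resolution.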

Plot example

[Image: Data-Enrichment-analysis-plot example.png]

In this example the plot is displayed for a set of 10 genes (N = 10) with 3 hits in the group (n = 3), which have ranks 1, 3 and 4. The most extreme KS value (which is the enrichment score, ES) equals KS(4), which can be calculated as follows:

[Formula image: Data-Enrichment-analysis-es calculation.png]
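
The running score for this example can also be traced in code. The formula images are not reproduced in this text, so the step sizes below are an assumption: the sketch uses the classic unweighted Kolmogorov-Smirnov walk, which moves up by 1/n at each hit and down by 1/(N - n) at each miss. The function name `ks_profile` is hypothetical. The numerical ES value depends on the step sizes chosen, but the location of the peak for this input agrees with the text.

```python
def ks_profile(total_genes, hit_ranks):
    """Return [KS(0), KS(1), ..., KS(N)] for hits at the given 1-based ranks.

    Assumed step scheme (not taken from the source formula):
    +1/n for a hit, -1/(N - n) for a miss, starting from KS(0) = 0.
    """
    n = len(hit_ranks)
    up = 1.0 / n
    down = 1.0 / (total_genes - n)
    hits = set(hit_ranks)
    ks = [0.0]
    for r in range(1, total_genes + 1):
        step = up if r in hits else -down
        ks.append(ks[-1] + step)
    return ks


# N = 10 genes, n = 3 hits at ranks 1, 3 and 4, as in the plot example
ks = ks_profile(10, [1, 3, 4])

# ES is the most extreme (maximal by absolute value) KS
peak_rank = max(range(len(ks)), key=lambda r: abs(ks[r]))
es = ks[peak_rank]
print(peak_rank, round(es, 3))
```

Under this scheme the walk rises at ranks 1, 3 and 4, dips at rank 2, and falls steadily from rank 5 onward, so the most extreme value is reached at rank 4.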

References

  1. Gene Set Enrichment Analysis page on the Broad Institute site