Difference between revisions of "Gene expression prediction"

From BioUML platform
Line 1: Line 1:
  
 
{| class="wikitable"
 
{| class="wikitable"
!Method, code, references!!Input data!!Algorithm!!Comment
+
!Method, code, references!!Input data!!Algorithm!!Accuracy!!Comment
 
|-
 
|-
 
|INVOKE (R script)<cite>Schmidt217</cite>
 
|INVOKE (R script)<cite>Schmidt217</cite>
Line 20: Line 20:
 
INVOKE offers linear regression with various regularisation techniques (Lasso, Ridge, Elastic net) to infer potentially important transcriptional regulators by predicting gene expression from TEPIC TF-gene scores.
 
INVOKE offers linear regression with various regularisation techniques (Lasso, Ridge, Elastic net) to infer potentially important transcriptional regulators by predicting gene expression from TEPIC TF-gene scores.
 
|
 
|
 +
|
 +
 +
|-
 +
|<cite>Ouyang2007</cite>
 +
|Input:
 +
* ChIP-seq data
 +
* expression data (RNA-seq)
 +
Output:
 +
* log-linear regression model
 +
* principal components with weights of corresponding TFs
 +
|
 +
* for each TF and each gene, compute a TF association strength (TFAS): the weighted sum of the corresponding ChIP-seq signal strengths, where the weights reflect the proximity of the signal to the gene.
 +
* principal component analysis (PCA) to extract uncorrelated characteristic patterns in the TFAS vectors.
 +
* the centered and standardized TFAS matrix A is decomposed by singular value decomposition (SVD)
 +
* regression-based component selection
 +
* gene expression is modelled by a log-linear regression on the selected components
 +
|mouse ESCs, r=0.806, R<sup>2</sup>=0.65, CV-R<sup>2</sup>=0.64
 +
|
 +
 +
|-
 +
|
 +
|
 +
|
 +
|
 +
|
 +
 +
|-
 +
|
 +
|
 +
|
 +
|
 +
|
 +
 
|}
 
|}
  
Line 28: Line 61:
  
 
#Schmidt217 pmid=27899623
 
#Schmidt217 pmid=27899623
 +
 +
#Ouyang2007 pmid=19995984
  
 
</biblio>
 
</biblio>

Revision as of 19:05, 1 April 2018

Method, code, references | Input data | Algorithm | Accuracy | Comment
INVOKE (R script)[1]

https://github.com/SchulzLab/TEPIC/tree/master/MachineLearningPipelines/INVOKE

Input:

  • TF-genes scores (calculated by TEPIC)
    • open chromatin data (DNaseI-seq, NOMe-seq)
    • PWM (Jaspar, HOCOMOCO, Uniprobe)
  • expression data (RNA-seq)

Output:

  • regression coefficients for TF
  • model performance: Pearson correlation, Spearman correlation, and MSE
    • boxplot showing model performance
    • heatmap (top 10 positive and negative coefficients)
    • scatter plots for predicted versus the measured gene expression data
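
The three performance measures listed above can be illustrated with a minimal pure-Python sketch. The function names (`mse`, `pearson`, `ranks`, `spearman`) and the toy values are assumptions made for illustration; they are not taken from the INVOKE R script.

```python
import math


def mse(measured, predicted):
    """Mean squared error between measured and predicted expression."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values (hypothetical), e.g. log-scaled expression per gene.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.3]
print(pearson(measured, predicted), spearman(measured, predicted), mse(measured, predicted))
```

Spearman correlation only depends on the ordering of genes, so it is less sensitive to a few extreme expression values than Pearson correlation or MSE.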

INVOKE offers linear regression with various regularisation techniques (Lasso, Ridge, Elastic net) to infer potentially important transcriptional regulators by predicting gene expression from TEPIC TF-gene scores.
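
To make the role of the regularisation techniques concrete, the following is a minimal Python sketch of elastic-net regression fitted by coordinate descent. It is not the INVOKE implementation (INVOKE is an R script); the function names, the penalty parametrisation (`alpha`, `l1_ratio`) and the toy TF-gene scores are assumptions chosen for illustration. Setting `l1_ratio=1.0` gives Lasso, `l1_ratio=0.0` gives Ridge, and values in between give Elastic net.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_sweeps=200):
    """Coordinate descent for
    (1/2n) * ||y - b0 - Xw||^2 + alpha * (l1_ratio * ||w||_1
                                          + 0.5 * (1 - l1_ratio) * ||w||_2^2).
    X: rows are genes, columns are TFs; y: expression per gene.
    Returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = sum(y) / n
    residual = [yi - intercept for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes TF j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * weights[j])
                      for i in range(n)) / n
            new_w = (soft_threshold(rho, alpha * l1_ratio)
                     / (col_sq[j] + alpha * (1.0 - l1_ratio)))
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                weights[j] = new_w
        # Re-centre the residual by updating the unpenalised intercept.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
    return intercept, weights


# Toy TF-gene scores (hypothetical): 4 genes x 2 TFs,
# with expression = 1 + 2 * TF1 - 3 * TF2.
scores = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
expression = [0.0, -4.0, 6.0, 2.0]

print(fit_elastic_net(scores, expression, alpha=0.0))                 # no penalty
print(fit_elastic_net(scores, expression, alpha=1.0, l1_ratio=0.0))   # Ridge
print(fit_elastic_net(scores, expression, alpha=10.0, l1_ratio=1.0))  # Lasso
```

With a strong Lasso penalty the coefficients are set exactly to zero, which is why non-zero coefficients of a sparse model can be read as candidate regulators; Ridge only shrinks the coefficients without removing any TF.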

[2]

Input:

  • ChIP-seq data
  • expression data (RNA-seq)

Output:

  • log-linear regression model
  • principal components with weights of corresponding TFs

Algorithm:

  • for each TF and each gene, compute a TF association strength (TFAS): the weighted sum of the corresponding ChIP-seq signal strengths, where the weights reflect the proximity of the signal to the gene.
  • principal component analysis (PCA) to extract uncorrelated characteristic patterns in the TFAS vectors.
  • the centered and standardized TFAS matrix A is decomposed by singular value decomposition (SVD)
  • regression-based component selection
  • gene expression is modelled by a log-linear regression on the selected components

Accuracy: mouse ESCs, r=0.806, R²=0.65, CV-R²=0.64
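
The first steps of this method (computing TFAS values, then centering and standardizing the TFAS matrix before SVD) can be sketched in Python as follows. The table only states that the weights reflect proximity of the signal to the gene; the exponential decay `exp(-d / d0)`, the default `d0` value, the function names and the toy peaks below are all assumptions made for illustration. The SVD and regression steps are not shown.

```python
import math


def tfas(peaks, tss, d0=5000.0):
    """TF association strength of one TF for one gene.

    peaks: list of (position, intensity) ChIP-seq peaks of the TF.
    tss:   transcription start site position of the gene.
    d0:    decay length in bp (assumed value, hypothetical default).
    Each peak contributes intensity * exp(-distance / d0), so peaks
    closer to the gene weigh more (assumed weighting scheme)."""
    return sum(g * math.exp(-abs(pos - tss) / d0) for pos, g in peaks)


def standardize_columns(matrix):
    """Center each column to mean 0 and scale it to unit sample
    standard deviation; constant columns become all zeros."""
    n, p = len(matrix), len(matrix[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


# Toy data (hypothetical): TSS per gene and peaks per TF.
gene_tss = {"gene1": 1000, "gene2": 50000, "gene3": 120000}
tf_peaks = {
    "TF_A": [(900, 4.0), (52000, 1.5)],
    "TF_B": [(1500, 2.0), (119000, 3.0), (125000, 1.0)],
}

genes = sorted(gene_tss)
tfs = sorted(tf_peaks)
# TFAS matrix A: one row per gene, one column per TF.
A = [[tfas(tf_peaks[tf], gene_tss[g]) for tf in tfs] for g in genes]
A_std = standardize_columns(A)
for gene, row in zip(genes, A_std):
    print(gene, [round(v, 3) for v in row])
```

The standardized matrix `A_std` is what would be passed to an SVD routine to obtain the principal components used as regressors.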


References

  1. [Schmidt217] PMID 27899623
  2. [Ouyang2007] PMID 19995984