Difference between revisions of "Gene expression prediction"

From BioUML platform
 
Line 3: Line 3:
 
!Method, code, references!!Input data!!Algorithm!!Accuracy!!Comment
 
!Method, code, references!!Input data!!Algorithm!!Accuracy!!Comment
 
|-
 
|-
|INVOKE (R script)<cite>Schmidt217</cite>
+
|INVOKE (R script)<cite>Schmidt2017</cite>
 
https://github.com/SchulzLab/TEPIC/tree/master/MachineLearningPipelines/INVOKE
 
https://github.com/SchulzLab/TEPIC/tree/master/MachineLearningPipelines/INVOKE
 
|  
 
|  
Line 24: Line 24:
 
<br>GM12878 - r =0.58
 
<br>GM12878 - r =0.58
 
|
 
|
 +
 +
 +
|-
 +
|PECA - paired expression and chromatin accessibility (MATLAB)<cite>Duren2017</cite>
 +
http://web.stanford.edu/~zduren/PECA/
 +
|
 +
|
 +
|
 +
|
 +
 +
 +
|-
 +
|
 +
|
 +
|
 +
|
 +
|
 +
  
 
|-
 
|-
Line 40: Line 58:
 
* gene expression is expressed by the log-linear regression model
 
* gene expression is expressed by the log-linear regression model
 
|mouse ESCs, r=0.806, R<sup>2</sup>=0.65, CV-R<sup>2</sup>=0.64  
 
|mouse ESCs, r=0.806, R<sup>2</sup>=0.65, CV-R<sup>2</sup>=0.64  
|
 
 
|-
 
|
 
|
 
|
 
|
 
|
 
 
|-
 
|
 
|
 
|
 
|
 
 
|
 
|
  
Line 63: Line 67:
 
<biblio>
 
<biblio>
  
#Schmidt217 pmid=27899623
+
#Schmidt2017 pmid=27899623
 +
 
 +
#Duren2017 pmid=28576882
  
 
#Ouyang2009 pmid=19995984
 
#Ouyang2009 pmid=19995984
  
 
</biblio>
 
</biblio>

Latest revision as of 22:09, 1 April 2018

Method, code, references | Input data | Algorithm | Accuracy | Comment
INVOKE (R script)[1]

https://github.com/SchulzLab/TEPIC/tree/master/MachineLearningPipelines/INVOKE

Input:

  • TF-gene scores (calculated by TEPIC)
    • open chromatin data (DNaseI-seq, NOMe-seq)
    • PWMs (JASPAR, HOCOMOCO, UniPROBE)
  • expression data (RNA-seq)

Output:

  • regression coefficients for each TF
  • model performance: Pearson correlation, Spearman correlation, and MSE
    • boxplot showing model performance
    • heatmap (top 10 positive and negative coefficients)
    • scatter plots for predicted versus the measured gene expression data
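
The three performance measures named in the output list can be sketched in plain Python as follows. This is an illustrative sketch only, not code taken from INVOKE (which is an R script); the function names and the toy `measured` / `predicted` values are assumptions made for the example.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted expression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy values, not data from the publication.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.5]
print(pearson(measured, predicted), spearman(measured, predicted), mse(measured, predicted))
```

Spearman correlation depends only on the ordering of the genes, so it is less sensitive than Pearson correlation to a few genes with extreme expression values.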

INVOKE offers linear regression with various regularisation techniques (Lasso, Ridge, Elastic net) to infer potentially important transcriptional regulators by predicting gene expression from TEPIC TF-gene scores.
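
The idea of regularised linear regression on TF-gene scores can be illustrated with a small elastic-net fit by coordinate descent. This is a minimal sketch under stated assumptions, not the INVOKE implementation: the function `elastic_net`, the parameters `lam` and `alpha`, and the simulated scores are all hypothetical. Setting `alpha=1.0` gives Lasso and `alpha=0.0` gives Ridge.

```python
import random


def soft_threshold(rho, thresh):
    """Shrink rho towards zero by thresh (the Lasso proximal step)."""
    if rho > thresh:
        return rho - thresh
    if rho < -thresh:
        return rho + thresh
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    X is a list of rows (genes) with one column per TF; returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centred scores
    resid = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta


# Simulated data (assumption): 200 genes, 3 TFs, only the first two TFs matter.
rng = random.Random(0)
scores = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
expression = [2.0 * s[0] - 1.0 * s[1] + rng.gauss(0, 0.1) for s in scores]

intercept, coefs = elastic_net(scores, expression, lam=0.1, alpha=1.0)
print(intercept, coefs)
```

With the Lasso penalty the coefficient of the uninformative third TF is shrunk to exactly zero, which is how such models point to a small set of candidate regulators.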

Accuracy:

  • HepG2: r=0.68
  • K562: r=0.68
  • GM12878: r=0.58


PECA - paired expression and chromatin accessibility (MATLAB)[2]

http://web.stanford.edu/~zduren/PECA/



2009 - an approach based on feature extraction of ChIP-Seq signals, principal component analysis, and regression-based component selection [3]

Input:

  • ChIP-seq data
  • expression data (RNA-seq)

Output:

  • log-linear regression model
  • principal components with weights of corresponding TFs

Algorithm:

  • for each TF and each gene, compute a TF association strength (TFAS): the weighted sum of the corresponding ChIP-Seq signal strengths, where the weights reflect the proximity of the signal to the gene
  • apply principal component analysis (PCA) to extract uncorrelated characteristic patterns in the TFAS vectors
  • the centered and standardized TFAS matrix A is decomposed by singular value decomposition (SVD)
  • regression-based component selection
  • gene expression is modelled by the log-linear regression model
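
The steps listed above can be sketched with NumPy as follows. This is a rough illustration under several assumptions, not the published implementation: the exponential proximity weight and the constant `d0` in `tfas` are assumed for the example, "log-linear" is taken to mean that log expression is linear in the components, and the sketch simply keeps the leading components instead of performing regression-based component selection. The function names and simulated data are hypothetical.

```python
import numpy as np


def tfas(peak_signal, peak_dist, d0=5000.0):
    """TFAS for one TF-gene pair: proximity-weighted sum of ChIP-Seq peak signals.

    The weight exp(-distance / d0) is an assumed form of proximity weighting.
    """
    signal = np.asarray(peak_signal, dtype=float)
    dist = np.asarray(peak_dist, dtype=float)
    return float(np.sum(signal * np.exp(-dist / d0)))


def pca_regression(A, y, n_components):
    """Regress log expression on the leading principal components of the TFAS matrix A.

    A has one row per gene and one column per TF; y holds positive expression values.
    Returns (regression coefficients, TF loadings per component, R^2 on log scale).
    """
    Z = (A - A.mean(axis=0)) / A.std(axis=0)          # centre and standardise columns
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)  # Z = U diag(s) Vt
    pcs = U[:, :n_components] * s[:n_components]      # principal component scores
    X = np.column_stack([np.ones(len(y)), pcs])       # add intercept column
    log_y = np.log(y)
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((log_y - fitted) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return coef, Vt[:n_components], 1.0 - ss_res / ss_tot


# Simulated data (assumption): 100 genes, 4 TFs.
rng = np.random.default_rng(0)
A = rng.gamma(2.0, 1.0, size=(100, 4))
y = np.exp(0.5 * A[:, 0] - 0.3 * A[:, 1] + rng.normal(0.0, 0.2, size=100))

coef, loadings, r2 = pca_regression(A, y, n_components=2)
print(coef.shape, loadings.shape, r2)
```

Each row of `loadings` gives the weights of the TFs in one principal component, which corresponds to the "principal components with weights of corresponding TFs" output listed above.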

Accuracy: mouse ESCs, r=0.806, R²=0.65, CV-R²=0.64


References

  1. [Schmidt2017] PMID 27899623
  2. [Duren2017] PMID 28576882
  3. [Ouyang2009] PMID 19995984