Cluster analysis by K-means (analysis)

From BioUML platform
Latest revision as of 18:15, 9 December 2020

Analysis title
Cluster analysis by K-means
Provider
Institute of Systems Biology
Class
ClusterAnalysis
Plugin
ru.biosoft.analysis (Common methods of data analysis plug-in)


Goal:

Genes are grouped into clusters so that those in one cluster exhibit maximal similarity, whereas those of different clusters are maximally dissimilar.
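
For K-means, similarity is conventionally measured by the squared Euclidean distance of each item to its cluster mean, so the goal amounts to minimizing the within-cluster sum of squares. The following is a minimal Python sketch of that quantity; the function name and the toy expression profiles are illustrative assumptions and are not part of BioUML.

```python
def within_cluster_ss(rows, labels):
    """Sum of squared Euclidean distances from each row to its cluster mean."""
    # Group the rows by their cluster label.
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)

    total = 0.0
    for members in clusters.values():
        n = len(members)
        # Column-wise mean of the cluster members.
        mean = [sum(col) / n for col in zip(*members)]
        total += sum((x - m) ** 2 for row in members for x, m in zip(row, mean))
    return total


# Hypothetical expression profiles for four genes across two conditions.
profiles = [[1.0, 1.2], [0.8, 1.0], [5.0, 5.2], [5.2, 4.8]]
good = within_cluster_ss(profiles, [0, 0, 1, 1])  # similar genes together
bad = within_cluster_ss(profiles, [0, 1, 0, 1])   # similar genes split apart
```

A grouping that keeps similar profiles together (`good`) yields a much smaller value than one that mixes them (`bad`).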

Input:

A table of genes or probes with their expression values or calculated fold changes. Depending on the chosen algorithm, certain additional parameters must be supplied.

Output:

A table with the same genes grouped into clusters.

Parameters:

  • Experiment data - experimental data for analysis.
    • Table - a table with experimental data stored in repository.
    • Columns - the columns from the table which should be taken for the clustering analysis.
  • Cluster algorithm - the version of the K-means algorithm to be applied [1-4].
  • Cluster number - the number of clusters into which the input data will be divided.
  • Output table - name and path in the repository under which the result table will be saved. If a table with the specified name and path already exists, it will be overwritten.

Further details:

The clustering is done with the K-means algorithm as implemented in R (http://www.r-project.org/).
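
To illustrate what such a clustering does, here is a minimal pure-Python sketch of one of the cited variants, Lloyd's algorithm [3]. It is not the BioUML or R code; the function name, the `seed` and `max_iter` arguments, and the toy table are assumptions made for illustration. The `k` argument plays the role of the "Cluster number" parameter, and the value lists stand in for the selected "Columns".

```python
import random


def kmeans_lloyd(rows, k, max_iter=100, seed=0):
    """Cluster rows (lists of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initial centers: k distinct rows chosen at random.
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [None] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centers[c])))
            for row in rows
        ]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical input table: gene name -> expression values in the selected columns.
table = {
    "geneA": [1.0, 1.2],
    "geneB": [0.8, 1.0],
    "geneC": [5.0, 5.2],
    "geneD": [5.2, 4.8],
}
labels, centers = kmeans_lloyd(list(table.values()), k=2)
clusters = dict(zip(table, labels))  # gene name -> cluster index
```

The resulting `clusters` mapping corresponds to the output table: the same genes, each annotated with the cluster it was assigned to. Cluster indices themselves are arbitrary and depend on the random initialization.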

References:

  1. Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.
  2. Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100–108.
  3. Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.
  4. MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.