Genetic algorithm based two-mode clustering of metabolomics data

Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow the behavior of subsets of metabolites to be analyzed under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters, and we provide an extended knee plot to select the optimal number of clusters in both the experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources.
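
The general idea of genetic algorithm-based two-mode clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the homogeneity criterion or the genetic operators, so the sketch assumes a within-block sum of squared errors as the criterion, truncation selection, uniform crossover and random label mutation. The names block_sse and two_mode_ga and all parameter defaults are hypothetical. Rows stand for metabolites and columns for experimental conditions.

```python
import random


def block_sse(data, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Sum of squared deviations of each entry from the mean of its
    (row cluster, column cluster) block. Lower means more homogeneous.
    Assumed criterion, not necessarily the one used in the paper."""
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            sums[row_labels[i]][col_labels[j]] += value
            counts[row_labels[i]][col_labels[j]] += 1
    sse = 0.0
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            r, c = row_labels[i], col_labels[j]
            sse += (value - sums[r][c] / counts[r][c]) ** 2
    return sse


def two_mode_ga(data, n_row_clusters, n_col_clusters,
                pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Search for row and column cluster labels that minimise block_sse.
    Assumes pop_size >= 4 so that at least two parents survive."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])

    def random_individual():
        # One chromosome holds the row labels followed by the column labels.
        rows = [rng.randrange(n_row_clusters) for _ in range(n_rows)]
        cols = [rng.randrange(n_col_clusters) for _ in range(n_cols)]
        return rows + cols

    def fitness(ind):
        return block_sse(data, ind[:n_rows], ind[n_rows:],
                         n_row_clusters, n_col_clusters)

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(ind):
        # Reassign a gene to a random cluster with probability mutation_rate.
        out = list(ind)
        for k in range(len(out)):
            if rng.random() < mutation_rate:
                limit = n_row_clusters if k < n_rows else n_col_clusters
                out[k] = rng.randrange(limit)
        return out

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b)))
        population = survivors + children

    best = min(population, key=fitness)
    return best[:n_rows], best[n_rows:], fitness(best)


if __name__ == "__main__":
    # Toy matrix with an obvious 2 x 2 block structure.
    toy = [[1, 1, 5, 5], [1, 1, 5, 5], [9, 9, 3, 3], [9, 9, 3, 3]]
    print(two_mode_ga(toy, n_row_clusters=2, n_col_clusters=2))
```

Encoding both modes in one chromosome lets a single fitness value score the row and column partitions jointly, which is what distinguishes two-mode clustering from clustering each mode separately. Uniform crossover on raw labels ignores the fact that cluster labels are interchangeable, so a practical implementation would likely need a more careful operator.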

Authors: J.A. Hageman, R.A. van den Berg, J.A. Westerhuis, M.J. van der Werf, A.K. Smilde
DOI: 10.1007/s11306-008-0105-7
Pages: 2008; 4 (2): 141-149
Published in: Metabolomics
Date of publication: March 2008
Status of the publication: Published/accepted