Biostatistics

Tracy-Widom statistic for the largest eigenvalue of autoscaled real matrices

Type of publication: 
NMC Publication
Authors: 
E. Saccenti, J.A. Westerhuis, A.K. Smilde, M.M.W.B. Hendriks
Published in: 
Journal of Chemometrics
Date of publication: 
2011/12
Status of the publication: 
Published/accepted

Eigenanalysis is common practice in biostatistics, and the largest eigenvalue of a data set contains valuable information about the data. However, to make inferences about the size of the largest eigenvalue, its distribution must be known.
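
As a rough illustration of why the null distribution matters, the sketch below simulates the largest eigenvalue of the correlation matrix of autoscaled pure-noise data. It is a Monte Carlo stand-in, not the Tracy-Widom approximation studied in the paper; the function names, the matrix size (30 by 5) and the number of simulations are assumptions chosen only for illustration.

```python
import random


def autoscale(X):
    """Centre each column to mean 0 and scale it to unit sample standard deviation."""
    n = len(X)
    cols = []
    for col in zip(*X):
        m = sum(col) / n
        s = (sum((v - m) ** 2 for v in col) / (n - 1)) ** 0.5
        cols.append([(v - m) / s for v in col])
    return [list(row) for row in zip(*cols)]


def largest_eigenvalue(X, iters=200):
    """Approximate the largest eigenvalue of the correlation matrix of X (power iteration)."""
    Z = autoscale(X)
    n, p = len(Z), len(Z[0])
    R = [[sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
         for i in range(p)]
    v = [p ** -0.5] * p  # unit-length start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(x * x for x in w) ** 0.5  # norm of R v for unit v
        v = [x / lam for x in w]
    return lam


def null_largest_eigenvalues(n, p, n_sim=200, seed=0):
    """Sorted largest eigenvalues for n_sim matrices of independent standard normal noise."""
    rng = random.Random(seed)
    return sorted(
        largest_eigenvalue([[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)])
        for _ in range(n_sim)
    )


# Hypothetical use: an observed eigenvalue above this threshold would be unusual under noise.
null = null_largest_eigenvalues(n=30, p=5)
threshold_95 = null[int(0.95 * len(null)) - 1]
```

An analytical approximation such as the one the paper investigates avoids rerunning this simulation for every matrix size.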

Pages: 
2011; 25 (12): 644-652
DOI: 
10.1002/cem.1411

Simplivariate models: uncovering the underlying biology in functional genomics data

Type of publication: 
NMC Publication
Authors: 
E. Saccenti, J.A. Westerhuis, A.K. Smilde, M.J. van der Werf, J.A. Hageman, M.M.W.B. Hendriks
Published in: 
PLoS ONE
Date of publication: 
2011/06
Status of the publication: 
Published/accepted
Pages: 
2011; 6 (6): e20747
DOI: 
10.1371/journal.pone.0020747

Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies

Type of publication: 
NMC Publication
Authors: 
E. Szymańska, E. Saccenti, A.K. Smilde, J.A. Westerhuis
Published in: 
Metabolomics
Date of publication: 
2012/06
Status of the publication: 
Published/accepted

Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary 'dummy' y-variable; it is commonly used for classification and biomarker selection in metabolomics studies. Several statistical approaches are currently in use to validate the outcomes of PLS-DA analyses, e.g. double cross-validation procedures or permutation testing.
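
The permutation-testing idea mentioned above can be sketched as follows. To stay free of external libraries, a nearest-centroid classifier with leave-one-out cross-validation is swapped in for PLS-DA with double cross-validation; the data, function names and parameter values are all hypothetical.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (a stand-in for PLS-DA)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            # Centroid of each class, computed without the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Compare the observed accuracy with accuracies obtained after shuffling the labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if loo_accuracy(X, y_perm) >= observed:
            hits += 1
    # The +1 terms count the unpermuted labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up example: two clearly separated classes of 10 samples with 5 variables each.
demo_rng = random.Random(1)
X_demo = [[demo_rng.gauss(0, 1) for _ in range(5)] for _ in range(10)] + \
         [[demo_rng.gauss(4, 1) for _ in range(5)] for _ in range(10)]
y_demo = [0] * 10 + [1] * 10
observed_acc, p_value = permutation_p_value(X_demo, y_demo)
```

A small p-value indicates that the observed classification performance is unlikely to arise when the class labels carry no information.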

Publisher: 
Springer
Pages: 
2012; 8: S3–S16
DOI: 
10.1007/s11306-011-0330-3

Between metabolite relationships: an essential aspect of metabolic change

Type of publication: 
NMC Publication
Authors: 
J.J. Jansen, E. Szymańska, H.C.J. Hoefsloot, D.M. Jacobs, K. Strassburg, A.K. Smilde
Published in: 
Metabolomics
Date of publication: 
2012/06
Status of the publication: 
Published/accepted

Not only the levels of individual metabolites, but also the relations between the levels of different metabolites may indicate (experimentally induced) changes in a biological system. Component analysis methods in current ‘standard’ use for metabolomics, such as Principal Component Analysis (PCA), do not focus on changes in these relations. We therefore propose the concept of ‘Between Metabolite Relationships’ (BMRs): common changes in the covariance (or correlation) between all metabolites in an organism.
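
A toy illustration of the point made above, with made-up numbers: the mean level of each metabolite is identical in the two conditions, yet the correlation between them reverses. This is only a two-metabolite sketch of the motivation; it is not the BMR method itself, and the names `control` and `treated` are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical levels of two metabolites (A, B) measured in four samples per condition.
control = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0]}
treated = {"A": [1.0, 2.0, 3.0, 4.0], "B": [8.0, 6.0, 4.0, 2.0]}

r_control = pearson(control["A"], control["B"])
r_treated = pearson(treated["A"], treated["B"])

# The mean level of B is the same in both conditions; only the relation to A differs.
mean_b_control = sum(control["B"]) / len(control["B"])
mean_b_treated = sum(treated["B"]) / len(treated["B"])
```

A method that looks only at metabolite levels would see no difference between these two conditions.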

Pages: 
2012; 8(3): 422-432
DOI: 
10.1007/s11306-011-0316-1

Multiset data analysis: ANOVA simultaneous component analysis and related methods

Type of publication: 
Book chapter
Authors: 
H.C.J. Hoefsloot, D.J. Vis, J.A. Westerhuis, A.K. Smilde, J.J. Jansen
Published in: 
Comprehensive Chemometrics
Date of publication: 
2009/02
Status of the publication: 
Published/accepted

Data sets resulting from metabolomics, proteomics, or metabolic profiling experiments are usually complex. This type of data contains underlying factors, such as time, doses, or combinations thereof. Classical biostatistics methods do not take into account the structure of such complex data sets.

Book: 
Comprehensive Chemometrics
Publisher: 
Oxford: Elsevier.
Pages: 
2009; volume 2, 453-472
DOI: 
10.1016/B978-044452701-1.00054-5
Publication data (text): 
2009

On the increase of predictive performance with high-level data fusion

Type of publication: 
NMC Publication
Authors: 
T.G. Doeswijk, A.K. Smilde, J.A. Hageman, J.A. Westerhuis, F.A. van Eeuwijk
Published in: 
Analytica Chimica Acta
Date of publication: 
2011/10
Status of the publication: 
Published/accepted

Combining different data sources for classification, also called data fusion, can be done at three levels: low-level (concatenating data matrices), medium-level (concatenating data matrices after feature selection), and high-level (combining model outputs). In this paper the predictive performance of high-level data fusion is investigated. Partial least squares is applied to each data set, with dummy variables representing the classes as response variables.
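
The three fusion levels can be sketched on plain lists of rows. The averaging rule used for the high-level case is an assumption made for illustration, not necessarily the combination rule used in the paper, and all function names and numbers are hypothetical.

```python
def low_level_fusion(blocks):
    """Concatenate the data blocks sample by sample (same sample order in every block)."""
    return [[v for row in rows for v in row] for rows in zip(*blocks)]


def medium_level_fusion(blocks, selected):
    """Keep only the selected column indices of each block, then concatenate."""
    reduced = [
        [[row[j] for j in cols] for row in block]
        for block, cols in zip(blocks, selected)
    ]
    return low_level_fusion(reduced)


def high_level_fusion(score_blocks):
    """Average the class scores predicted by each per-block model and pick the best class."""
    fused = []
    for scores in zip(*score_blocks):  # scores of one sample from every model
        mean_scores = [sum(s) / len(scores) for s in zip(*scores)]
        fused.append(max(range(len(mean_scores)), key=mean_scores.__getitem__))
    return fused


# Two made-up data blocks measured on the same two samples.
block_a = [[1, 2], [3, 4]]
block_b = [[5], [6]]
low = low_level_fusion([block_a, block_b])
medium = medium_level_fusion([block_a, block_b], selected=[[1], [0]])

# Made-up class scores (class 0, class 1) from one model per block.
scores_a = [[0.8, 0.2], [0.55, 0.45]]
scores_b = [[0.6, 0.4], [0.10, 0.90]]
high = high_level_fusion([scores_a, scores_b])
```

In the second sample the two models disagree, and the averaged scores settle the class assignment.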

Pages: 
2011; 705 (1-2): 41-47
DOI: 
10.1016/j.aca.2011.03.025
Publication data (text): 
2011, 705, 1-2, 41-7

Simplivariate models: ideas and first examples

Type of publication: 
Matching Publication
Authors: 
J.A. Hageman, M.M.W.B. Hendriks, J.A. Westerhuis, M.J. van der Werf, R. Berger, A.K. Smilde
Published in: 
PLoS ONE
Date of publication: 
2008/08
Status of the publication: 
Published/accepted

One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. The data generated in metabolomics studies are very diverse in nature, depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down into systematic variation and noise, where the latter contains, e.g., technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data.

Pages: 
2008; 871 (2): 306-313
DOI: 
10.1016/j.jchromb.2008.05.008
Publication data (text): 
2008

Discriminant Q2 (DQ2) statistic for improved discrimination in PLSDA models

Type of publication: 
Matching Publication
Authors: 
J.A. Westerhuis, E.J.J. van Velzen, H.C.J. Hoefsloot, A.K. Smilde
Published in: 
Metabolomics
Date of publication: 
2008/12
Status of the publication: 
Published/accepted

In this paper we introduce discriminant Q2 (DQ2) as an improvement on the Q2 value used in the validation of PLSDA models. DQ2 does not penalize class predictions beyond the class label value. With rigorous Monte Carlo simulations we show that when DQ2 is used, smaller effects can be detected as statistically significant than when the standard Q2 is used.
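
A minimal sketch of the difference between the two statistics, assuming class labels coded as 0 and 1 and cross-validated predictions supplied by the caller. The function names and the example numbers are hypothetical.

```python
def q2(y, y_pred):
    """Standard Q2 = 1 - PRESS / TSS for cross-validated predictions."""
    mean_y = sum(y) / len(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / tss


def dq2(y, y_pred, low=0, high=1):
    """Discriminant Q2: predictions beyond the class label add nothing to PRESS."""
    mean_y = sum(y) / len(y)
    press_d = 0.0
    for yi, pi in zip(y, y_pred):
        if yi == low and pi < low:
            continue  # prediction below the lower class label is not penalized
        if yi == high and pi > high:
            continue  # prediction above the upper class label is not penalized
        press_d += (yi - pi) ** 2
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press_d / tss


# Made-up labels and predictions; the first and last predictions overshoot their labels.
y_true = [0, 0, 1, 1]
y_hat = [-0.2, 0.1, 0.9, 1.3]
q2_value = q2(y_true, y_hat)
dq2_value = dq2(y_true, y_hat)
```

Here the standard Q2 is lowered by the two overshooting predictions even though they fall clearly on the correct side, whereas DQ2 ignores them.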

Pages: 
2008; 4 (4): 293-296
DOI: 
10.1007/s11306-008-0126-2
Publication data (text): 
2008

Multilevel data analysis of a cross-over designed human nutritional study

Type of publication: 
Matching Publication
Authors: 
E.J.J. van Velzen, J.A. Westerhuis, J.P.M. van Duynhoven, F.A. van Dorsten, H.C.J. Hoefsloot, D.M. Jacobs, S. Smit, R. Draijer, C.I. Kroner, A.K. Smilde
Published in: 
Journal of Proteome Research
Date of publication: 
2008/10
Status of the publication: 
Published/accepted
Pages: 
2008; 7 (10): 4483-4491
DOI: 
10.1021/pr800145j
Publication data (text): 
2008

The geometry of ASCA

Type of publication: 
Matching Publication
Authors: 
A.K. Smilde, H.C.J. Hoefsloot, J.A. Westerhuis
Published in: 
Journal of Chemometrics
Date of publication: 
2008/08
Status of the publication: 
Published/accepted

For analyzing designed high-dimensional data, no standard methods are currently available. A method that is becoming increasingly popular for analyzing such data is ASCA (ANOVA simultaneous component analysis). The mathematics of ASCA has already been described elsewhere, but a geometrical interpretation is still lacking. The geometry can help practitioners to understand what ASCA does, and the more advanced user can gain insight into the properties of the method. This paper shows the geometry of ASCA in both the row- and column-space of the matrices involved.
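
The ANOVA-style partition that precedes the component analysis in ASCA can be sketched for a single design factor as below. The component analysis of the effect matrix is omitted, and the function name, factor levels and data are hypothetical. The final check shows one geometric property: in this balanced example the columns of the effect matrix are orthogonal to the corresponding residual columns.

```python
def asca_partition(X, levels):
    """Split X into overall-mean, factor-effect and residual matrices (one factor)."""
    n = len(X)
    grand = [sum(col) / n for col in zip(*X)]
    centred = [[x - g for x, g in zip(row, grand)] for row in X]
    # Mean of the centred rows within each level of the design factor.
    level_means = {}
    for lev in sorted(set(levels)):
        rows = [centred[i] for i in range(n) if levels[i] == lev]
        level_means[lev] = [sum(col) / len(rows) for col in zip(*rows)]
    effect = [level_means[lev][:] for lev in levels]
    resid = [[c - e for c, e in zip(c_row, e_row)]
             for c_row, e_row in zip(centred, effect)]
    mean_mat = [grand[:] for _ in range(n)]
    return mean_mat, effect, resid


# Made-up data: four samples, two variables, one factor with levels "a" and "b".
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [7.0, 10.0]]
levels = ["a", "a", "b", "b"]
mean_mat, effect, resid = asca_partition(X, levels)

# Inner product of each effect column with the matching residual column.
inner = [sum(e[j] * r[j] for e, r in zip(effect, resid)) for j in range(len(X[0]))]
```

The three matrices add up to the original data, so each source of variation can be inspected separately.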

Pages: 
2008; 22 (8): 464-471
DOI: 
10.1002/cem.1175
Publication data (text): 
2008