Multivariate paired data analysis: multilevel PLSDA versus OPLSDA

Metabolomics data obtained from (human) nutritional intervention studies can have a rather complex structure that depends on the underlying experimental design. In this paper we discuss the complex structure in data caused by a cross-over designed experiment. In such a design, each subject in the study population acts as his or her own control and makes the data paired. For a single univariate response a paired t-test or repeated measures ANOVA can be used to test the differences between the paired observations. The same principle holds for multivariate data. In the current paper we compare a method that exploits the paired data structure in cross-over multivariate data (multilevel PLSDA) with a method that is often used by default but that ignores the paired structure (OPLSDA). The results from both methods have been evaluated in a small simulated example as well as in a genuine data set from a cross-over designed nutritional metabolomics study. It is shown that exploiting the paired data structure underlying the cross-over design considerably improves the power and the interpretability of the multivariate solution. Furthermore, the multilevel approach provides complementary information about (I) the diversity and abundance of the treatment effects within the different (subsets of) subjects across the study population, and (II) the intrinsic differences between these study subjects.


J.A. Westerhuis, E.J.J. van Velzen, H.C.J. Hoefsloot, A.K. Smilde
Publication data (text): 
2010; 6: 119–128
Published in: 
Date of publication: 
January, 2010
Status of the publication: