This article, titled “Applying compositional data methodology to nutritional epidemiology” by Maria Léa Corrêa Leite, presents a novel and formally appropriate statistical approach for analyzing dietary data in nutritional epidemiology. Published in Statistical Methods in Medical Research, this work addresses a critical challenge in the field: investigating the effects of specific dietary components independently of total energy intake.
The core issue lies in the compositional nature of dietary data, where measurements represent parts of a whole (e.g., the percentages of total energy contributed by each macronutrient). Such data are constrained to a constant sum and convey only relative information. The constraint induces a negatively biased covariance structure, and the data occupy a restricted sample space known as the D-part simplex. Consequently, standard statistical methods designed for unconstrained variables are inappropriate for raw compositional data, because they rely on variances and covariances defined in unconstrained Euclidean space rather than on a geometry suited to the simplex. For the same reason, traditional multivariate analyses struggle to disentangle the specific effects of individual macronutrients on disease risk from the generic effect of total energy intake.
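The negative bias follows directly from the constant-sum constraint. Because every composition sums to 1, the covariance of any part with the total is zero, so each row of the covariance matrix sums to zero. The positive variance on the diagonal must therefore be offset by negative covariances. A minimal Python/NumPy sketch illustrates this using made-up energy intakes (not data from the article). The function name `closure` is our own choice.

```python
import numpy as np

def closure(x):
    """Rescale each row of a positive array so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

# Made-up energy intakes (kcal/day) from protein, fat and carbohydrate for
# five hypothetical subjects; not data from the article.
kcal = np.array([
    [300.0,  700.0, 1000.0],
    [450.0,  900.0, 1150.0],
    [250.0,  600.0, 1400.0],
    [500.0,  800.0,  900.0],
    [350.0, 1000.0, 1100.0],
])

shares = closure(kcal)                   # energy shares of each macronutrient
cov = np.cov(shares, rowvar=False)       # 3 x 3 covariance of the shares

print(shares.sum(axis=1))                # every row sums to 1 (up to rounding)
print(cov.sum(axis=1))                   # every row of cov sums to ~0
# Off-diagonal covariances of each part add up to minus its variance (< 0)
print(cov.sum(axis=1) - np.diag(cov))
```

Nothing about the diet itself produces the negative covariances. They are forced by the closure step alone.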
To overcome these limitations, the article adopts a compositional data perspective, building on the foundational work of Aitchison. This approach centers on log-ratio transformations, which map compositional data from the restricted simplex to unconstrained real space so that standard multivariate statistical techniques can be applied. Although additive log-ratio (alr) and centered log-ratio (clr) transformations are also available, the paper focuses on the isometric log-ratio (ilr) transformation. The ilr transformation is particularly advantageous because it expresses each composition in real coordinates with respect to an orthonormal basis, preserving all the metric properties of the simplex and thus allowing the usual statistical methods to be applied systematically. A specific form of ilr coordinates, known as balances, offers a straightforward interpretation: each balance represents the relative variation between two groups of parts within the composition.
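The clr and ilr transformations can be sketched in a few lines of Python with NumPy. This is an illustration under our own assumptions, not code from the article. The helper `pivot_basis` builds one particular orthonormal basis: balances from a sequential binary partition that contrasts each part with all later parts. The checks at the end confirm two properties. First, the map is an isometry: Euclidean distances between ilr coordinates equal Aitchison distances, which are computed here as Euclidean distances between clr vectors. Second, rescaling a composition leaves its coordinates unchanged.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part minus the mean log of its composition."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def pivot_basis(D):
    """(D-1) x D matrix with orthonormal rows spanning the clr hyperplane.

    Row i contrasts part i with parts i+1, ..., D-1 (0-based), one choice of
    balances obtained from a sequential binary partition.
    """
    V = np.zeros((D - 1, D))
    for i in range(D - 1):
        r = D - i - 1                       # number of parts after part i
        V[i, i] = np.sqrt(r / (r + 1))
        V[i, i + 1:] = -1.0 / np.sqrt(r * (r + 1))
    return V

def ilr(x):
    """Isometric log-ratio coordinates: the clr vector expressed in the basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ pivot_basis(x.shape[-1]).T

# Two made-up energy compositions (protein, fat, carbohydrate)
x = np.array([0.15, 0.35, 0.50])
y = np.array([0.20, 0.30, 0.50])

aitchison_dist = np.linalg.norm(clr(x) - clr(y))  # Aitchison distance via clr
ilr_dist = np.linalg.norm(ilr(x) - ilr(y))        # plain Euclidean distance of ilr coordinates
print(np.isclose(aitchison_dist, ilr_dist))       # True: the ilr map is an isometry
print(np.allclose(ilr(x), ilr(1000 * x)))         # True: only ratios between parts matter
```

Each row of the basis sums to zero and has unit length. Coordinate i therefore reduces to sqrt(r/(r+1)) times the log of part i over the geometric mean of the r parts after it.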
The methodology constructs regression models with ilr-transformed compositional explanatory variables. For a D-part composition, D different ilr transformations are generated, each yielding (D-1) coordinates. Crucially, the first ilr coordinate of each transformation is constructed to capture all the relevant information about one specific part of interest (e.g., a particular macronutrient) relative to the remaining parts. These ilr coordinates are then included as covariates in the regression models. Because the ilr bases are orthonormal, inference can focus on the parameters of these first coordinates. Each such parameter describes the effect of increasing the relative “weight” of one nutrient in an isocaloric context. The D bases are orthogonal transformations of one another. As a result, the constant term, the coefficients of the other covariates, and the goodness of fit are identical across the D equations; only the coefficients of the compositional coordinates, and hence their interpretation, change.
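The claim that the D equations share the same intercept, covariate coefficients, and fit can be checked numerically. The sketch below assumes pivot coordinates, one standard way to build an ilr basis whose first coordinate contrasts a chosen part with the geometric mean of the others. The article's exact basis may differ. For brevity, the sketch uses ordinary least squares on synthetic data with a hypothetical covariate `age`, rather than the logistic models of the article. The same invariance holds for maximum-likelihood logistic fits, because the design matrices differ only by an invertible linear map of the compositional columns.

```python
import numpy as np

def pivot_coords(x, first):
    """Pivot (ilr) coordinates of the rows of x, with part `first` moved to the front.

    Coordinate i is sqrt(r/(r+1)) * ln(x_i / geometric mean of the r parts after i),
    so coordinate 0 contrasts part `first` with all remaining parts.
    """
    x = np.asarray(x, dtype=float)
    D = x.shape[1]
    order = [first] + [j for j in range(D) if j != first]
    logx = np.log(x[:, order])
    z = np.empty((x.shape[0], D - 1))
    for i in range(D - 1):
        r = D - i - 1
        z[:, i] = np.sqrt(r / (r + 1)) * (logx[:, i] - logx[:, i + 1:].mean(axis=1))
    return z

# Synthetic data for illustration only (not the Bollate Eye Study)
rng = np.random.default_rng(0)
n, D = 200, 3
comp = rng.dirichlet([4.0, 7.0, 10.0], size=n)   # e.g. protein, fat, carbohydrate shares
age = rng.normal(60.0, 8.0, size=n)              # hypothetical non-compositional covariate
y = 1.0 + 2.0 * np.log(comp[:, 0] / comp[:, 2]) + 0.05 * age + rng.normal(0.0, 1.0, size=n)

# Fit one model per choice of leading part: y ~ 1 + z_1 + ... + z_{D-1} + age
fits = []
for part in range(D):
    X = np.column_stack([np.ones(n), pivot_coords(comp, part), age])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fits.append((beta, X @ beta))
    print(f"part {part}: intercept={beta[0]:.4f}  age={beta[-1]:.4f}  z1={beta[1]:.4f}")

# Intercept, age coefficient and fitted values coincide across the D systems;
# only the compositional coefficients (e.g. z1) differ.
for beta, fitted in fits[1:]:
    assert np.allclose(beta[[0, -1]], fits[0][0][[0, -1]])
    assert np.allclose(fitted, fits[0][1])
```

Because only the coefficient on the first coordinate changes meaning from one equation to the next, each fit is read only for the part placed first.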
The article illustrates this approach using data from the Italian Bollate Eye Study, a population-based study of middle-aged subjects. Logistic regression models were fitted to evaluate the effects of macronutrient intake (proteins, fats, and carbohydrates) on the odds of having metabolic syndrome (MS). In an isocaloric setting, the odds of having MS increased with higher dietary protein content and decreased with higher carbohydrate content. This made it possible to analyze the compositional aspect of the diet (macronutrient content) separately from its quantitative aspect (total energy intake).
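Consider what a coefficient on the first pivot coordinate means in an isocaloric setting. Suppose the protein share of energy rises from 15% to 18% while fat and carbohydrate are scaled down proportionally, so that total energy and the fat-to-carbohydrate ratio stay unchanged. Only the first coordinate moves. For a part whose share goes from p to p', the shift is sqrt((D-1)/D)·[ln(p'/p) - ln((1-p')/(1-p))], and the implied odds ratio is exp(β·Δz1). In the sketch below, the shares and the coefficient `beta_protein = 0.5` are hypothetical values chosen for illustration. They are not estimates from the Bollate Eye Study, and the helper names are our own.

```python
import numpy as np

def pivot_coords(x):
    """Pivot coordinates of one composition; z[0] contrasts part 0 with the rest."""
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.size
    return np.array([
        np.sqrt((D - i - 1) / (D - i)) * (logx[i] - logx[i + 1:].mean())
        for i in range(D - 1)
    ])

def isocaloric_shift(x, new_share):
    """Set part 0 to `new_share` and rescale the other parts so the total stays 1."""
    x = np.asarray(x, dtype=float)
    rest = x[1:] * (1.0 - new_share) / x[1:].sum()
    return np.concatenate([[new_share], rest])

before = np.array([0.15, 0.35, 0.50])   # protein, fat, carbohydrate shares (made up)
after = isocaloric_shift(before, 0.18)

dz = pivot_coords(after) - pivot_coords(before)
beta_protein = 0.5                      # HYPOTHETICAL coefficient, not the paper's estimate
odds_ratio = np.exp(beta_protein * dz[0])

print(after, after.sum())               # shares after the shift; total is still 1
print(dz)                               # dz[1] is ~0: fat/carbohydrate ratio unchanged
print(odds_ratio)
```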
This compositional data methodology offers significant advantages over traditional approaches, such as residual models or complete partition models. Those approaches often suffer from collinearity, heteroscedasticity, pervasive confounding, and difficulty in interpreting the specific role of each dietary component. The ilr-based models provide intuitive interpretations of nutrient-composition effects in isocaloric analyses. They can also avoid a pitfall of other methods, which can produce spurious associations when total energy intake is itself associated with disease. Although the quality of fit was comparable, the ilr-based models showed a somewhat greater ability to explain the response according to likelihood-ratio statistics. The author emphasizes that this method aligns with the fundamental goal of nutritional epidemiology, namely to relate dietary composition (energy-adjusted intakes) to disease. The method also holds promise for investigating the role of different food sources of nutrients. A noted limitation is that the approach cannot be applied directly to data containing zeros, although methods for handling zeros exist.
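One widely used option for zeros is multiplicative replacement. It is not necessarily the method the article has in mind. Each zero is replaced by a small value δ, and the non-zero parts are shrunk by a common factor so that the composition still sums to 1 and the ratios among the observed parts are preserved. The sketch below assumes closed compositions and an arbitrary δ = 0.001. Choosing δ is a substantive modelling decision that this sketch does not address.

```python
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace zeros in compositions (one per row) by multiplicative replacement.

    Zero parts are set to `delta`; non-zero parts are multiplied by
    (1 - delta * number_of_zeros_in_row), so each row still sums to 1 and the
    ratios among the originally non-zero parts are unchanged. Assumes delta is
    small enough that delta * number_of_zeros < 1.
    """
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)          # close the data first
    is_zero = x == 0
    n_zero = is_zero.sum(axis=1, keepdims=True)
    return np.where(is_zero, delta, x * (1.0 - delta * n_zero))

# Made-up energy shares; the first subject reports no intake of part 0
shares = np.array([
    [0.00, 0.40, 0.60],
    [0.20, 0.30, 0.50],
])
replaced = multiplicative_replacement(shares, delta=0.001)
print(replaced)
print(replaced.sum(axis=1))                   # each row sums to 1
print(replaced[0, 1] / replaced[0, 2])        # 0.4 / 0.6 ratio preserved
```

After replacement, every part is strictly positive, so the log-ratio transformations can be applied.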
Reference:
Leite, M. L. C. (2016). Applying compositional data methodology to nutritional epidemiology. Statistical Methods in Medical Research, 25(6), 3057–3065. https://doi.org/10.1177/0962280214560047
