Paper 46: Ventilatory efficiency during constant-load test at lactate threshold intensity: Endurance versus resistance exercises
References
Albesa-Albiol L, Serra-Paya N, Garnacho-Castaño MA, Guirao Cano L, Pleguezuelos Cobo E, Mate´-Muñoz JL, et al. (2019) Ventilatory efficiency during constant-load test at lactate threshold intensity: Endurance versus resistance exercises. PLoS ONE 14(5): e0216824. https://doi.org/10.1371/journal.pone.0216824
Disclosure
This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.
Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.
Methods used in our reproducibility analyses
There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.
The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability, and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.
To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
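As a sketch of this categorisation logic (the exact thresholds were tailored to each paper's reporting precision, so the rules below are illustrative assumptions, not the study's exact criteria):

```python
def categorise(reported, reproduced, decimals):
    """Classify agreement between a reported and a reproduced statistic.

    Illustrative thresholds only: a value counts as Reproduced if it
    rounds to the reported value at the reported precision, and as
    Incorrect Rounding if it is off by at most one unit in the last
    reported decimal place.
    """
    unit = 10 ** (-decimals)  # one unit in the last reported digit
    if round(reproduced, decimals) == round(reported, decimals):
        return "Reproduced"
    if abs(reported - reproduced) <= unit:
        return "Incorrect Rounding"
    return "Not Reproduced"
```

For example, a reported coefficient of 5.789 against a reproduced value of 5.7886 would be categorised as Reproduced at three decimal places.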
For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
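The thresholds for standardized coefficients can be sketched as a small grading function; how the 10% relative rule combines with the absolute cut-offs here is an illustrative assumption:

```python
def classify_change(beta_original, beta_sensitivity):
    """Grade the shift in a standardized coefficient between the primary
    model and a sensitivity analysis, using the thresholds described in
    the text: absolute differences of 0.1 (moderate) and 0.2
    (substantial), or a relative change of 10% or more.
    """
    abs_diff = abs(beta_sensitivity - beta_original)
    rel_diff = abs_diff / abs(beta_original) if beta_original else float("inf")
    if abs_diff >= 0.2:
        return "substantial"
    if abs_diff >= 0.1:
        return "moderate"
    if rel_diff >= 0.10:
        return "meaningful (relative)"
    return "negligible"
```

Under this sketch, a shift from 0.88 to 0.91 (as seen in Model 1 below) is graded negligible, since both the absolute and relative changes fall under the thresholds.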
Results from the reproduction of the Albesa-Albiol et al. (2019) paper are presented below. An overall summary of results is provided first, followed by model-specific results displayed in tab panels. In each panel, the original results tab shows the linear regression results extracted from the published paper. The reproduced results tab presents the results reproduced from the data provided by the authors, including a comprehensive check of linear regression assumptions. The third tab displays the differences between the original and reproduced models, assessing computational reproducibility. The final tab presents sensitivity analysis which evaluates inferential reproducibility by examining whether identified assumption violations had a meaningful effect on the results.
Summary from statistical review
This paper examines ventilatory efficiency in endurance and resistance exercise. Repeated measures ANOVA was the primary analysis, and linear regression was used alongside correlation to explore relationships between variables such as ventilation and carbon dioxide production. While the ANOVA appears to account for repeated measures (SPSS output was provided), the regression analyses did not, as there were many more observations than participants. Normality was assessed by applying the Shapiro-Wilk test to the raw data rather than to the model residuals, and no other assumptions were discussed. Regression coefficients were not discussed, only displayed in equations on scatterplots.
Data availability and software used
The authors provided a poorly formatted Excel file spread across multiple worksheets, with data copied and pasted into arbitrary columns and no participant identifiers or formal variable names. Variables required for the regression analyses had to be extracted manually, one at a time. The regression analyses were conducted using SPSS.
Regression sample
Four regressions were shown in scatterplots with regression lines, equations, and correlation coefficients. The three randomly chosen regressions were Cycle ergometer VO₂, Half-squat VO₂, and Cycle ergometer ventilatory efficiency.
Computational reproducibility results
All three regression analyses were computationally reproduced. However, reproducing the regressions for cycle and half-squat VO₂, which used log-transformed VE as the independent variable, was challenging. The authors provided the data in an unlabelled Excel file, making the identification of the outcome and predictor variables ambiguous and easily confused. In addition, the data were presented graphically using scatterplots in which several outlying observations were omitted, giving the appearance of a more linear relationship than was present in the full dataset. Further complicating interpretation, the x-axis was labelled as ergometer_log10_VE, although the plotted values corresponded to the raw data. This mislabelling made the reported model equations difficult to interpret, as the equations specified a log base 10 transformed predictor, whereas the transformation applied in the analyses was the natural logarithm.
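The base of the logarithm matters for the reported slope: a coefficient fitted against ln(VE) is smaller than the equivalent log10(VE) coefficient by a factor of ln(10) ≈ 2.303, so mislabelled axes make equations uninterpretable. A stdlib sketch on synthetic data (not the study data) illustrates the rescaling:

```python
import math

# Synthetic illustration: an outcome exactly linear in ln(VE),
# with a hypothetical true slope of 2.0 on the natural-log scale.
ve = [30 + 3 * i for i in range(20)]          # hypothetical VE values
vo2 = [0.5 + 2.0 * math.log(v) for v in ve]

def ols_slope(xs, ys):
    """Simple-linear OLS slope via centred cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

b_ln = ols_slope([math.log(v) for v in ve], vo2)       # ~2.0
b_log10 = ols_slope([math.log10(v) for v in ve], vo2)  # ~2.0 * ln(10)
```

The same data yield a slope of 2.0 under the natural log but about 4.61 under log base 10, which is why the equation's stated transformation must match the one actually applied.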
Inferential reproducibility results
This paper was not inferentially reproducible, as all three models exhibited violations of the linearity assumption. In the Cycle Ergometer VO₂ model, these violations were largely driven by a single extremely influential observation. Linear (BIC = −61), quadratic (BIC = −87), and cubic (BIC = −215) specifications were therefore examined. Although the quadratic and cubic models showed BIC reductions exceeding 10 relative to the linear model, none provided an adequate fit. The cubic specification appeared to be dominated by extreme observations rather than reflecting the underlying trend, while the linear and quadratic models were pulled downward at the upper end of the predictor range.
After removal of the influential observation, the linear model remained a poor fit (BIC = −193), whereas the quadratic model provided the best fit (BIC = −230). Log transformation of the variables visually improved the alignment between the fitted line and the observed data; however, one high-leverage observation continued to have disproportionate influence on the slope fitted using ordinary least squares. Robust regression was therefore applied to down-weight this observation, producing a fitted relationship that more closely reflected the central trend of the data.
The remaining two models did not exhibit extreme outliers but showed violations of linearity, with linear terms failing to adequately characterise the associations. Both models were therefore refitted with log-transformed predictor and outcome variables. The log–log specification provided an adequate final model for the second analysis. In contrast, the third model continued to display residual curvature, and a natural cubic spline with three degrees of freedom was adopted as the final specification to allow modest curvature while limiting model complexity. Further modelling is constrained by the absence of participant-level identifiers. Without these identifiers, it is not possible to fit random-effects models or to formally assess whether the observed patterns reflect within-participant clustering rather than the underlying association.
Recommended changes
- Match the correct equation to the data.
- Consider the linearity of relationships.
- Provide the data in a recognised long or wide format, with a unique participant identifier included to allow assessment of repeated measures and clustering.
- For repeated measures data, linear mixed models are the default method.
- Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
Model 1
Model results for Cycle ergometer VO₂
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | −0.8679 | |||||
Cycle_ergometer_log_VE | 5.7886 | <0.001 | ||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Cycle ergometer VO₂
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.875 | ||||||||
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Cycle ergometer VO₂
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Cycle_ergometer_log_VE | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Cycle ergometer VO₂
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | −0.868 | 0.085 | −1.036 | −0.700 | −10.172 | <0.001 |
Cycle_ergometer_log_VE | 5.789 | 0.158 | 5.478 | 6.099 | 36.696 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval; Calculated using type III SS. | ||||||
Fit statistics for Cycle ergometer VO₂
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.884 | 0.782 | 0.781 | −99.541 | 0.210 | 1,346.579 | 1 | 376 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Cycle ergometer VO₂
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Cycle_ergometer_log_VE | 59.952 | 1 | 59.952 | 1,346.579 | <0.001 |
Residuals | 16.740 | 376 | 0.045 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non-linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
Cycle_ergometer_log_VE | −2.178 | 0.0300 | Linearity violation |
Tukey test | −2.178 | 0.0294 | Linearity violation |
Predictor linearity is assessed using a quadratic specification test; curvature in the fitted values is tested using Tukey's one-degree-of-freedom test for nonadditivity. | | |
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
- Plots and tests show an unaccounted non-linear relationship may be present.
- The outlier in these data causes linearity problems: the regression line is pulled down so that it does not fit the data.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
40.441 | <0.001 | 1 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test indicates heteroscedasticity.
- Some heteroscedasticity is present in plots, and a sensitivity analysis using weighted or robust regression or wild bootstrapping is recommended.
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Cycle ergometer VO₂ | 378 | 2.238 | 0.451 | 2.178 | 0.829 | 3.534 | −0.142 | 0.906 |
Cycle_ergometer_log_VE | 378 | 0.536 | 0.069 | 0.537 | 0.188 | 0.890 | −0.716 | 4.114 |
.fitted | 378 | 2.238 | 0.399 | 2.240 | 0.220 | 4.286 | −0.716 | 4.114 |
.resid | 378 | −0.000 | 0.211 | −0.002 | −1.956 | 0.609 | −2.465 | 20.365 |
.leverage | 378 | 0.005 | 0.007 | 0.003 | 0.003 | 0.073 | 6.399 | 54.992 |
.sigma | 378 | 0.211 | 0.002 | 0.211 | 0.183 | 0.211 | −16.364 | 291.909 |
.cooksd | 378 | 0.013 | 0.188 | 0.001 | 0.000 | 3.629 | 18.981 | 362.736 |
.std.resid | 378 | −0.000 | 1.011 | −0.010 | −9.627 | 2.992 | −2.617 | 22.382 |
dfb.1_ | 378 | 0.002 | 0.167 | −0.003 | −0.229 | 2.943 | 14.915 | 255.211 |
dfb.C___ | 378 | −0.002 | 0.172 | 0.001 | −3.043 | 0.246 | −14.901 | 255.544 |
dffit | 378 | −0.001 | 0.181 | −0.001 | −3.100 | 0.833 | −13.119 | 226.552 |
cov.r | 378 | 1.005 | 0.022 | 1.007 | 0.616 | 1.035 | −14.550 | 247.008 |
Cook’s threshold
Cook’s distance measures the overall change in fit if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 to 1 is often used to identify influential observations.
DFFITS threshold
DFFITS measures how many standard deviations the fitted value will change when the ith observation is removed. Potential influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can flag a large number of points, so a practical cut-off of 1 was used to flag observations with meaningful impact.
DFBETAS threshold
DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag many observations with only minor influence on the model, so a practical cut-off of 1 was used to flag observations with meaningful impact.
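With n = 378 observations and p = 2 parameters, as in the Cycle ergometer VO₂ model, the formula-based cut-offs can be computed directly, which illustrates why the practical cut-off of 1 was preferred:

```python
import math

n, p = 378, 2  # observations and parameters in the Cycle ergometer VO2 model

cooks_cut = 4 / n                  # Cook's distance cut-off, ~0.0106
dffits_cut = 2 * math.sqrt(p / n)  # |DFFITS| cut-off, ~0.145
dfbetas_cut = 2 / math.sqrt(n)     # |DFBETAS| cut-off, ~0.103

# Observation 84 (Cook's distance 3.629, |DFFITS| 3.100, |DFBETAS| ~3)
# exceeds the formula cut-offs and the practical cut-off of 1 by a wide
# margin, consistent with the influence results reported below.
```

Because the formula cut-offs shrink with n, they flag many mildly influential points in a dataset of this size, whereas a cut-off of 1 isolates observations with material impact.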
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 and 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.C___ | dffit | cov.r |
|---|---|---|---|---|---|---|---|
266 | −4.359 | 0.011 | 0.105 | 0.378 | −0.410 | −0.468 | 0.921 |
106 | 3.024 | 0.071 | 0.340 | 0.831 | −0.817 | 0.833 | 1.031 |
84 | −11.075 | 0.073 | 3.629 | 2.943 | −3.043 | −3.100 | 0.616 |
StudRes = studentized residuals; CookD = Cook's Distance, a combined measure of leverage and influence. DFBETAS (dfb.*) measure how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | | | | | | |
Results for outliers and influential points
Three observations had absolute studentized residuals greater than three. Two of these did not make a substantial difference to the estimates, as their DFBETAS, DFFITS, and COVRATIO statistics were within conventional thresholds. The third observation was an extreme outlier, with a studentized residual of approximately −11, a Cook’s distance of 3.6, and DFBETAS and DFFITS values around three, indicating that both the regression coefficients and their standard errors would change substantially if this observation were removed. The COVRATIO for this observation was approximately 0.6, suggesting that it also affected the precision of the parameter estimates. Residual diagnostics further suggested that this observation may be contributing to violations of the linearity assumption.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.094 | 0.0025 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.838 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- QQ-plot indicates the residuals are not normally distributed.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.602 | 0.795 | <0.001 |
Independence results
- The Durbin–Watson test suggests there are auto-correlation issues.
- The study design involves repeated measures, so observations are not independent; the data should be analysed using linear mixed models or generalized estimating equations.
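The Durbin–Watson statistic reported above can be computed directly from the residual series; values near 2 indicate no first-order autocorrelation, while values toward 0 indicate positive autocorrelation. A minimal stdlib sketch:

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: the sum of squared successive differences
    of the residuals divided by their sum of squares. Approximately equal
    to 2 * (1 - r1), where r1 is the lag-1 autocorrelation."""
    num = sum((a - b) ** 2 for a, b in zip(resid[1:], resid[:-1]))
    return num / sum(r * r for r in resid)
```

Consistent with the table above, a lag-1 autocorrelation of 0.602 corresponds to a Durbin–Watson statistic of roughly 2 × (1 − 0.602) ≈ 0.80, matching the reported 0.795.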
Assumption conclusions
The model assumptions were dominated by an extreme outlier, with a studentized residual of approximately −11, a Cook’s distance of 3.6, and DFBETAS and DFFITS values around three, indicating that both the regression coefficients and their standard errors would change substantially if this observation were removed. The COVRATIO for this observation was approximately 0.6, suggesting a material impact on the precision of the parameter estimates. Residual diagnostics further indicated that this observation may be contributing to violations of the linearity assumption.
Residual plots also suggested departures from normality, evidence of autocorrelation, and mild heteroscedasticity. In addition, the study design involved repeated measures, which were not accounted for in the analysis. Taken together, these issues indicate that the fitted model poorly represents the underlying data structure.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for Cycle ergometer VO₂
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | −0.8679 | −0.8679 | 0.0000 | Reproduced |
Cycle_ergometer_log_VE | 5.7886 | 5.7886 | 0.0000 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = R_B − O_B; Reproduce.B = whether B was reproduced. | | | |
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
Cycle_ergometer_log_VE | <0.001 | <0.001 | 0.0000 | Reproduced | Remains sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Change.p = R_p − O_p; Reproduce.p = whether p-values were reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | | | | |
Results for p-values
The p-value for the model was reproduced.
Conclusion computational reproducibility
This model was computationally reproducible: all assessed reported statistics were reproduced.
Methods
The model was successfully reproduced; however, residual diagnostics indicated non-linearity and the presence of influential observations that may affect both point estimates and confidence intervals. The data were standardized, and the model was fitted with and without the outlying observation to compare the differences in coefficients.
Linear, quadratic, and cubic models were therefore visualised using the same variable transformations as specified in the original model. Model fit was compared using AIC and BIC, with comparisons restricted to models fitted to the same dataset. Models with and without the influential outlier were assessed descriptively through visualisation rather than formal information-criterion comparison. To address assumption violations, the outcome variable (cycle ergometer VO₂) was log-transformed and the model refitted. Robust regression was then applied to reduce the effect of influential observations.
Results
After standardisation, the regression coefficients changed only slightly (0.88 with the outlier included vs 0.91 excluded). On this basis alone, one might conclude that the outlier does not substantially affect the results. However, this conclusion would be misleading. The apparent stability of the coefficient arises because the linear model provides a poor fit to the data both with and without the outlier, rather than because the outlier is inconsequential.
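In simple regression the standardized slope equals Pearson's correlation coefficient, so comparing it with and without an observation is a correlation comparison. A minimal stdlib sketch on hypothetical data (not the study data):

```python
import math

def standardized_slope(x, y):
    """Slope from regressing z-scored y on z-scored x; in simple linear
    regression this equals Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration: the final point is an outlier that
# depresses the standardized slope (i.e. the correlation).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 2.0]
with_outlier = standardized_slope(x, y)
without_outlier = standardized_slope(x[:-1], y[:-1])
```

As the text notes, a small before-and-after difference in this quantity does not by itself certify that the linear model fits; both coefficients can be similarly sized while the fit is poor throughout.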
Characteristic | Dataset | N | Beta | 95% CI | p-value |
|---|---|---|---|---|---|
(Intercept) | All data | 378 | 0.0000 | −0.0473, 0.0473 | >0.999 |
(Intercept) | Outlier removed | 377 | 0.0000 | −0.0411, 0.0411 | >0.999 |
z_Cycle_ergometer_log_VE | All data | 378 | 0.8842 | 0.8368, 0.9315 | <0.001 |
z_Cycle_ergometer_log_VE | Outlier removed | 377 | 0.9141 | 0.8729, 0.9552 | <0.001 |
CI = Confidence Interval. | | | | | |
Returning to the unstandardised data, the initial model had substantial violations of regression assumptions, driven by an extreme influential outlier, as shown by fitting three models: linear (BIC = −61), quadratic (BIC = −87), and cubic (BIC = −215). The cubic and quadratic models showed BIC reductions of more than 10; however, none of the models provided an adequate fit. The cubic model appeared to follow individual extreme observations rather than the underlying trend, while both the linear and quadratic fits were artificially pulled downward at the upper end of the predictor range. After removal of the outlier, the linear model remained a poor representation of the data (BIC = −193), with the quadratic model fitting best (BIC = −230), ahead of the cubic (BIC = −224); residual curvature and influence from observations at the upper end of the distribution remained.
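The model-selection logic above can be sketched with the standard Gaussian BIC; the helper names below are illustrative, not the original analysis code:

```python
import math

def gaussian_bic(rss, n, k):
    """BIC for an OLS fit with residual sum of squares rss, n
    observations, and k estimated parameters, up to an additive
    constant shared by all models fitted to the same data."""
    return n * math.log(rss / n) + k * math.log(n)

def strong_improvement(bic_simpler, bic_complex, margin=10):
    """Decision rule used in the text: a BIC reduction of more than
    `margin` is treated as strong evidence for the more complex model."""
    return (bic_simpler - bic_complex) > margin
```

With the full-data values above, `strong_improvement(-61, -87)` and `strong_improvement(-87, -215)` are both true, whereas after outlier removal the cubic (−224) is not a strong improvement over the quadratic (−230).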
Comparison of non-linear models with full data
Comparison of non-linear models with outlier removed
The full dataset was re-analysed with the outlier retained, applying log-transformations to both the predictor and outcome variables. Visually, the resulting log–log model performed similarly to the quadratic model fitted to the reduced dataset. In the log–log specification, ventilation (VE) was strongly associated with oxygen uptake (VO₂), with a 1% increase in VE associated with an approximately 2.9% increase in VO₂.
Although the log–log transformation did not substantially improve the residual diagnostics, it visually improved the alignment of the fitted line with the observed data. However, one high-leverage observation exerted disproportionate influence on the OLS slope. Robust regression was therefore examined as a sensitivity analysis, downweighting this observation and producing a fitted line that more closely reflected the central trend of the data. The exponent for VE indicates that VO₂ is proportional to VE raised to the power of 3.4. This implies an elastic relationship, where proportional changes in VE are associated with amplified proportional changes in VO₂. For example, doubling VE would be associated with an approximate 2\(^{3.4}\) ≈ 10.6-fold increase in VO₂.
Log-log model with comparison of OLS and robust regression
Comparison of Robust regression with full data on log scale
Variable | OLS Estimate | OLS SE | OLS Lower | OLS Upper | OLS p-value | Robust Estimate | Robust SE | Robust Lower | Robust Upper | Robust p-value |
|---|---|---|---|---|---|---|---|---|---|---|
Intercept | −0.785 | 0.038 | −0.860 | −0.710 | <0.001 | −1.014 | 0.048 | −1.107 | −0.920 | <0.001 |
Cycle_ergometer_log_VE | 2.922 | 0.071 | 2.784 | 3.061 | <0.001 | 3.357 | 0.084 | 3.192 | 3.522 | <0.001 |
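Because both variables were transformed with the natural logarithm, the slope acts as an elasticity. Using the robust estimate from the table above, the multiplicative interpretation can be checked directly (a sketch, not the original analysis code):

```python
# Robust-regression slope for Cycle_ergometer_log_VE from the table above
b = 3.357

# On the log-log scale, a 1% increase in VE multiplies VO2 by 1.01 ** b ...
one_pct_effect = (1.01 ** b - 1) * 100  # % change in VO2 per 1% rise in VE
# ... and doubling VE multiplies VO2 by 2 ** b
doubling_effect = 2 ** b                # fold change in VO2 if VE doubles
```

This gives roughly a 3.4% rise in VO₂ per 1% rise in VE and a 10.2-fold rise for a doubling of VE; the text's figure of 2^3.4 ≈ 10.6 uses the slope rounded to one decimal place.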
Further modelling is constrained by the absence of participant-level identifiers. Without these identifiers, it is not possible to fit random-effects models or formally investigate whether the observed patterns are driven by within-participant clustering rather than the underlying association.
Model 2
Model results for Half-squat VO₂
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | −0.5116 | |||||
Half_squat_log_VE | 4.3696 | <0.001 | ||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Half-squat VO₂
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.853 | ||||||||
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Half-squat VO₂
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Half_squat_log_VE | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Half-squat VO₂
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | −0.512 | 0.069 | −0.648 | −0.376 | −7.396 | <0.001 |
Half_squat_log_VE | 4.370 | 0.142 | 4.091 | 4.648 | 30.830 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval; Calculated using type III SS. | ||||||
Model fit for Half-squat VO₂
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.850 | 0.722 | 0.721 | −494.657 | 0.123 | 950.494 | 1 | 366 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Half-squat VO₂
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Half_squat_log_VE | 14.355 | 1 | 14.355 | 950.494 | <0.001 |
Residuals | 5.527 | 366 | 0.015 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non-linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
Half_squat_log_VE | 4.352 | <0.001 | Linearity violation |
Tukey test | 4.352 | <0.001 | Linearity violation |
Predictor linearity is assessed using a quadratic specification test; curvature in the fitted values is tested using Tukey's one-degree-of-freedom test for nonadditivity. | | |
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
- Plots and tests show an unaccounted non-linear relationship may be present.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
0.006 | 0.9373 | 1 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test does not indicate heteroscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Half-squat VO₂ | 368 | 1.612 | 0.233 | 1.594 | 0.913 | 2.269 | 0.248 | 0.171 |
Half_squat_log_VE | 368 | 0.486 | 0.045 | 0.493 | 0.324 | 0.601 | −0.387 | 0.741 |
.fitted | 368 | 1.612 | 0.198 | 1.645 | 0.902 | 2.113 | −0.387 | 0.741 |
.resid | 368 | −0.000 | 0.123 | −0.012 | −0.440 | 0.437 | 0.497 | 0.740 |
.leverage | 368 | 0.005 | 0.005 | 0.004 | 0.003 | 0.038 | 3.593 | 17.028 |
.sigma | 368 | 0.123 | 0.000 | 0.123 | 0.121 | 0.123 | −3.742 | 19.314 |
.cooksd | 368 | 0.003 | 0.011 | 0.001 | 0.000 | 0.134 | 8.094 | 78.252 |
.std.resid | 368 | 0.000 | 1.002 | −0.100 | −3.582 | 3.592 | 0.503 | 0.754 |
dfb.1_ | 368 | 0.000 | 0.065 | 0.000 | −0.222 | 0.506 | 2.627 | 20.490 |
dfb.H___ | 368 | −0.000 | 0.065 | 0.000 | −0.491 | 0.236 | −2.256 | 18.506 |
dffit | 368 | 0.005 | 0.084 | −0.006 | −0.202 | 0.527 | 1.878 | 7.841 |
cov.r | 368 | 1.005 | 0.009 | 1.008 | 0.939 | 1.041 | −2.275 | 11.999 |
Cooks threshold
Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept), and n is the number of observations. In practice, this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.
DFBETA threshold
DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each model parameter. Potential influential observations satisfy \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.
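The DFFITS and DFBETAS thresholds above can be sketched by direct leave-one-out refitting. Simulated data; illustrative only, not the published code:

```python
import numpy as np

# Illustrative leave-one-out sketch on simulated data (not the published code):
# DFFITS and DFBETAS computed by refitting the model without each observation.
rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted_full = X @ beta_full
XtX_inv = np.linalg.inv(X.T @ X)

dffits = np.empty(n)
dfbetas = np.empty((n, p))
for i in range(n):
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    resid_i = y[keep] - X[keep] @ beta_i
    s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))     # sigma without obs i
    h_i = X[i] @ XtX_inv @ X[i]                        # leverage of obs i
    dffits[i] = (fitted_full[i] - X[i] @ beta_i) / (s_i * np.sqrt(h_i))
    se_i = s_i * np.sqrt(np.diag(np.linalg.inv(X[keep].T @ X[keep])))
    dfbetas[i] = (beta_full - beta_i) / se_i

flag_dffits = np.abs(dffits) > 2 * np.sqrt(p / n)      # size-adjusted rule above
flag_dfbetas = np.abs(dfbetas) > 2 / np.sqrt(n)
```

Software such as R's `influence.measures` computes these quantities analytically from the full fit; the explicit loop here just makes the leave-one-out definition concrete.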
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 and 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
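As an illustrative sketch on simulated data (not the authors' code), COVRATIO can be computed as the ratio of determinants of the coefficient covariance matrix with and without each observation:

```python
import numpy as np

# Illustrative sketch on simulated data: COVRATIO as the ratio of determinants
# of the coefficient covariance matrix with and without each observation.
rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=n)
y = x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

def cov_det(Xm, ym):
    """Determinant of the estimated covariance matrix of the coefficients."""
    b, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
    e = ym - Xm @ b
    s2 = e @ e / (len(ym) - p)
    return np.linalg.det(s2 * np.linalg.inv(Xm.T @ Xm))

det_full = cov_det(X, y)
covratio = np.array([
    cov_det(np.delete(X, i, axis=0), np.delete(y, i)) / det_full
    for i in range(n)
])
guideline_flag = np.abs(covratio - 1) > 3 * p / n   # |COVRATIO_i - 1| > 3p/n
```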
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.H___ | dffit | cov.r |
|---|---|---|---|---|---|---|---|
1 | 0.687 | 0.036 | 0.009 | 0.131 | −0.128 | 0.133 | 1.041 |
88 | −3.641 | 0.003 | 0.020 | 0.050 | −0.068 | −0.202 | 0.939 |
64 | 2.344 | 0.038 | 0.107 | 0.457 | −0.448 | 0.465 | 1.014 |
7 | 3.652 | 0.020 | 0.134 | 0.506 | −0.491 | 0.527 | 0.955 |
StudRes = studentized residuals; CookD = Cook's Distance, a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||
Results for outliers and influential points
- Although two observations had studentized residuals greater than ±3, their Cook’s distance, DFBETAS, DFFITS and COVRATIO values were within acceptable limits, suggesting minimal influence on model estimates.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.074 | 0.0357 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.979 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- QQ-plot indicates the residuals are not normally distributed.
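The Kolmogorov–Smirnov statistic reported above is the largest gap between the empirical CDF of the standardized residuals and the standard normal CDF, and can be computed by hand. This is a sketch on simulated residuals; in practice a statistics library would also supply the p-value:

```python
import math
import numpy as np

# Illustrative sketch on simulated residuals: one-sample KS statistic against
# the standard normal CDF (the largest ECDF-to-CDF gap).
rng = np.random.default_rng(4)
resid = rng.normal(size=368)                    # stand-in residuals
z = np.sort((resid - resid.mean()) / resid.std(ddof=1))

norm_cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
n = len(z)
ks_stat = max(
    np.max(np.arange(1, n + 1) / n - norm_cdf),   # ECDF above the CDF
    np.max(norm_cdf - np.arange(0, n) / n),       # CDF above the ECDF
)
```

For genuinely normal residuals of this size, the statistic is small (roughly 0.04); values like the 0.074 reported above are what trigger rejection at conventional levels.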
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.555 | 0.883 | <0.001 |
Independence results
- The Durbin–Watson test suggests there are auto-correlation issues.
- The study design is not independent and should be assessed using linear mixed models or generalized estimating equations.
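The Durbin–Watson statistic itself is simple to compute: \(d = \sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2\), with values near 2 indicating no first-order autocorrelation and values well below 2 (as reported above) indicating positive autocorrelation. A minimal sketch on simulated residuals:

```python
import numpy as np

# Illustrative sketch: Durbin-Watson on two kinds of simulated residuals.
rng = np.random.default_rng(5)

def durbin_watson(e):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no AR(1)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

white = rng.normal(size=500)        # independent residuals
ar = np.empty(500)                  # positively autocorrelated residuals
ar[0] = white[0]
for t in range(1, 500):
    ar[t] = 0.7 * ar[t - 1] + rng.normal()

dw_white = durbin_watson(white)     # close to 2
dw_ar = durbin_watson(ar)           # well below 2, as in the table above
```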
Assumption conclusions
The residual diagnostics indicated departures from linearity and normality; however, the identified outliers were not considered influential. There was no indication of substantial heteroscedasticity.
Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for Half-squat VO₂
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | −0.5116 | −0.5116 | 0.0000 | Reproduced |
Half_squat_log_VE | 4.3696 | 4.3696 | 0.0000 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = R_B − O_B; Reproduce.B = whether B was reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
Half_squat_log_VE | <0.001 | <0.001 | 0.0000 | Reproduced | Remains sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Change.p = R_p − O_p; Reproduce.p = whether the p-value was reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
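The footnote's comparison rule can be made explicit with a small helper (hypothetical function names, shown only for illustration):

```python
# Illustrative helper (hypothetical names) for the footnote's comparison rule:
# p-values reported as "<0.001" are recoded to 0.00099 before differencing.

def recode_p(p):
    """Convert a reported p-value, possibly the string '<0.001', to a number."""
    return 0.00099 if isinstance(p, str) and p.startswith("<") else float(p)

def change_p(original, reproduced):
    """Change.p = R_p - O_p after recoding."""
    return recode_p(reproduced) - recode_p(original)

delta = change_p("<0.001", "<0.001")   # 0.0 -> the p-value was reproduced
```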
Results for p-values
- The p-value for this model was reproduced.
Conclusion computational reproducibility
This model was computationally reproducible; all reported statistics that were assessed were reproduced.
Methods
The model was successfully reproduced; however, residual diagnostics indicated that the residuals were non-normal and showed evidence of non-linearity. Linear, quadratic, and cubic models were fitted and compared using AIC and BIC, with coefficients standardized for reporting. To address assumption violations, the outcome variable (Half squat VO₂) was log-transformed, the model was refitted, and the residuals were re-examined.
Results
Visual inspection of the linear, quadratic and cubic models suggested pronounced curvature at the lower end of the predictor range, with the linear model being pulled down at the upper end. Model fit statistics indicated substantial improvement for the quadratic model, with reductions in AIC and BIC exceeding 10 relative to the linear model, providing strong evidence that a linear specification was inadequate.
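The AIC/BIC comparison can be sketched as follows on simulated data with deliberate curvature (illustrative only). For Gaussian OLS, AIC = n·log(RSS/n) + 2k and BIC = n·log(RSS/n) + k·log(n), where k counts the estimated parameters including the error variance:

```python
import numpy as np

# Illustrative sketch on simulated curved data (not the study data):
# comparing linear and quadratic fits with AIC and BIC.
rng = np.random.default_rng(6)
n = 368
x = rng.uniform(-2, 2, size=n)
y = 0.5 + x + 0.3 * x**2 + rng.normal(scale=0.5, size=n)   # true curvature

def information_criteria(degree):
    """AIC/BIC for a polynomial OLS fit of the given degree."""
    coef = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coef, x)) ** 2)
    k = degree + 2                       # coefficients plus the error variance
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

aic_lin, bic_lin = information_criteria(1)
aic_quad, bic_quad = information_criteria(2)
delta_aic = aic_lin - aic_quad           # > 10 -> strong evidence of curvature
delta_bic = bic_lin - bic_quad
```

When the true relationship is curved, both criteria favour the quadratic fit by well over 10, the same decision rule applied in the text.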
Comparison of non-linear terms standardized coefficients
| Characteristic | Beta (Linear) | 95% CI1 | p-value | Beta (Quadratic) | 95% CI1 | p-value | Beta (Cubic) | 95% CI1 | p-value |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | -0.0007 | -0.0549, 0.0534 | 0.978 | -0.0731 | -0.1353, -0.0110 | 0.021 | -0.0648 | -0.1277, -0.0020 | 0.043 |
| z_Half_squat_log_VE | 0.8487 | 0.7945, 0.9028 | <0.001 | 0.8767 | 0.8223, 0.9310 | <0.001 | 0.9247 | 0.8443, 1.0051 | <0.001 |
| z_Half_squat_log_VE_sq | 0.0724 | 0.0397, 0.1051 | <0.001 | 0.0586 | 0.0217, 0.0954 | 0.002 | |||
| z_Half_squat_log_VE_cube | -0.0142 | -0.0317, 0.0033 | 0.112 | ||||||
| 1 CI = Confidence Interval | |||||||||
Comparison of non-linear terms on original scales
Comparison of non-linear terms fit
Model_type | R2adj | sigma | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|
Linear | 0.721 | 0.5279874 | 1 | −286.1314 | 578.2628 | 589.9870 | 102.03009 | 366 | 368 |
Quadratic | 0.734 | 0.5155056 | 2 | −276.8238 | 561.6476 | 577.2800 | 96.99730 | 365 | 368 |
Cubic | 0.735 | 0.5144226 | 3 | −275.5451 | 561.0901 | 580.6305 | 96.32552 | 364 | 368 |
Because the X variable (Half_squat_log_VE) had already been log-transformed, rather than fitting a further quadratic term a simpler solution was to reduce the curvature by also logging the y variable (Half_squat_VO2). In the log–log model, the exponent for VE indicates that VO₂ is proportional to VE raised to the power of approximately 2.8. This implies an elastic relationship, where proportional changes in VE are associated with amplified proportional changes in VO₂. For example, doubling VE would be associated with an approximate 2\(^{2.8}\) ≈ 7-fold increase in VO₂.
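The elasticity interpretation can be verified directly from the reproduced coefficient (b = 2.7633, from the log–log model below):

```python
# Worked check of the elasticity interpretation: in a log-log model,
# log(VO2) = a + b*log(VE), so multiplying VE by a factor c multiplies
# the predicted VO2 by c**b.
b = 2.7633                  # reproduced coefficient for Half_squat_log_VE
fold_change = 2 ** b        # predicted multiplicative change in VO2 if VE doubles
# fold_change is roughly 6.8, i.e. an approximate 7-fold increase
```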
Model with log coefficients
| Characteristic | Beta | 95% CI1 | p-value |
|---|---|---|---|
| (Intercept) | -0.8760 | -0.9593, -0.7926 | <0.001 |
| Half_squat_log_VE | 2.7633 | 2.5925, 2.9341 | <0.001 |
| 1 CI = Confidence Interval | |||
Model with log transformed variables
Residual diagnostics for the log–log model indicated no evidence of residual nonlinearity. Although a small number of observations had relatively large residuals, Cook’s distance values were well below conventional thresholds, suggesting no undue influence on the fitted model. Further modelling is constrained by the absence of participant-level identifiers. Without these identifiers, it is not possible to fit random-effects models or formally investigate whether the observed patterns are driven by within-participant clustering rather than the underlying association.
Residual plots
Model 3
Model results for Cycle ergometer VE
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | −3.4427 | |||||
VCO2 | 27.816 | <0.001 | ||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit Statistics Cycle ergometer VE
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.892 | ||||||||
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for Cycle ergometer VE
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
VCO2 | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Cycle ergometer VE
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | −3.443 | 1.281 | −5.961 | −0.924 | −2.687 | 0.0075 |
VCO2 | 27.816 | 0.598 | 26.641 | 28.992 | 46.525 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval; Calculated using type III SS. | ||||||
Fit statistics for Cycle ergometer VE
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.915 | 0.838 | 0.837 | 2,653.921 | 5.617 | 2,164.557 | 1 | 419 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for Cycle ergometer VE
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
VCO2 | 68,627.952 | 1 | 68,627.952 | 2,164.557 | <0.001 |
Residuals | 13,284.526 | 419 | 31.705 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
VCO2 | 4.399 | <0.001 | Linearity violation |
Tukey test | 4.399 | <0.001 | Linearity violation |
Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
Plots and tests show an unaccounted non-linear relationship may be present.
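A specification check of this kind can be approximated by adding the squared fitted values as an extra regressor and examining its t-statistic. This is a RESET-style sketch on simulated data, closely related to Tukey's one-degree-of-freedom test, and not the authors' code:

```python
import numpy as np

# Illustrative RESET-style sketch on simulated data (closely related to
# Tukey's one-degree-of-freedom test; not the authors' code): add the squared
# fitted values as a regressor and test its coefficient.
rng = np.random.default_rng(7)
n = 421
x = rng.uniform(0.6, 3.5, size=n)
y = -3.4 + 27.8 * x + 4.0 * x**2 + rng.normal(scale=5.0, size=n)  # curved truth

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

X_aug = np.column_stack([X, fitted**2])          # augment with fitted^2
beta_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
resid_aug = y - X_aug @ beta_aug
s2 = resid_aug @ resid_aug / (n - X_aug.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X_aug.T @ X_aug)))
t_tukey = beta_aug[-1] / se[-1]                  # large |t| -> linearity violation
```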
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
0.448 | 0.5033 | 1 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Cycle ergometer VE | 421 | 54.790 | 13.965 | 51.500 | 16.100 | 115.400 | 0.752 | 1.550 |
VCO2 | 421 | 2.093 | 0.460 | 2.027 | 0.629 | 3.515 | 0.141 | 0.925 |
.fitted | 421 | 54.790 | 12.783 | 52.941 | 14.054 | 94.332 | 0.141 | 0.925 |
.resid | 421 | 0.000 | 5.624 | −0.419 | −15.779 | 41.353 | 3.261 | 19.897 |
.leverage | 421 | 0.005 | 0.004 | 0.003 | 0.002 | 0.027 | 2.566 | 6.753 |
.sigma | 421 | 5.631 | 0.033 | 5.636 | 5.261 | 5.637 | −9.667 | 99.964 |
.cooksd | 421 | 0.003 | 0.015 | 0.000 | 0.000 | 0.207 | 9.025 | 95.495 |
.std.resid | 421 | 0.000 | 1.002 | −0.074 | −2.809 | 7.358 | 3.259 | 19.840 |
dfb.1_ | 421 | 0.000 | 0.059 | −0.001 | −0.560 | 0.360 | −2.046 | 34.262 |
dfb.VCO2 | 421 | −0.000 | 0.061 | 0.001 | −0.284 | 0.620 | 3.916 | 39.777 |
dffit | 421 | 0.005 | 0.079 | −0.005 | −0.202 | 0.656 | 3.968 | 23.510 |
cov.r | 421 | 1.005 | 0.021 | 1.007 | 0.765 | 1.032 | −9.239 | 94.524 |
* categorical variable | ||||||||
Cook’s distance threshold
Cook’s distance measures the overall change in the fitted model when the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 to 1 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted value will change when the ith observation is removed. Potential influential observations satisfy \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of predictors (including the intercept) and n is the number of observations. In practice, this can result in a large number of points being identified, so a practical cut-off of 1 was used to flag observations with meaningful impact.
DFBETA threshold
DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations satisfy \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentised residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 and 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.VCO2 | dffit | cov.r |
|---|---|---|---|---|---|---|---|
346 | −0.329 | 0.025 | 0.001 | 0.046 | −0.050 | −0.053 | 1.030 |
94 | 0.368 | 0.027 | 0.002 | 0.061 | −0.058 | 0.061 | 1.032 |
311 | 7.875 | 0.004 | 0.100 | 0.360 | −0.284 | 0.478 | 0.765 |
312 | 7.875 | 0.004 | 0.100 | 0.360 | −0.284 | 0.478 | 0.765 |
232 | 3.620 | 0.017 | 0.108 | −0.388 | 0.436 | 0.471 | 0.961 |
234 | 4.364 | 0.022 | 0.207 | −0.560 | 0.620 | 0.656 | 0.940 |
StudRes = studentized residuals; CookD = Cook's Distance, a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||
Results for outliers and influential points
- Four observations had studentized residuals > 3. Their Cook’s distance, DFBETAS and DFFITS values were within conventional ranges. However, two COVRATIO values were substantially < 1, suggesting potential inflation of standard errors (wider confidence intervals). These observations should therefore be investigated further.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.133 | <0.001 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.756 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- QQ-plot indicates the residuals are not normally distributed.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.798 | 0.403 | <0.001 |
Independence results
- The Durbin–Watson test suggests there are auto-correlation issues.
- The study design is not independent and should be assessed using linear mixed models or generalized estimating equations.
Assumption conclusions
Residual plots indicated departures from normality. Diagnostic plots and tests also suggested potential non-linearity, indicating that the fitted linear models may not have fully captured the underlying relationships. COVRATIO values suggested reduced precision of the confidence intervals, and residuals showed evidence of autocorrelation. Given the repeated-measures study design, the assumption of independence was unlikely to hold.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for Cycle ergometer VE
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | −3.4427 | −3.4427 | 0.0000 | Reproduced |
VCO2 | 27.8160 | 27.8164 | 0.0004 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = R_B − O_B; Reproduce.B = whether B was reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | 0.0075 | ||||
VCO2 | <0.001 | <0.001 | 0.0000 | Reproduced | Remains sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Change.p = R_p − O_p; Reproduce.p = whether the p-value was reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
Results for p-values
- The p-value for this model was reproduced.
Conclusion computational reproducibility
- This model was computationally reproducible; all reported statistics that were assessed were reproduced.
Methods
The model was successfully reproduced; however, residual diagnostics indicated departures from normality and evidence of non-linearity. The predictors were standardized, and linear, quadratic, and cubic specifications were fitted and compared using AIC and BIC. Model visualisations were presented on the original scale, as specified by the authors. To address assumption violations, the variables were log-transformed and the model refitted. Residual curvature persisted following transformation; therefore, quadratic and spline models were subsequently compared using AIC and BIC to assess relative model fit. A natural cubic spline with three degrees of freedom was used to capture the residual curvature after log transformation, while limiting model complexity and improving stability at the extremes of the predictor range.
Results
Visual inspection of the linear, quadratic and cubic models suggested pronounced curvature at the upper end of the predictor range. Model fit statistics indicated substantial improvement for the quadratic model, with reductions in AIC and BIC exceeding 10 relative to the linear model, providing strong evidence that a linear specification was inadequate. Therefore, this model was not inferentially reproducible.
Comparison of non-linear terms standardized coefficients
| Characteristic | Beta (Linear) | 95% CI1 | p-value | Beta (Quadratic) | 95% CI1 | p-value | Beta (Cubic) | 95% CI1 | p-value |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 0.0004 | -0.0383, 0.0390 | 0.985 | -0.0492 | -0.0930, -0.0053 | 0.028 | -0.0516 | -0.0955, -0.0077 | 0.021 |
| z_VCO2 | 0.9152 | 0.8766, 0.9539 | <0.001 | 0.9081 | 0.8702, 0.9461 | <0.001 | 0.8652 | 0.7979, 0.9324 | <0.001 |
| z_VCO2_sq | 0.0495 | 0.0274, 0.0717 | <0.001 | 0.0504 | 0.0283, 0.0726 | <0.001 | |||
| z_VCO2_cube | 0.0109 | -0.0032, 0.0249 | 0.129 | ||||||
| 1 CI = Confidence Interval | |||||||||
Comparison of non-linear terms on original scales
Fit comparison of non-linear models
Model_type | R2adj | sigma | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|
Linear | 0.837 | 0.4035717 | 1 | −214.3548 | 434.7097 | 446.8376 | 68.24257 | 419 | 421 |
Quadratic | 0.844 | 0.3950157 | 2 | −204.8304 | 417.6607 | 433.8313 | 65.22362 | 418 | 421 |
Cubic | 0.845 | 0.3943966 | 3 | −203.6658 | 417.3317 | 437.5448 | 64.86378 | 417 | 421 |
To reduce curvature, both the predictor and outcome variables were log-transformed. Residual diagnostics indicated some remaining non-linearity. Quadratic and spline models were therefore fitted and compared with the linear model using AIC and BIC; both showed substantially improved fit over the linear model (ΔAIC and ΔBIC > 10). The spline model provided a marginally better fit than the quadratic model (ΔAIC = 6.6; ΔBIC = 2.6) and, after visual inspection, was selected as the final model.
Comparison log-log model
Model_type | R2adj | sigma | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|
log Linear | 0.873 | 0.09230962 | 1 | 406.7068 | −807.4135 | −795.2856 | 3.570327 | 419 | 421 |
log Quadratic | 0.877 | 0.09095109 | 2 | 413.4517 | −818.9034 | −802.7329 | 3.457738 | 418 | 421 |
Log Spline | 0.879 | 0.09013299 | 3 | 417.7599 | −825.5198 | −805.3067 | 3.387689 | 417 | 421 |
The spline terms were all statistically significant; however, individual spline coefficients are not directly interpretable. The fitted spline indicated a clear non-linear association between log(VCO₂) and log(ventilation).
Further modelling is constrained by the absence of participant-level identifiers. Without these identifiers, it is not possible to fit random-effects models or formally investigate whether the observed patterns are driven by within-participant clustering rather than the underlying association.
Spline fit
Spline results
term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
(Intercept) | 2.832 | 0.049 | 57.401 | <0.001 |
ns(log(VCO2), df = 3)1 | 0.991 | 0.027 | 36.507 | <0.001 |
ns(log(VCO2), df = 3)2 | 2.275 | 0.109 | 20.952 | <0.001 |
ns(log(VCO2), df = 3)3 | 1.402 | 0.033 | 42.652 | <0.001 |