Paper 41: Weight loss is associated with improved quality of life among rural women completers of a web-based lifestyle intervention
References
Hageman PA, Mroz JE, Yoerger MA, Pullen CH (2019) Weight loss is associated with improved quality of life among rural women completers of a web-based lifestyle intervention. PLoS ONE 14(11): e0225446. https://doi.org/10.1371/journal.pone.0225446
Disclosure
This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.
Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.
Methods used in our reproducibility analyses
There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.
The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.
To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
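The comparison rule described above can be sketched in Python. The exact thresholds used in this project are not given in this report, so the rule below is an assumption made for illustration: a value counts as Reproduced if the reproduced estimate rounds to the published value at the published number of decimal places, as Incorrect Rounding if it differs by no more than one unit in the last reported place, and as Not Reproduced otherwise. The function name `classify_statistic` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify_statistic(reported: str, reproduced: float) -> str:
    """Compare a published value with a reproduced value.

    `reported` is kept as text so the number of published decimal places
    is known. The tolerance rule is an illustrative assumption, not the
    project's documented threshold.
    """
    original = Decimal(reported)
    # Number of decimal places the paper reported, e.g. "0.13" -> 2.
    places = max(0, -original.as_tuple().exponent)
    unit = Decimal(1).scaleb(-places)  # one unit in the last reported place
    exact = Decimal(str(reproduced))
    rounded = exact.quantize(unit, rounding=ROUND_HALF_UP)
    if rounded == original:
        return "Reproduced"
    # Assumed tolerance: within one unit of the last reported decimal place.
    if abs(exact - original) <= unit:
        return "Incorrect Rounding"
    return "Not Reproduced"


# Depression model coefficient (published 0.13, reproduced 0.1326).
print(classify_statistic("0.13", 0.1326))
# Sleep-disturbance p-value (published 0.67, reproduced 0.5734).
print(classify_statistic("0.67", 0.5734))
```

Keeping the published value as a string avoids losing trailing zeros (for example, 0.300 reported to three decimal places), which determine how strictly the reproduced value is compared.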
For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
Results from the reproduction of the Hageman et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.
Summary from statistical review
This paper examined weight change and health quality of life (HQOL) in a population after a health intervention. Multivariable linear regression was used to determine whether weight loss is associated with improvements in HQOL across seven health domains. The authors checked raw data for normality but did not mention other assumptions, outliers or collinearity. The direction, but not the size, of the regression coefficients was interpreted.
Data availability and software used
The authors provide a wide-format SPSS dataset in the supporting information, which has an in-built data dictionary. The analyses were conducted using SPSS.
Regression sample
Seven multivariable linear regression models were reported. Three outcomes (depression, sleep disturbance, and fatigue) were selected at random for assessment. The primary predictor was percentage change in weight, with models adjusted for age, number of comorbidities, change in physical activity from baseline, intervention group, and baseline outcome scores. The authors did not report results for all covariates included in the models, presenting estimates only for the primary variable of interest, percentage weight loss.
Computational reproducibility results
This paper was computationally reproducible. All reported statistics for the depression model were successfully reproduced, and the other two models were mostly reproducible. The p-value for percentage weight loss in the sleep-disturbance model was not reproduced. However, this discrepancy was considered a typographical error, as the reported value of 0.67 was reproduced as 0.57 and did not alter the statistical significance of the result. A minor rounding discrepancy was observed in one coefficient in the fatigue model, and R2 instead of adjusted R2 was mistakenly reported for this model.
Inferential reproducibility results
All three models were inferentially reproducible. Residual diagnostics did not indicate major violations of linear model assumptions; however, structured residual patterns were observed, consistent with an ordinal measure being treated as continuous and the limited variation in baseline scores. A small number of larger standardized residuals were also present, which may contribute to wider confidence intervals but did not materially affect point estimates or inference. Although some statistics differed by 10% or more between models, these differences were not considered meaningful, as changes in standardized regression coefficients were less than 0.10. The direction of effects and statistical significance remained consistent between the reproduced models and the bootstrapped sensitivity analyses.
Recommended changes
- Provide tables in the Supporting Information that present all analyses conducted in the paper, including full model outputs such as regression coefficients and all variables used for adjustment.
- Correct the error in the p-value in the sleep-disturbance model.
- Report adjusted R2 in the fatigue model.
- Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
Model 1
Model results for Depression change
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | ||||||
agPerWtChange | 0.13 | | 0.03 | 0.24 | | 0.01 |
Age | ||||||
comorbcatsum | ||||||
agmodup_act | ||||||
group: | ||||||
Discussion – Internet Only | ||||||
E-Mail – Internet Only | ||||||
adepsc | ||||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Depression change
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| | | 0.30 | | | 14.09 | | | |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Depression change
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
agPerWtChange | |||||
Age | |||||
comorbcatsum | |||||
agmodup_act | |||||
group | |||||
adepsc | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Depression change
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 35.127 | 4.400 | 26.452 | 43.803 | 7.983 | <0.001 |
agPerWtChange | 0.133 | 0.054 | 0.027 | 0.238 | 2.473 | 0.0142 |
Age | −0.233 | 0.062 | −0.355 | −0.112 | −3.785 | <0.001 |
comorbcatsum | 1.259 | 0.329 | 0.610 | 1.908 | 3.824 | <0.001 |
agmodup_act | 0.013 | 0.022 | −0.031 | 0.056 | 0.580 | 0.5624 |
group: | ||||||
Discussion – Internet Only | −0.352 | 0.976 | −2.275 | 1.571 | −0.361 | 0.7185 |
E-Mail – Internet Only | 0.771 | 0.964 | −1.130 | 2.672 | 0.800 | 0.4248 |
adepsc | −0.501 | 0.060 | −0.619 | −0.383 | −8.352 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Depression change
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.568 | 0.323 | 0.300 | 1,373.339 | 5.658 | 14.087 | 7 | 207 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
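Because one of the recommended changes concerns reporting R2 in place of adjusted R2, a short check of how the two relate may be useful. The sketch below uses the standard adjusted R2 formula with the values from the table above; the function name `adjusted_r2` is hypothetical, and the sample size is inferred as DF1 + DF2 + 1.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Reproduced depression model: R2 = 0.323, DF1 = 7, DF2 = 207,
# so n = 7 + 207 + 1 = 215 observations.
value = adjusted_r2(r2=0.323, n=215, k=7)
print(round(value, 3))
```

The result agrees with the tabulated adjusted R2 of 0.300 to three decimal places, which is the kind of check that distinguishes a reported R2 from a reported adjusted R2.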
ANOVA table for Depression change
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
agPerWtChange | 203.349 | 1 | 203.349 | 6.116 | 0.0142 |
Age | 476.181 | 1 | 476.181 | 14.323 | <0.001 |
comorbcatsum | 486.085 | 1 | 486.085 | 14.621 | <0.001 |
agmodup_act | 11.191 | 1 | 11.191 | 0.337 | 0.5624 |
group | 44.457 | 2 | 22.229 | 0.669 | 0.5135 |
adepsc | 2,318.876 | 1 | 2,318.876 | 69.748 | <0.001 |
Residuals | 6,881.993 | 207 | 33.246 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
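The entries in the ANOVA table can be checked against each other with simple arithmetic: each mean square is SS divided by DF, each F statistic is the term's mean square divided by the residual mean square, and for a single-degree-of-freedom term F equals the square of the t statistic in the coefficient table. The sketch below applies this to the tabulated values; the function name `f_statistic` is hypothetical.

```python
def f_statistic(ss_term: float, df_term: int, ss_resid: float, df_resid: int) -> float:
    """F = (SS_term / DF_term) / (SS_resid / DF_resid)."""
    ms_term = ss_term / df_term
    ms_resid = ss_resid / df_resid
    return ms_term / ms_resid


SS_RESID, DF_RESID = 6881.993, 207  # residual row of the table above

# Percentage weight change (1 degree of freedom).
f_weight = f_statistic(203.349, 1, SS_RESID, DF_RESID)
# Intervention group (2 degrees of freedom).
f_group = f_statistic(44.457, 2, SS_RESID, DF_RESID)

print(round(f_weight, 3), round(f_group, 3))
# For a 1-DF term, F should match t squared (t = 2.473 in the coefficient table).
print(round(2.473 ** 2, 3))
```

Small differences in the third decimal place are expected because the inputs are taken from rounded table entries.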
Visualisation of regression model
The blue line shows the line of best fit, with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non-linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
agPerWtChange | 0.449 | 0.6539 | No linearity violation |
Age | −0.277 | 0.7820 | No linearity violation |
comorbcatsum | 0.381 | 0.7033 | No linearity violation |
agmodup_act | −0.789 | 0.4309 | No linearity violation |
group | |||
adepsc | 1.293 | 0.1973 | No linearity violation |
Tukey test | 0.211 | 0.8329 | No linearity violation |
Specification tests for predictors use quadratic terms; curvature in the fitted values is tested using Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
- No linearity violation was observed in either plots or tests.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
9.747 | 0.2034 | 7 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Depression change | 215 | 0.020 | 6.890 | 0.000 | −16.300 | 32.300 | 0.707 | 2.309 |
agPerWtChange | 215 | −4.471 | 7.648 | −2.879 | −35.702 | 12.220 | −0.996 | 1.650 |
Age | 215 | 54.660 | 6.771 | 55.000 | 40.000 | 69.000 | 0.001 | −0.669 |
comorbcatsum | 215 | 1.474 | 1.271 | 1.000 | 0.000 | 5.000 | 0.778 | −0.073 |
agmodup_act | 215 | −2.098 | 18.624 | −0.952 | −71.016 | 47.405 | −0.404 | 1.480 |
group* | 215 | 1.963 | 0.825 | 2.000 | 1.000 | 3.000 | 0.069 | −1.535 |
adepsc | 215 | 47.368 | 6.722 | 49.000 | 41.000 | 62.200 | 0.394 | −1.337 |
.fitted | 215 | 0.020 | 3.914 | 0.896 | −9.724 | 7.592 | −0.261 | −0.884 |
.resid | 215 | −0.000 | 5.671 | −0.974 | −11.056 | 25.929 | 0.843 | 1.769 |
.leverage | 215 | 0.037 | 0.016 | 0.034 | 0.015 | 0.138 | 2.200 | 8.297 |
.sigma | 215 | 5.766 | 0.028 | 5.774 | 5.479 | 5.780 | −6.208 | 53.826 |
.cooksd | 215 | 0.005 | 0.010 | 0.002 | 0.000 | 0.098 | 5.621 | 44.016 |
.std.resid | 215 | 0.000 | 1.002 | −0.172 | −1.949 | 4.580 | 0.841 | 1.752 |
dfb.1_ | 215 | 0.000 | 0.088 | −0.003 | −0.245 | 0.666 | 2.547 | 16.953 |
dfb.aPWC | 215 | −0.000 | 0.069 | 0.001 | −0.526 | 0.196 | −2.301 | 15.355 |
dfb.Age | 215 | −0.000 | 0.083 | −0.002 | −0.617 | 0.304 | −1.833 | 15.932 |
dfb.cmrb | 215 | 0.000 | 0.068 | 0.001 | −0.294 | 0.240 | −0.333 | 3.575 |
dfb.agm_ | 215 | −0.000 | 0.061 | 0.001 | −0.315 | 0.268 | −1.033 | 7.011 |
dfb.grpD | 215 | 0.000 | 0.066 | −0.001 | −0.223 | 0.317 | 0.433 | 3.726 |
dfb.gE.M | 215 | 0.000 | 0.074 | 0.000 | −0.235 | 0.388 | 0.920 | 4.978 |
dfb.adps | 215 | −0.000 | 0.068 | 0.012 | −0.309 | 0.212 | −0.851 | 3.045 |
dffit | 215 | 0.001 | 0.199 | −0.032 | −0.440 | 0.931 | 0.911 | 2.163 |
cov.r | 215 | 1.041 | 0.068 | 1.058 | 0.459 | 1.134 | −4.218 | 27.075 |
* categorical variable | ||||||||
Cook’s distance threshold
Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.
DFFITS threshold
DFFITS measures how many standard deviations the fitted value changes when the ith observation is removed. Potential influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can flag a large number of points, so a practical cut-off of 1 was used to flag observations with meaningful impact.
DFBETA threshold
DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model, so a practical cut-off of 1 was used to flag observations with meaningful impact.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. In practice, values outside the range 0.9 to 1.1 were flagged as having a meaningful impact on precision, although there is no universally agreed alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.aPWC | dfb.Age | dfb.cmrb | dfb.agm_ | dfb.grpD | dfb.gE.M | dfb.adps | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | −0.269 | 0.086 | 0.001 | −0.020 | 0.018 | 0.004 | −0.001 | 0.074 | 0.016 | 0.019 | 0.024 | −0.082 | 1.134 |
97 | 3.133 | 0.029 | 0.035 | −0.245 | 0.035 | 0.279 | −0.141 | −0.216 | 0.033 | 0.315 | 0.063 | 0.543 | 0.738 |
157 | 1.661 | 0.138 | 0.055 | −0.139 | −0.526 | −0.040 | 0.108 | −0.315 | 0.005 | 0.123 | 0.197 | 0.666 | 1.085 |
131 | 4.820 | 0.036 | 0.098 | 0.666 | 0.068 | −0.617 | 0.115 | −0.182 | −0.038 | 0.388 | −0.309 | 0.931 | 0.459 |
StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||||||||
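To make the cut-offs described above concrete, the sketch below evaluates the size-adjusted formulas for this model (n = 215 observations and p = 8 parameters, counting the intercept and the two group contrasts) and applies them, alongside the practical cut-offs, to the four observations in the table. The function names are hypothetical, and using 0.5 as the practical Cook's distance cut-off (the lower end of the 0.5 to 1 range) is a choice made for this sketch.

```python
import math


def rule_of_thumb_cutoffs(n: int, p: int) -> dict:
    """Size-adjusted cut-offs for n observations and p model parameters."""
    return {
        "cooks_d": 4 / n,
        "dffits": 2 * math.sqrt(p / n),
        "dfbetas": 2 / math.sqrt(n),
        "covratio": 3 * p / n,  # applied to |COVRATIO - 1|
    }


# Practical cut-offs; the Cook's distance value is an assumption (see lead-in).
PRACTICAL = {"cooks_d": 0.5, "dffits": 1.0, "covratio_low": 0.9, "covratio_high": 1.1}


def flag(obs: dict, cut: dict) -> dict:
    """Return which cut-offs a single observation exceeds."""
    in_practical_range = PRACTICAL["covratio_low"] <= obs["covratio"] <= PRACTICAL["covratio_high"]
    return {
        "cooks_rule": obs["cooks_d"] > cut["cooks_d"],
        "cooks_practical": obs["cooks_d"] > PRACTICAL["cooks_d"],
        "dffits_rule": abs(obs["dffits"]) > cut["dffits"],
        "dffits_practical": abs(obs["dffits"]) > PRACTICAL["dffits"],
        "covratio_rule": abs(obs["covratio"] - 1) > cut["covratio"],
        "covratio_practical": not in_practical_range,
    }


cut = rule_of_thumb_cutoffs(n=215, p=8)

# Values copied from the table of observations of interest.
observations = {
    20: {"cooks_d": 0.001, "dffits": -0.082, "covratio": 1.134},
    97: {"cooks_d": 0.035, "dffits": 0.543, "covratio": 0.738},
    157: {"cooks_d": 0.055, "dffits": 0.666, "covratio": 1.085},
    131: {"cooks_d": 0.098, "dffits": 0.931, "covratio": 0.459},
}

print({name: round(value, 4) for name, value in cut.items()})
for obs_id, values in observations.items():
    print(obs_id, flag(values, cut))
```

The size-adjusted cut-offs are much stricter than the practical ones at this sample size, which is why an observation can exceed 4/n for Cook's distance while remaining well below 0.5.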
Results for outliers and influential points
Two observations (IDs 97 and 131) had studentized residuals > 3. Both had low leverage and small Cook’s distance, with DFBETAS and DFFITS below the practical cut-off of 1. The COVRATIO values indicated observations that may affect confidence interval widths.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.082 | 0.1083 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.960 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test supports residuals being normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- QQ-plot looks roughly normal.
Assessing collinearity with VIF
Term | VIF | Tolerance |
|---|---|---|
agPerWtChange | 1.040 | 0.961 |
Age | 1.058 | 0.945 |
comorbcatsum | 1.061 | 0.942 |
agmodup_act | 1.038 | 0.963 |
group | 1.015 | 0.985 |
adepsc | 1.023 | 0.977 |
VIF = Variance Inflation Factor. | ||
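The two columns in the table above carry the same information, since tolerance is the reciprocal of the VIF. A minimal check using the tabulated values is sketched below; the function name `tolerance` is hypothetical, and small differences in the third decimal place arise because the tabulated VIFs are already rounded.

```python
def tolerance(vif: float) -> float:
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1 / vif


# VIF values copied from the table above.
vifs = {
    "agPerWtChange": 1.040,
    "Age": 1.058,
    "comorbcatsum": 1.061,
    "agmodup_act": 1.038,
    "group": 1.015,
    "adepsc": 1.023,
}

for term, vif in vifs.items():
    print(term, round(tolerance(vif), 3))

# The report's criterion: all VIF values under three.
print(all(vif < 3 for vif in vifs.values()))
```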
Collinearity results
- All VIF values are under three, indicating no collinearity issues.
- Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
−0.063 | 2.123 | 0.3800 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- Although the study design was longitudinal, change scores were used; therefore, there is no violation of independence.
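As a quick arithmetic check on the table above, the Durbin–Watson statistic \(d\) is related to the lag-1 residual autocorrelation \(\hat{\rho}\) by a standard large-sample approximation (a textbook relationship, not a formula taken from the original paper):

\[ d \approx 2\,(1 - \hat{\rho}) = 2\,\bigl(1 - (-0.063)\bigr) = 2.126 \]

This is close to the reported statistic of 2.123. Values of \(d\) near 2 correspond to \(\hat{\rho}\) near 0, which is consistent with no first-order autocorrelation.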
Assumption conclusions
Residual diagnostics did not indicate major violations of linear model assumptions; however, structured residual patterns were observed, consistent with an ordinal measure being treated as continuous and the limited variation in baseline depression scores. Outlier diagnostics indicated that point estimates were unlikely to be substantially affected by influential points, but confidence-interval width could be affected and should be further investigated.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for Depression change
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 35.1272 | |||
agPerWtChange | 0.13 | 0.1326 | 0.0026 | Reproduced |
Age | −0.2332 | |||
comorbcatsum | 1.2586 | |||
agmodup_act | 0.0127 | |||
group: | ||||
Discussion – Internet Only | −0.3521 | |||
E-Mail – Internet Only | 0.7711 | |||
adepsc | −0.5010 | |||
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | 26.4519 | |||
agPerWtChange | 0.03 | 0.0269 | −0.0031 | Reproduced |
Age | −0.3546 | |||
comorbcatsum | 0.6097 | |||
agmodup_act | −0.0306 | |||
group: | ||||
Discussion – Internet Only | −2.2754 | |||
E-Mail – Internet Only | −1.1297 | |||
adepsc | −0.6193 | |||
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | 43.8025 | |||
agPerWtChange | 0.24 | 0.2382 | −0.0018 | Reproduced |
Age | −0.1117 | |||
comorbcatsum | 1.9076 | |||
agmodup_act | 0.0561 | |||
group: | ||||
Discussion – Internet Only | 1.5711 | |||
E-Mail – Internet Only | 2.6718 | |||
adepsc | −0.3827 | |||
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
Change in Adjusted R2
O_R2Adj | R_R2Adj | Change.R2Adj | Reproduce.R2Adj |
|---|---|---|---|
0.300 | 0.2998 | −0.0002 | Reproduced |
O_R2Adj = original R2 Adjusted; R_R2Adj = reproduced R2 Adjusted; Change.R2Adj = change in R2Adj (R2Adj - O_R2Adj); Reproduce.R2Adj = R2 Adjusted reproduced. | |||
Change in global F
Term | O_F | R_F | Change.F | Reproduce.F |
|---|---|---|---|---|
Intercept | 14.09 | 14.0872 | −0.0028 | Reproduced |
O_F = original global F; R_F = reproduced global F; Change.F = change in R_F - O_F; Reproduce.F = Global F reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
agPerWtChange | 0.01 | 0.0142 | 0.0042 | Reproduced | Remains sig, B same direction |
Age | <0.001 | ||||
comorbcatsum | <0.001 | ||||
agmodup_act | 0.5624 | ||||
group: | |||||
Discussion – Internet Only | 0.7185 | ||||
E-Mail – Internet Only | 0.4248 | ||||
adepsc | <0.001 | ||||
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value (R_p - O_p); Reproduce.p = p-values reproduced; SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
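The p-value comparison in the table above can be sketched as follows. Two points are assumptions made for illustration rather than details stated in this report: a significance level of 0.05, and the wording of the labels for cases where significance or direction changes (only the "Remains ..." and "B same direction" labels appear in the tables). The function names are hypothetical.

```python
def parse_p(text: str) -> float:
    """Convert a reported p-value to a number; '<0.001' becomes 0.00099."""
    text = text.strip()
    return 0.00099 if text == "<0.001" else float(text)


def compare_p(original: str, reproduced: str, b_original: float,
              b_reproduced: float, alpha: float = 0.05) -> tuple:
    """Return (change in p, label) for an original and a reproduced estimate."""
    o_p, r_p = parse_p(original), parse_p(reproduced)
    sig_o, sig_r = o_p < alpha, r_p < alpha  # alpha = 0.05 is an assumption
    if sig_o == sig_r:
        sig_label = "Remains sig" if sig_r else "Remains non-sig"
    else:
        # Hypothetical labels; these cases do not occur in the tables shown.
        sig_label = "Becomes sig" if sig_r else "Becomes non-sig"
    same_direction = b_original * b_reproduced > 0
    # "B changes direction" is likewise a hypothetical label.
    dir_label = "B same direction" if same_direction else "B changes direction"
    return round(r_p - o_p, 4), f"{sig_label}, {dir_label}"


# Percentage weight change in the depression model.
print(compare_p("0.01", "0.0142", 0.13, 0.1326))
```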
Results for p-values
- The p-value was reproduced.
Conclusion computational reproducibility
This model was computationally reproducible, with all reported statistics that were assessed being reproducible.
Methods
The model was successfully reproduced; however, residual diagnostics indicated a small number of observations that may contribute to wider confidence intervals. All continuous variables in the model were standardized, and inference was assessed using bootstrapped standardized regression coefficients and their corresponding 95% confidence intervals. Percentage and absolute changes in estimates and confidence-interval ranges relative to the original linear model were summarised using thresholds of a 10% relative change and absolute standardized coefficient differences of 0.10 and 0.20. Consistency of coefficient direction and statistical significance was also evaluated.
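The link between the unstandardized coefficients in the Reproduced results and the standardized coefficients used in this sensitivity analysis can be illustrated with the descriptive statistics reported for this model. The sketch below assumes the usual conversion: a continuous predictor's coefficient is multiplied by the ratio of the predictor SD to the outcome SD, while an unstandardized dummy variable (here, the group contrasts) is divided by the outcome SD only. The function name `standardized_beta` is hypothetical.

```python
from typing import Optional


def standardized_beta(b: float, sd_y: float, sd_x: Optional[float] = None) -> float:
    """Convert an unstandardized coefficient to SD units.

    If sd_x is None the predictor is treated as an unstandardized dummy
    variable, so only the outcome is rescaled.
    """
    scale_x = 1.0 if sd_x is None else sd_x
    return b * scale_x / sd_y


SD_OUTCOME = 6.890  # SD of Depression change from the descriptives table

# Continuous predictors: B and SD taken from the reproduced model tables.
print(round(standardized_beta(0.1326, SD_OUTCOME, sd_x=7.648), 4))   # agPerWtChange
print(round(standardized_beta(-0.2332, SD_OUTCOME, sd_x=6.771), 4))  # Age
# Dummy variable: Discussion versus Internet Only.
print(round(standardized_beta(-0.3521, SD_OUTCOME), 4))
```

These values agree with the standardized coefficients in the table below to within rounding of the inputs.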
Bootstrapped results
A non-parametric bootstrap with bias-corrected and accelerated (BCa) confidence intervals was performed using 10,000 resamples.
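For readers unfamiliar with the resampling step, the sketch below shows a non-parametric (case-resampling) bootstrap for a single regression slope using only the Python standard library. It is deliberately simplified and differs from the analysis described above in three labelled ways: it uses percentile intervals rather than BCa intervals, it fits a one-predictor model, and it runs on synthetic data generated inside the example (not the study data) with 2,000 rather than 10,000 resamples. All function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # skip degenerate resamples with no spread in x
            continue
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    alpha = 1 - level
    lower = slopes[round(alpha / 2 * n_boot)]
    upper = slopes[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration data: true slope 0.5 with unit-variance noise.
data_rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]

print(ols_slope(xs, ys))
print(bootstrap_slope_ci(xs, ys))
```

BCa intervals additionally correct the percentile end points for bias and skewness in the bootstrap distribution, which matters most when that distribution is asymmetric.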
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.0252 | −0.0263 | 0.0011 | 4.2500 | No | No | No |
z_agPerWtChange | 0.1470 | 0.1507 | −0.0038 | −2.5600 | No | No | No |
z_Age | −0.2294 | −0.2296 | 0.0002 | 0.0900 | No | No | No |
z_comorbcatsum | 0.2317 | 0.2309 | 0.0008 | 0.3600 | No | No | No |
z_agmodup_act | 0.0346 | 0.0362 | −0.0017 | −4.8300 | No | No | No |
groupDiscussion | −0.0511 | −0.0491 | −0.0020 | −3.9800 | No | No | No |
groupE-Mail | 0.1119 | 0.1123 | −0.0004 | −0.3300 | No | No | No |
z_adepsc | −0.4944 | −0.4965 | 0.0021 | 0.4300 | No | No | No |
B = standardized reproduced regression coefficient; boot.B = bootstrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
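The flags in these comparison tables can be sketched as a small function. The denominator of the percentage difference is not stated in this report; dividing by the absolute value of the reproduced estimate is an assumption that is consistent with the signs and magnitudes shown in the tables. The function name `compare_estimates` is hypothetical, and results computed from rounded table entries will differ slightly from the tabulated percentages.

```python
def compare_estimates(estimate: float, boot_estimate: float) -> dict:
    """Difference, percentage difference and threshold flags for one statistic."""
    diff = estimate - boot_estimate
    if estimate == 0:
        pct = 0.0 if diff == 0 else (1000.0 if diff > 0 else -1000.0)
    else:
        pct = 100 * diff / abs(estimate)  # assumed denominator: |estimate|
    pct = max(-1000.0, min(1000.0, pct))  # truncate at +/-1000%
    return {
        "diff": diff,
        "pct": pct,
        "diff_10pct": abs(pct) >= 10,
        "diff_0.1": abs(diff) >= 0.1,
        "diff_0.2": abs(diff) >= 0.2,
    }


# Standardized coefficient for percentage weight change (B and boot.B).
print(compare_estimates(0.1470, 0.1507))
# Lower confidence limit for z_agmodup_act: a relative change above 10%
# that is nevertheless small in absolute terms.
print(compare_estimates(-0.0829, -0.0743))
```

The second example shows why the relative and absolute thresholds are read together: estimates close to zero can change by more than 10% while moving by far less than 0.1 standard deviations.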
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.2149 | −0.1854 | −0.0295 | −13.7300 | Yes | No | No |
z_agPerWtChange | 0.0298 | 0.0315 | −0.0017 | −5.6200 | No | No | No |
z_Age | −0.3489 | −0.3750 | 0.0261 | 7.4700 | No | No | No |
z_comorbcatsum | 0.1122 | 0.1172 | −0.0050 | −4.4100 | No | No | No |
z_agmodup_act | −0.0829 | −0.0743 | −0.0087 | −10.4400 | Yes | No | No |
groupDiscussion | −0.3302 | −0.3154 | −0.0148 | −4.4700 | No | No | No |
groupE-Mail | −0.1640 | −0.1546 | −0.0093 | −5.6900 | No | No | No |
z_adepsc | −0.6111 | −0.6111 | 0.0000 | 0.0000 | No | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.1646 | 0.1463 | 0.0183 | 11.0900 | Yes | No | No |
z_agPerWtChange | 0.2641 | 0.2551 | 0.0090 | 3.4200 | No | No | No |
z_Age | −0.1099 | −0.0997 | −0.0102 | −9.2600 | No | No | No |
z_comorbcatsum | 0.3511 | 0.3491 | 0.0021 | 0.5800 | No | No | No |
z_agmodup_act | 0.1521 | 0.1284 | 0.0237 | 15.5700 | Yes | No | No |
groupDiscussion | 0.2280 | 0.2098 | 0.0182 | 7.9700 | No | No | No |
groupE-Mail | 0.3878 | 0.4137 | −0.0259 | −6.6800 | No | No | No |
z_adepsc | −0.3777 | −0.3897 | 0.0120 | 3.1800 | No | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.3795 | 0.3317 | −0.0478 | −12.5900 | Yes | No | No |
z_agPerWtChange | 0.2343 | 0.2236 | −0.0107 | −4.5700 | No | No | No |
z_Age | 0.2390 | 0.2752 | 0.0362 | 15.1600 | Yes | No | No |
z_comorbcatsum | 0.2389 | 0.2319 | −0.0070 | −2.9300 | No | No | No |
z_agmodup_act | 0.2350 | 0.2026 | −0.0323 | −13.7600 | Yes | No | No |
groupDiscussion | 0.5582 | 0.5253 | −0.0329 | −5.9000 | No | No | No |
groupE-Mail | 0.5517 | 0.5683 | 0.0166 | 3.0000 | No | No | No |
z_adepsc | 0.2334 | 0.2214 | −0.0120 | −5.1400 | No | No | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range (boot.Range - Range); %_Diff = percentage difference; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Intercept | 0.7937 | 0.7563 | 0.0375 | Remains non-sig, B same direction |
z_agPerWtChange | 0.0142 | 0.0085 | 0.0057 | Remains sig, B same direction |
z_Age | <0.001 | 0.0011 | −0.0009 | Remains sig, B same direction |
z_comorbcatsum | <0.001 | <0.001 | 0.0001 | Remains sig, B same direction |
z_agmodup_act | 0.5624 | 0.4808 | 0.0816 | Remains non-sig, B same direction |
groupDiscussion | 0.7185 | 0.7126 | 0.0058 | Remains non-sig, B same direction |
groupE-Mail | 0.4248 | 0.4397 | −0.0149 | Remains non-sig, B same direction |
z_adepsc | <0.001 | <0.001 | 0.0000 | Remains sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. | ||||
Check the distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Conclusions based on the bootstrapped model
This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful, with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.
Model 2
Model results for Sleep disturbance change
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | ||||||
agPerWtChange | 0.03 | | −0.07 | 0.13 | | 0.67 |
Age | ||||||
comorbcatsum | ||||||
agmodup_act | ||||||
group: | ||||||
Discussion – Internet Only | ||||||
E-Mail – Internet Only | ||||||
asldsc | ||||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Sleep disturbance change
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| | | 0.16 | | | 6.86 | | | |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Sleep disturbance change
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
agPerWtChange | |||||
Age | |||||
comorbcatsum | |||||
agmodup_act | |||||
group | |||||
asldsc | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Sleep disturbance change
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 22.625 | 4.195 | 14.355 | 30.895 | 5.394 | <0.001 |
agPerWtChange | 0.030 | 0.053 | −0.074 | 0.134 | 0.564 | 0.5734 |
Age | −0.109 | 0.060 | −0.228 | 0.010 | −1.812 | 0.0714 |
comorbcatsum | 0.479 | 0.324 | −0.160 | 1.117 | 1.478 | 0.1408 |
agmodup_act | 0.006 | 0.021 | −0.036 | 0.048 | 0.292 | 0.7707 |
group: | ||||||
Discussion – Internet Only | −0.465 | 0.952 | −2.342 | 1.412 | −0.488 | 0.6258 |
E-Mail – Internet Only | 0.417 | 0.950 | −1.455 | 2.290 | 0.439 | 0.6608 |
asldsc | −0.349 | 0.053 | −0.454 | −0.243 | −6.521 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
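The columns of the coefficient table are tied together by simple identities, which allows a quick consistency check: t = B / SE, and the 95% limits are B ± t_crit × SE. The sketch below uses Python purely for illustration (it is not the software used for this report) and back-calculates the critical value implied by the Intercept row. Variable names are illustrative, and the inputs are the rounded table values, so agreement is only approximate.

```python
# Intercept row of the Sleep disturbance change model (rounded table values).
b, se, lower, upper = 22.625, 4.195, 14.355, 30.895

# The t statistic is the coefficient divided by its standard error.
t_stat = b / se

# Half the CI width divided by the SE recovers the critical t value that was used.
t_crit = (upper - lower) / (2 * se)

print(f"t = {t_stat:.3f}")                    # table reports 5.394
print(f"implied critical t = {t_crit:.3f}")   # roughly 1.97 with 208 residual df
```

An implied critical value near 1.97 is what would be expected for a two-sided 95% interval with 208 residual degrees of freedom.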
Model fit for Sleep disturbance change
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.433 | 0.188 | 0.160 | 1,373.052 | 5.572 | 6.864 | 7 | 208 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
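The fit statistics can be cross-checked against one another. As a minimal sketch (Python for illustration only; variable names are assumptions), adjusted R2 and the global F statistic are recomputed from the rounded R2, the sample size (N = 216 in the descriptives table) and the model degrees of freedom. Because R2 is rounded to three decimals, the results match the table only approximately.

```python
r2 = 0.188            # R2 from the fit table (rounded)
n = 216               # number of observations
k = 7                 # model degrees of freedom (DF1)
df_resid = n - k - 1  # residual degrees of freedom (DF2)

# Adjusted R2 penalises R2 for the number of model terms.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# Global F compares explained and unexplained variance per degree of freedom.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

print(df_resid)           # 208, as in the table
print(round(adj_r2, 3))   # table reports 0.160
print(round(f_stat, 2))   # table reports 6.864
```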
ANOVA table for Sleep disturbance change
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
agPerWtChange | 10.256 | 1 | 10.256 | 0.318 | 0.5734 |
Age | 105.847 | 1 | 105.847 | 3.283 | 0.0714 |
comorbcatsum | 70.472 | 1 | 70.472 | 2.186 | 0.1408 |
agmodup_act | 2.747 | 1 | 2.747 | 0.085 | 0.7707 |
group | 26.478 | 2 | 13.239 | 0.411 | 0.6638 |
asldsc | 1,370.835 | 1 | 1,370.835 | 42.518 | <0.001 |
Residuals | 6,706.214 | 208 | 32.241 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
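Each F value in the ANOVA table is the term's mean square divided by the residual mean square, and for single-degree-of-freedom terms it equals the square of the t statistic in the coefficient table. The following sketch (Python for illustration; the dictionary layout is an assumption) recomputes F from the tabulated sums of squares. Small discrepancies are expected because the inputs are rounded.

```python
# Residual mean square from the ANOVA table: SS / DF.
ms_resid = 6706.214 / 208

# term: (sum of squares, degrees of freedom, reported F)
anova = {
    "agPerWtChange": (10.256, 1, 0.318),
    "Age": (105.847, 1, 3.283),
    "comorbcatsum": (70.472, 1, 2.186),
    "agmodup_act": (2.747, 1, 0.085),
    "group": (26.478, 2, 0.411),
    "asldsc": (1370.835, 1, 42.518),
}

# t statistics from the coefficient table for the single-df terms.
t_values = {"agPerWtChange": 0.564, "Age": -1.812, "comorbcatsum": 1.478,
            "agmodup_act": 0.292, "asldsc": -6.521}

recomputed = {}
for term, (ss, df, f_reported) in anova.items():
    recomputed[term] = (ss / df) / ms_resid
    print(f"{term}: F = {recomputed[term]:.3f} (reported {f_reported})")

# For single-df terms, t squared should be close to F.
for term, t in t_values.items():
    print(f"{term}: t^2 = {t ** 2:.3f}")
```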
Visualisation of regression model
The blue line shows the line of best fit, with shading representing the 95% confidence interval, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non-linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
agPerWtChange | 1.731 | 0.0850 | No linearity violation |
Age | 1.479 | 0.1406 | No linearity violation |
comorbcatsum | 1.315 | 0.1899 | No linearity violation |
agmodup_act | −0.186 | 0.8525 | No linearity violation |
group | |||
asldsc | −0.555 | 0.5798 | No linearity violation |
Tukey test | −0.713 | 0.4758 | No linearity violation |
Specification tests for the predictors use quadratic terms; for the fitted values, curvature is tested with Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
No linearity violation was observed in either plots or tests.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
10.752 | 0.1498 | 7 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Sleep disturbance change | 216 | 0.262 | 6.197 | 0.000 | −20.400 | 31.300 | 0.568 | 3.132 |
agPerWtChange | 216 | −4.446 | 7.640 | −2.875 | −35.702 | 12.220 | −1.003 | 1.664 |
Age | 216 | 54.699 | 6.779 | 55.000 | 40.000 | 69.000 | −0.007 | −0.680 |
comorbcatsum | 216 | 1.477 | 1.268 | 1.000 | 0.000 | 5.000 | 0.773 | −0.070 |
agmodup_act | 216 | −1.961 | 18.689 | −0.940 | −71.016 | 47.405 | −0.402 | 1.437 |
group* | 216 | 1.963 | 0.823 | 2.000 | 1.000 | 3.000 | 0.068 | −1.528 |
asldsc | 216 | 48.531 | 7.327 | 48.400 | 32.000 | 68.800 | −0.430 | 0.280 |
.fitted | 216 | 0.262 | 2.684 | 0.110 | −6.993 | 7.648 | 0.207 | 0.173 |
.resid | 216 | 0.000 | 5.585 | −0.158 | −18.553 | 27.011 | 0.510 | 3.042 |
.leverage | 216 | 0.037 | 0.016 | 0.033 | 0.014 | 0.127 | 1.908 | 5.714 |
.sigma | 216 | 5.678 | 0.032 | 5.687 | 5.355 | 5.692 | −6.442 | 53.018 |
.cooksd | 216 | 0.005 | 0.015 | 0.001 | 0.000 | 0.168 | 7.363 | 65.581 |
.std.resid | 216 | 0.001 | 1.004 | −0.028 | −3.335 | 4.889 | 0.530 | 3.130 |
dfb.1_ | 216 | 0.000 | 0.071 | 0.000 | −0.297 | 0.316 | 0.043 | 4.002 |
dfb.aPWC | 216 | −0.000 | 0.073 | 0.000 | −0.481 | 0.411 | −0.986 | 14.448 |
dfb.Age | 216 | 0.000 | 0.067 | −0.001 | −0.329 | 0.232 | −0.511 | 5.335 |
dfb.cmrb | 216 | 0.000 | 0.074 | −0.001 | −0.241 | 0.619 | 3.078 | 25.828 |
dfb.agm_ | 216 | 0.000 | 0.084 | 0.000 | −0.492 | 0.562 | −0.110 | 17.911 |
dfb.grpD | 216 | −0.000 | 0.070 | −0.001 | −0.326 | 0.245 | −0.299 | 4.099 |
dfb.gE.M | 216 | −0.000 | 0.075 | 0.000 | −0.421 | 0.312 | −0.339 | 6.114 |
dfb.asld | 216 | −0.000 | 0.079 | 0.000 | −0.572 | 0.263 | −1.403 | 13.985 |
dffit | 216 | 0.003 | 0.211 | −0.005 | −0.699 | 1.230 | 1.171 | 6.941 |
cov.r | 216 | 1.042 | 0.075 | 1.059 | 0.413 | 1.164 | −4.481 | 28.054 |
* categorical variable | ||||||||
Cook’s threshold
Cook’s distance measures the overall change in fit if the ith observation is removed. Potentially influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 to 1 is often used to identify influential observations.
DFFITS threshold
DFFITS measures how many standard deviations the fitted value changes when the ith observation is removed. Potentially influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of model parameters (including the intercept) and n is the number of observations. In practice, this threshold can flag a large number of points, so a practical cut-off of 1 was used to flag observations with meaningful impact.
DFBETA threshold
DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each model parameter. Potentially influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical range of 0.9 to 1.1 was used, with values outside this range flagged as having a meaningful impact on precision, although there is no universally agreed alternative cut-off.
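For this model the size-adjusted guidelines can be evaluated directly. The sketch below (Python for illustration; variable names are assumptions) uses n = 216 observations and p = 8 parameters, that is, the intercept plus the seven slope terms implied by DF1 = 7.

```python
import math

n = 216  # observations
p = 8    # parameters: intercept + 7 slope terms

cook_cut = 4 / n                     # Cook's distance guideline
dffits_cut = 2 * math.sqrt(p / n)    # |DFFITS| guideline, 2*sqrt(p)/sqrt(n)
dfbetas_cut = 2 / math.sqrt(n)       # |DFBETAS| guideline
covratio_band = 3 * p / n            # |COVRATIO - 1| guideline

print(f"Cook's distance > {cook_cut:.4f}")
print(f"|DFFITS| > {dffits_cut:.4f}")
print(f"|DFBETAS| > {dfbetas_cut:.4f}")
print(f"|COVRATIO - 1| > {covratio_band:.4f}")
```

Against these guidelines, the maximum Cook’s distance in the descriptives table (0.168) exceeds 4/n but remains well below the practical threshold of 0.5.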
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.aPWC | dfb.Age | dfb.cmrb | dfb.agm_ | dfb.grpD | dfb.gE.M | dfb.asld | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
134 | −0.156 | 0.108 | 0.000 | −0.018 | 0.040 | 0.005 | −0.014 | 0.012 | −0.005 | −0.012 | 0.029 | −0.054 | 1.164 |
157 | 1.528 | 0.127 | 0.042 | −0.061 | −0.481 | −0.035 | 0.111 | −0.300 | 0.028 | 0.112 | 0.074 | 0.583 | 1.089 |
89 | 3.866 | 0.056 | 0.103 | 0.176 | 0.411 | −0.310 | −0.241 | 0.562 | −0.108 | 0.312 | 0.219 | 0.938 | 0.630 |
71 | 5.184 | 0.053 | 0.168 | 0.223 | −0.331 | 0.168 | 0.619 | 0.302 | −0.326 | −0.421 | −0.572 | 1.230 | 0.413 |
StudRes = studentized residuals; CookD = Cook's distance, a combined measure of leverage and influence; DFBETAS (dfb.*) measure how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS (dffit) measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||||||||
Results for outliers and influential points
Two observations had studentized residuals > 3. DFBETAS and Cook’s distance were within conventional ranges, but the DFFITS for the largest outlier may be of concern. The COVRATIO indicated observations that may affect confidence interval widths.
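The flags summarised above can be reproduced from the four tabulated observations by applying the practical cut-offs described earlier (|studentized residual| > 3, Cook’s distance > 0.5, |DFFITS| > 1, |DFBETAS| > 1, COVRATIO outside 0.9 to 1.1). This is an illustrative Python sketch; the function and key names are hypothetical, and the largest absolute DFBETAS per observation was read from the table by hand.

```python
# Values copied from the "Observations of interest" table (Sleep disturbance change).
rows = {
    134: {"stud_res": -0.156, "cook_d": 0.000, "max_abs_dfbetas": 0.040, "dffits": -0.054, "cov_r": 1.164},
    157: {"stud_res": 1.528, "cook_d": 0.042, "max_abs_dfbetas": 0.481, "dffits": 0.583, "cov_r": 1.089},
    89: {"stud_res": 3.866, "cook_d": 0.103, "max_abs_dfbetas": 0.562, "dffits": 0.938, "cov_r": 0.630},
    71: {"stud_res": 5.184, "cook_d": 0.168, "max_abs_dfbetas": 0.619, "dffits": 1.230, "cov_r": 0.413},
}

def flags(r):
    """Return the practical cut-offs that an observation exceeds."""
    out = []
    if abs(r["stud_res"]) > 3:
        out.append("studentized residual")
    if r["cook_d"] > 0.5:
        out.append("Cook's distance")
    if abs(r["dffits"]) > 1:
        out.append("DFFITS")
    if r["max_abs_dfbetas"] > 1:
        out.append("DFBETAS")
    if not 0.9 <= r["cov_r"] <= 1.1:
        out.append("COVRATIO")
    return out

result = {obs_id: flags(r) for obs_id, r in rows.items()}
for obs_id, hit in result.items():
    print(obs_id, hit or "no flags")
```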
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.055 | 0.5349 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.961 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test supports residuals being normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- The Q–Q plot looks roughly normal.
Assessing collinearity with VIF
Term | VIF | Tolerance |
|---|---|---|
agPerWtChange | 1.039 | 0.962 |
Age | 1.057 | 0.946 |
comorbcatsum | 1.061 | 0.943 |
agmodup_act | 1.032 | 0.969 |
group | 1.012 | 0.988 |
asldsc | 1.012 | 0.988 |
VIF = Variance Inflation Factor. | ||
Collinearity results
- All VIF values are under three, indicating no collinearity issues.
- Overall, when taking into account VIF and SE, the model does not have collinearity issues.
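Tolerance is the reciprocal of the VIF, so the two columns of the collinearity table carry the same information. A short check in Python (for illustration only; names are assumptions) using the tabulated VIF values:

```python
# VIF values from the collinearity table (Sleep disturbance change model).
vif = {
    "agPerWtChange": 1.039,
    "Age": 1.057,
    "comorbcatsum": 1.061,
    "agmodup_act": 1.032,
    "group": 1.012,
    "asldsc": 1.012,
}

# Tolerance = 1 / VIF.
tolerance = {term: 1 / v for term, v in vif.items()}

for term, tol in tolerance.items():
    print(f"{term}: tolerance = {tol:.3f}")

# Rule used in this report: VIF values under three indicate no collinearity issue.
all_under_three = all(v < 3 for v in vif.values())
print(all_under_three)
```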
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
−0.088 | 2.174 | 0.1460 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- Although the study design was longitudinal, change scores were used; therefore, there is no violation of independence.
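The Durbin–Watson statistic d and the lag-1 residual autocorrelation r are linked by the approximation d ≈ 2(1 − r), so values of d near 2 correspond to autocorrelation near zero. A quick check with the tabulated values (Python for illustration only):

```python
r_lag1 = -0.088        # lag-1 autocorrelation from the table
dw_reported = 2.174    # Durbin-Watson statistic from the table

# Approximate relationship between the two quantities.
dw_approx = 2 * (1 - r_lag1)

print(f"approximate DW = {dw_approx:.3f}, reported DW = {dw_reported}")
```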
Assumption conclusions
Residual diagnostics did not indicate major violations of linear model assumptions. However, structured residual patterns were observed, consistent with an ordinal measure being treated as continuous and the limited variation in baseline scores. Outlier diagnostics indicated that a couple of observations may be of concern and should be further investigated.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for Sleep disturbance change
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | | 22.6248 | | |
agPerWtChange | 0.03 | 0.0297 | −0.0003 | Reproduced |
Age | | −0.1094 | | |
comorbcatsum | | 0.4787 | | |
agmodup_act | | 0.0062 | | |
group: | | | | |
Discussion – Internet Only | | −0.4649 | | |
E-Mail – Internet Only | | 0.4175 | | |
asldsc | | −0.3487 | | |
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | | 14.3551 | | |
agPerWtChange | −0.07 | −0.0741 | −0.0041 | Reproduced |
Age | | −0.2285 | | |
comorbcatsum | | −0.1596 | | |
agmodup_act | | −0.0359 | | |
group: | | | | |
Discussion – Internet Only | | −2.3415 | | |
E-Mail – Internet Only | | −1.4551 | | |
asldsc | | −0.4541 | | |
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | | 30.8945 | | |
agPerWtChange | 0.13 | 0.1335 | 0.0035 | Reproduced |
Age | | 0.0096 | | |
comorbcatsum | | 1.1170 | | |
agmodup_act | | 0.0484 | | |
group: | | | | |
Discussion – Internet Only | | 1.4118 | | |
E-Mail – Internet Only | | 2.2900 | | |
asldsc | | −0.2433 | | |
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
Change in Adjusted R2
O_R2Adj | R_R2Adj | Change.R2Adj | Reproduce.R2Adj |
|---|---|---|---|
0.160 | 0.1603 | 0.0003 | Reproduced |
O_R2Adj = original R2 Adjusted; R_R2Adj = reproduced R2 Adjusted; Change.R2Adj = change in R2Adj (R2Adj - O_R2Adj); Reproduce.R2Adj = R2 Adjusted reproduced. | |||
Change in global F
Term | O_F | R_F | Change.F | Reproduce.F |
|---|---|---|---|---|
Intercept | 6.86 | 6.8637 | 0.0037 | Reproduced |
O_F = original global F; R_F = reproduced global F; Change.F = change in R_F - O_F; Reproduce.F = Global F reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | | <0.001 | | | |
agPerWtChange | 0.67 | 0.5734 | −0.0966 | Not Reproduced | Remains non-sig, B same direction |
Age | | 0.0714 | | | |
comorbcatsum | | 0.1408 | | | |
agmodup_act | | 0.7707 | | | |
group: | | | | | |
Discussion – Internet Only | | 0.6258 | | | |
E-Mail – Internet Only | | 0.6608 | | | |
asldsc | | <0.001 | | | |
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value (R_p - O_p); Reproduce.p = p-value reproduced; SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
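The p-value comparison can be sketched as follows. This is an illustration in Python, not the code used for the report: the helper name `p_to_number` is hypothetical, and the 0.05 significance level is an assumption made only for this example.

```python
def p_to_number(p: str) -> float:
    """Convert a reported p-value to a number; '<0.001' is set to 0.00099."""
    p = p.strip()
    return 0.00099 if p == "<0.001" else float(p)

original, reproduced = "0.67", "0.5734"   # agPerWtChange, Sleep disturbance change

# Change.p is the reproduced p-value minus the original p-value.
change_p = p_to_number(reproduced) - p_to_number(original)

# Assumed significance level of 0.05 to check whether the interpretation changes.
alpha = 0.05
same_interpretation = (p_to_number(original) < alpha) == (p_to_number(reproduced) < alpha)

print(f"Change.p = {change_p:.4f}")                    # table reports -0.0966
print(f"same interpretation: {same_interpretation}")
```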
Results for p-values
- The p-value was not reproduced.
Conclusion computational reproducibility
This model was mostly computationally reproducible. A likely typographic error was identified in the reported p-value; however, the p-value had the same interpretation, and the regression coefficient did not change direction.
Methods
The model was successfully reproduced; however, residual diagnostics indicated a small number of observations that may contribute to wider confidence intervals. All continuous variables in the model were standardized, and inference was assessed using bootstrapped standardized regression coefficients and their corresponding 95% confidence intervals. Percentage and absolute changes in estimates and confidence-interval ranges relative to the original linear model were summarised using thresholds of 10% change and standardized coefficient differences of <0.10 and <0.20. Consistency of coefficient direction and statistical significance was also evaluated.
Bootstrapped results
A non-parametric bootstrap with bias-corrected and accelerated (BCa) confidence intervals was performed using 10,000 resamples.
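To illustrate the general idea of a non-parametric (case-resampling) bootstrap for a standardized coefficient, the sketch below uses Python and simulated toy data. It deliberately simplifies the analysis described here: it fits a single-predictor model, uses a plain percentile interval rather than the BCa interval, and draws 2,000 rather than 10,000 resamples. All names and data are hypothetical.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def zscore(v):
    """Standardize a variable to mean 0 and standard deviation 1."""
    m, s = statistics.fmean(v), statistics.stdev(v)
    return [(vi - m) / s for vi in v]

def percentile_bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI for the slope from resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap resample
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    lo = slopes[round(alpha / 2 * n_boot)]
    hi = slopes[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Simulated toy data, NOT the study data.
rng = random.Random(42)
x_raw = [rng.gauss(0, 1) for _ in range(100)]
y_raw = [0.4 * xi + rng.gauss(0, 1) for xi in x_raw]

x, y = zscore(x_raw), zscore(y_raw)
beta = ols_slope(x, y)                 # standardized coefficient
lo, hi = percentile_bootstrap_ci(x, y)
print(f"standardized B = {beta:.3f}, 95% percentile CI = ({lo:.3f}, {hi:.3f})")
```

A BCa interval additionally adjusts the percentile cut points for bias and skewness in the bootstrap distribution, which this sketch does not attempt.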
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.0028 | 0.0026 | 0.0002 | 8.0500 | No | No | No |
z_agPerWtChange | 0.0366 | 0.0377 | −0.0011 | −2.9100 | No | No | No |
z_Age | −0.1197 | −0.1216 | 0.0019 | 1.5500 | No | No | No |
z_comorbcatsum | 0.0980 | 0.0981 | −0.0001 | −0.0700 | No | No | No |
z_agmodup_act | 0.0188 | 0.0197 | −0.0009 | −4.7200 | No | No | No |
groupDiscussion | −0.0750 | −0.0752 | 0.0001 | 0.1800 | No | No | No |
groupE-Mail | 0.0674 | 0.0655 | 0.0018 | 2.7300 | No | No | No |
z_asldsc | −0.4123 | −0.4136 | 0.0013 | 0.3200 | No | No | No |
B = reproduced standardized regression coefficient; boot.B = bootstrapped standardized regression coefficient; B_diff = B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.2047 | −0.2249 | 0.0202 | 9.8500 | No | No | No |
z_agPerWtChange | −0.0914 | −0.0949 | 0.0036 | 3.9000 | No | No | No |
z_Age | −0.2500 | −0.2435 | −0.0065 | −2.6100 | No | No | No |
z_comorbcatsum | −0.0327 | −0.0163 | −0.0164 | −50.2500 | Yes | No | No |
z_agmodup_act | −0.1084 | −0.1312 | 0.0229 | 21.1000 | Yes | No | No |
groupDiscussion | −0.3779 | −0.3820 | 0.0041 | 1.1000 | No | No | No |
groupE-Mail | −0.2348 | −0.2608 | 0.0260 | 11.0700 | Yes | No | No |
z_asldsc | −0.5370 | −0.5651 | 0.0281 | 5.2300 | No | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = Lower - boot.Lower; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
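The difference columns and flags can be recomputed from the tabulated limits. In this Python sketch (for illustration; the function name is hypothetical) the percentage difference is taken relative to the absolute value of the model-based estimate. That denominator is an assumption inferred from the tabulated values rather than a stated definition, and results differ slightly from the table because the inputs are rounded.

```python
def compare(model_value, boot_value):
    """Return difference, percentage difference and the three threshold flags."""
    diff = model_value - boot_value
    pct = 100 * diff / abs(model_value)   # assumed denominator: |model-based value|
    return {
        "diff": diff,
        "pct": pct,
        "diff_10pct": abs(pct) >= 10,
        "diff_0.1": abs(diff) >= 0.1,
        "diff_0.2": abs(diff) >= 0.2,
    }

# Lower 95% limits: (model-based, bootstrapped) from the table above.
lower = {
    "Intercept": (-0.2047, -0.2249),
    "z_comorbcatsum": (-0.0327, -0.0163),
    "z_agmodup_act": (-0.1084, -0.1312),
    "groupE-Mail": (-0.2348, -0.2608),
}

results = {term: compare(m, b) for term, (m, b) in lower.items()}
for term, r in results.items():
    print(f"{term}: diff = {r['diff']:.4f}, % = {r['pct']:.2f}, "
          f">=10% {r['diff_10pct']}, >=0.1 {r['diff_0.1']}, >=0.2 {r['diff_0.2']}")
```

The pattern matches the table: several limits shift by 10% or more in relative terms, yet none moves by as much as 0.1 on the standardized scale.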
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.2103 | 0.2521 | −0.0418 | −19.8700 | Yes | No | No |
z_agPerWtChange | 0.1646 | 0.1652 | −0.0006 | −0.3700 | No | No | No |
z_Age | 0.0105 | 0.0030 | 0.0076 | 71.9900 | Yes | No | No |
z_comorbcatsum | 0.2286 | 0.2556 | −0.0269 | −11.7700 | Yes | No | No |
z_agmodup_act | 0.1460 | 0.1669 | −0.0209 | −14.3200 | Yes | No | No |
groupDiscussion | 0.2278 | 0.2174 | 0.0104 | 4.5800 | No | No | No |
groupE-Mail | 0.3696 | 0.3785 | −0.0089 | −2.4100 | No | No | No |
z_asldsc | −0.2877 | −0.2837 | −0.0039 | −1.3600 | No | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = Upper - boot.Upper; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.4151 | 0.4770 | 0.0620 | 14.9300 | Yes | No | No |
z_agPerWtChange | 0.2560 | 0.2602 | 0.0042 | 1.6300 | No | No | No |
z_Age | 0.2605 | 0.2464 | −0.0141 | −5.4200 | No | No | No |
z_comorbcatsum | 0.2613 | 0.2718 | 0.0105 | 4.0100 | No | No | No |
z_agmodup_act | 0.2544 | 0.2982 | 0.0438 | 17.2100 | Yes | No | No |
groupDiscussion | 0.6057 | 0.5994 | −0.0063 | −1.0400 | No | No | No |
groupE-Mail | 0.6044 | 0.6393 | 0.0349 | 5.7800 | No | No | No |
z_asldsc | 0.2493 | 0.2813 | 0.0320 | 12.8400 | Yes | No | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Intercept | 0.9789 | 0.9831 | −0.0042 | Remains non-sig, B same direction |
z_agPerWtChange | 0.5734 | 0.5681 | 0.0052 | Remains non-sig, B same direction |
z_Age | 0.0714 | 0.0533 | 0.0182 | Remains non-sig, B same direction |
z_comorbcatsum | 0.1408 | 0.1528 | −0.0120 | Remains non-sig, B same direction |
z_agmodup_act | 0.7707 | 0.7941 | −0.0235 | Remains non-sig, B same direction |
groupDiscussion | 0.6258 | 0.6254 | 0.0004 | Remains non-sig, B same direction |
groupE-Mail | 0.6608 | 0.6861 | −0.0253 | Remains non-sig, B same direction |
z_asldsc | <0.001 | <0.001 | −0.0000 | Remains sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. | ||||
Check distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Conclusions based on the bootstrapped model
This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful, with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.
Model 3
Model results for Fatigue change
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | | | | | | |
agPerWtChange | 0.24 | | 0.12 | 0.38 | | <0.001 |
Age | | | | | | |
comorbcatsum | | | | | | |
agmodup_act | | | | | | |
group: | | | | | | |
Discussion – Internet Only | | | | | | |
E-Mail – Internet Only | | | | | | |
afatsc | | | | | | |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Fatigue change
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.20 | 7.56 | |||||||
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for Fatigue change
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
agPerWtChange | |||||
Age | |||||
comorbcatsum | |||||
agmodup_act | |||||
group | |||||
afatsc | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Fatigue change
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 30.488 | 5.609 | 19.431 | 41.546 | 5.436 | <0.001 |
agPerWtChange | 0.249 | 0.066 | 0.120 | 0.379 | 3.794 | <0.001 |
Age | −0.233 | 0.075 | −0.381 | −0.084 | −3.086 | 0.0023 |
comorbcatsum | 0.840 | 0.404 | 0.043 | 1.637 | 2.078 | 0.0389 |
agmodup_act | −0.025 | 0.027 | −0.078 | 0.027 | −0.942 | 0.3474 |
group: | ||||||
Discussion – Internet Only | −0.728 | 1.191 | −3.076 | 1.621 | −0.611 | 0.5420 |
E-Mail – Internet Only | 0.300 | 1.189 | −2.045 | 2.645 | 0.252 | 0.8009 |
afatsc | −0.368 | 0.073 | −0.512 | −0.224 | −5.044 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Fatigue change
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.450 | 0.203 | 0.176 | 1,468.766 | 6.954 | 7.563 | 7 | 208 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for Fatigue change
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
agPerWtChange | 722.947 | 1 | 722.947 | 14.396 | <0.001 |
Age | 478.333 | 1 | 478.333 | 9.525 | 0.0023 |
comorbcatsum | 216.902 | 1 | 216.902 | 4.319 | 0.0389 |
agmodup_act | 44.540 | 1 | 44.540 | 0.887 | 0.3474 |
group | 38.165 | 2 | 19.083 | 0.380 | 0.6843 |
afatsc | 1,277.792 | 1 | 1,277.792 | 25.445 | <0.001 |
Residuals | 10,445.335 | 208 | 50.218 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
Visualisation of regression model
The blue line shows the line of best fit, with shading representing the 95% confidence interval, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non-linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
agPerWtChange | −0.010 | 0.9924 | No linearity violation |
Age | 0.498 | 0.6191 | No linearity violation |
comorbcatsum | −0.162 | 0.8718 | No linearity violation |
agmodup_act | 1.613 | 0.1083 | No linearity violation |
group | |||
afatsc | 0.079 | 0.9370 | No linearity violation |
Tukey test | 0.729 | 0.4658 | No linearity violation |
Specification tests for the predictors use quadratic terms; for the fitted values, curvature is tested with Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
No linearity violation was observed in either plots or tests.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
5.585 | 0.5890 | 7 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Fatigue change | 216 | −1.156 | 7.807 | 0.000 | −23.300 | 29.800 | 0.157 | 1.429 |
agPerWtChange | 216 | −4.446 | 7.640 | −2.875 | −35.702 | 12.220 | −1.003 | 1.664 |
Age | 216 | 54.699 | 6.779 | 55.000 | 40.000 | 69.000 | −0.007 | −0.680 |
comorbcatsum | 216 | 1.477 | 1.268 | 1.000 | 0.000 | 5.000 | 0.773 | −0.070 |
agmodup_act | 216 | −1.961 | 18.689 | −0.940 | −71.016 | 47.405 | −0.402 | 1.437 |
group* | 216 | 1.963 | 0.823 | 2.000 | 1.000 | 3.000 | 0.068 | −1.528 |
afatsc | 216 | 51.487 | 6.716 | 51.000 | 33.700 | 71.600 | −0.438 | 0.670 |
.fitted | 216 | −1.156 | 3.517 | −1.414 | −11.440 | 10.155 | 0.198 | 0.258 |
.resid | 216 | −0.000 | 6.970 | 0.286 | −20.702 | 26.061 | −0.019 | 1.063 |
.leverage | 216 | 0.037 | 0.016 | 0.033 | 0.014 | 0.128 | 1.870 | 5.063 |
.sigma | 216 | 7.086 | 0.031 | 7.097 | 6.855 | 7.104 | −3.785 | 19.206 |
.cooksd | 216 | 0.005 | 0.011 | 0.002 | 0.000 | 0.105 | 5.214 | 35.288 |
.std.resid | 216 | 0.001 | 1.003 | 0.041 | −2.995 | 3.784 | −0.011 | 1.091 |
dfb.1_ | 216 | 0.000 | 0.075 | −0.002 | −0.281 | 0.431 | 1.266 | 8.439 |
dfb.aPWC | 216 | 0.000 | 0.080 | 0.000 | −0.452 | 0.546 | 1.542 | 17.409 |
dfb.Age | 216 | 0.000 | 0.083 | 0.001 | −0.345 | 0.476 | 0.939 | 9.811 |
dfb.cmrb | 216 | −0.000 | 0.070 | 0.000 | −0.359 | 0.218 | −1.181 | 5.949 |
dfb.agm_ | 216 | −0.000 | 0.068 | 0.000 | −0.284 | 0.550 | 2.229 | 21.110 |
dfb.grpD | 216 | −0.000 | 0.067 | −0.001 | −0.212 | 0.377 | 0.799 | 5.319 |
dfb.gE.M | 216 | 0.000 | 0.072 | −0.000 | −0.198 | 0.340 | 0.564 | 2.970 |
dfb.afts | 216 | −0.000 | 0.063 | 0.000 | −0.245 | 0.225 | −0.373 | 3.447 |
dffit | 216 | 0.004 | 0.204 | 0.007 | −0.690 | 0.946 | 0.217 | 2.839 |
cov.r | 216 | 1.041 | 0.066 | 1.059 | 0.622 | 1.133 | −2.799 | 10.680 |
* categorical variable | ||||||||
Cook’s threshold
Cook’s distance measures the overall change in fit, if the ith observation is removed. Potentially influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 to 1 is often used to identify influential observations.
DFFITS threshold
DFFITS measures how many standard deviations the fitted value changes when the ith observation is removed. Potentially influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of model parameters (including the intercept) and n is the number of observations. In practice, this threshold can flag a large number of points, so a practical cut-off of 1 was used to flag observations with meaningful impact.
DFBETA threshold
DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each model parameter. Potentially influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical range of 0.9 to 1.1 was used, with values outside this range flagged as having a meaningful impact on precision, although there is no universally agreed alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.aPWC | dfb.Age | dfb.cmrb | dfb.agm_ | dfb.grpD | dfb.gE.M | dfb.afts | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
16 | −0.620 | 0.096 | 0.005 | −0.100 | 0.027 | 0.031 | −0.095 | −0.078 | −0.067 | −0.003 | 0.133 | −0.202 | 1.133 |
157 | 1.430 | 0.128 | 0.037 | −0.070 | −0.452 | −0.030 | 0.103 | −0.284 | 0.022 | 0.099 | 0.084 | 0.548 | 1.102 |
100 | 3.202 | 0.047 | 0.061 | −0.198 | −0.174 | 0.428 | 0.143 | −0.097 | 0.377 | 0.051 | −0.228 | 0.714 | 0.741 |
89 | 3.911 | 0.055 | 0.105 | 0.431 | 0.403 | −0.327 | −0.201 | 0.550 | −0.063 | 0.340 | −0.207 | 0.946 | 0.622 |
StudRes = studentized residuals; CookD = Cook's distance, a combined measure of leverage and influence; DFBETAS (dfb.*) measure how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS (dffit) measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||||||||
Results for outliers and influential points
Two observations had studentized residuals > 3. Both had low leverage and small Cook’s distance, with DFBETAS and DFFITS within conventional ranges. The COVRATIO indicated observations that may affect confidence interval widths.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.054 | 0.5665 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.983 | 0.0098 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test supports residuals being normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- The Q–Q plot looks roughly normal.
Assessing collinearity with VIF
Term | VIF | Tolerance |
|---|---|---|
agPerWtChange | 1.038 | 0.963 |
Age | 1.058 | 0.945 |
comorbcatsum | 1.061 | 0.943 |
agmodup_act | 1.031 | 0.970 |
group | 1.014 | 0.987 |
afatsc | 1.014 | 0.986 |
VIF = Variance Inflation Factor. | ||
Collinearity results
- All VIF values are under three, indicating no collinearity issues.
- Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
−0.050 | 2.098 | 0.4860 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- Although the study design was longitudinal, change scores were used, so the independence assumption was not violated.
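The Durbin–Watson statistic and the lag-1 autocorrelation reported above are linked by the approximation DW ≈ 2(1 − r), so an autocorrelation near zero gives a statistic near 2. A minimal Python sketch under one common definition of the lag-1 autocorrelation follows; the toy residuals and function names are hypothetical, and this is not the code used for the report.

```python
def durbin_watson(resid):
    """Sum of squared successive differences divided by the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def lag1_autocorr(resid):
    """Lag-1 autocorrelation of residuals (assumes they are centred at zero)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Toy residuals (hypothetical): strictly alternating signs, i.e. negative autocorrelation.
toy = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(toy), lag1_autocorr(toy))

# Approximation applied to the autocorrelation reported in the table above.
approx_dw = 2.0 * (1.0 - (-0.050))
print(round(approx_dw, 3))  # close to the reported statistic of 2.098
```

Values well below 2 would point to positive autocorrelation and values well above 2 to negative autocorrelation.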
Assumption conclusions
Residual diagnostics did not indicate major violations of linear model assumptions. Outlier diagnostics indicated that point estimates were unlikely to be substantially affected by influential points; however, confidence-interval width could be affected and should be investigated further.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for Fatigue change
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 30.4885 | |||
agPerWtChange | 0.24 | 0.2492 | 0.0092 | Incorrect Rounding |
Age | −0.2328 | |||
comorbcatsum | 0.8399 | |||
agmodup_act | −0.0251 | |||
group: | ||||
Discussion – Internet Only | −0.7277 | |||
E-Mail – Internet Only | 0.3003 | |||
afatsc | −0.3681 | |||
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | 19.4307 | |||
agPerWtChange | 0.12 | 0.1197 | −0.0003 | Reproduced |
Age | −0.3815 | |||
comorbcatsum | 0.0432 | |||
agmodup_act | −0.0776 | |||
group: | ||||
Discussion – Internet Only | −3.0765 | |||
E-Mail – Internet Only | −2.0447 | |||
afatsc | −0.5120 | |||
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; Change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | 41.5463 | |||
agPerWtChange | 0.38 | 0.3787 | −0.0013 | Reproduced |
Age | −0.0841 | |||
comorbcatsum | 1.6366 | |||
agmodup_act | 0.0274 | |||
group: | ||||
Discussion – Internet Only | 1.6210 | |||
E-Mail – Internet Only | 2.6452 | |||
afatsc | −0.2243 | |||
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; Change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
Change in Adjusted R2
O_R2Adj | R_R2Adj | Change.R2Adj | Reproduce.R2Adj |
|---|---|---|---|
0.200 | 0.1761 | −0.0239 | Not Reproduced |
O_R2Adj = original R2 Adjusted; R_R2Adj = reproduced R2 Adjusted; Change.R2Adj = change in R2Adj (R2Adj - O_R2Adj); Reproduce.R2Adj = R2 Adjusted reproduced. | |||
Change in global F
Term | O_F | R_F | Change.F | Reproduce.F |
|---|---|---|---|---|
Intercept | 7.56 | 7.5633 | 0.0033 | Reproduced |
O_F = original global F; R_F = reproduced global F; Change.F = change in R_F - O_F; Reproduce.F = Global F reproduced. | ||||
Change in p-values
The p-value was reproduced.
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
agPerWtChange | <0.001 | <0.001 | 0.0000 | Reproduced | Remains sig, B same direction |
Age | 0.0023 | ||||
comorbcatsum | 0.0389 | ||||
agmodup_act | 0.3474 | ||||
group: | |||||
Discussion – Internet Only | 0.5420 | ||||
E-Mail – Internet Only | 0.8009 | ||||
afatsc | <0.001 | ||||
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value (R_p - O_p); Reproduce.p = p-value reproduced; SigChangeDirection = statistical significance and B change between original and reproduced models. Note that p-values <0.001 were set to 0.00099 for the purposes of comparison. | |||||
Results for p-values
Conclusion computational reproducibility
This model was mostly computationally reproducible, with minor rounding errors. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction. Unadjusted R2, rather than adjusted R2, was mistakenly reported for this model.
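The report does not state the exact rule behind the labels "Reproduced", "Incorrect Rounding", and "Not Reproduced". One rule that is consistent with the tables above is sketched here in Python, as an assumption only: a value counts as reproduced if rounding the reproduced figure to the original's number of decimals recovers the original, and as incorrectly rounded if only truncation does. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def classify(original: str, reproduced: str) -> str:
    """Compare a reproduced value with a published one at the published precision.

    Values are passed as strings so the published number of decimals is kept.
    The rule itself is inferred from the tables, not taken from the report.
    """
    o = Decimal(original)
    r = Decimal(reproduced)
    # Quantum with the same number of decimals as the original, e.g. 0.01 for "0.24".
    quantum = Decimal(1).scaleb(o.as_tuple().exponent)
    if r.quantize(quantum, rounding=ROUND_HALF_UP) == o:
        return "Reproduced"
    if r.quantize(quantum, rounding=ROUND_DOWN) == o:
        return "Incorrect Rounding"  # truncation matches but rounding does not
    return "Not Reproduced"

# Published and reproduced values taken from the tables above.
for label, orig, repro in [
    ("B (agPerWtChange)", "0.24", "0.2492"),
    ("lower CI", "0.12", "0.1197"),
    ("upper CI", "0.38", "0.3787"),
    ("adjusted R2", "0.200", "0.1761"),
    ("global F", "7.56", "7.5633"),
]:
    print(label, classify(orig, repro))
```

Under this assumed rule, the published coefficient of 0.24 matches the reproduced 0.2492 only by truncation, which is in line with the "Incorrect Rounding" label in the coefficient table.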
Methods
The model was successfully reproduced; however, residual diagnostics indicated a small number of observations that may contribute to wider confidence intervals. All continuous variables in the model were standardized, and inference was assessed using bootstrapped standardized regression coefficients and their corresponding 95% confidence intervals. Percentage and absolute changes in estimates and confidence-interval ranges relative to the original linear model were summarised using thresholds of 10% change and standardized coefficient differences of <0.10 and <0.20. Consistency of coefficient direction and statistical significance was also evaluated.
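The threshold comparison described above can be sketched in a few lines of Python. The percentage formula used here (difference divided by the absolute value of the linear-model estimate) is inferred from the tables in this section rather than stated in the report, and the function name is hypothetical.

```python
import math

def compare(estimate, boot_estimate, cap=1000.0):
    """Flag differences between a linear-model estimate and its bootstrap counterpart.

    Assumed formula: pct = 100 * (estimate - boot_estimate) / |estimate|,
    truncated at +/- cap percent.
    """
    diff = estimate - boot_estimate
    if estimate == 0:
        pct = math.nan  # percentage change is undefined for a zero estimate
    else:
        pct = max(-cap, min(cap, 100.0 * diff / abs(estimate)))
    return {
        "diff": diff,
        "pct_diff": pct,
        "diff_10pct": abs(pct) >= 10.0,  # False when pct is NaN
        "diff_0.1": abs(diff) >= 0.1,
        "diff_0.2": abs(diff) >= 0.2,
    }

# Intercept row of the bootstrapped coefficient table (B = 0.0179, boot.B = 0.0147).
intercept = compare(0.0179, 0.0147)
print(intercept)
```

The intercept shows why both kinds of threshold are reported: an estimate close to zero can change by more than 10% in relative terms while the absolute change stays far below 0.1.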
Bootstrapping results
A non-parametric bootstrap with bias-corrected and accelerated (BCa) confidence intervals was performed using 10,000 resamples.
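For readers unfamiliar with BCa intervals, the following is a minimal Python sketch of a case-resampling bootstrap with a BCa interval for the slope of a simple one-predictor regression. It is not the code used for this report: the report's model has several predictors, the data below are simulated, all names are hypothetical, and the default of 2,000 resamples is chosen only to keep the example fast.

```python
import random
from statistics import NormalDist, mean

def slope(xs, ys):
    """Ordinary least squares slope for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bca_interval(xs, ys, n_boot=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(xs)
    theta_hat = slope(xs, ys)

    # Case resampling: draw (x, y) pairs with replacement and refit.
    boots = []
    while len(boots) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if max(bx) == min(bx):
            continue  # skip degenerate resamples with no spread in x
        boots.append(slope(bx, [ys[i] for i in idx]))
    boots.sort()

    nd = NormalDist()
    # Bias correction: where the original estimate sits in the bootstrap distribution.
    # (inv_cdf fails if this proportion is exactly 0 or 1.)
    z0 = nd.inv_cdf(sum(b < theta_hat for b in boots) / n_boot)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    accel = (sum((jbar - j) ** 3 for j in jack)
             / (6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5))

    def adjusted_quantile(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    def pick(q):
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return (theta_hat,
            pick(adjusted_quantile(alpha / 2)),
            pick(adjusted_quantile(1 - alpha / 2)))

# Simulated data (hypothetical): true slope 2 with unit-variance noise.
data_rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]
est, lo, hi = bca_interval(xs, ys)
print(round(est, 3), round(lo, 3), round(hi, 3))
```

The bias-correction and acceleration terms shift the percentile cut-points, so a BCa interval need not be symmetric around the estimate.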
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.0179 | 0.0147 | 0.0032 | 17.7900 | Yes | No | No |
z_agPerWtChange | 0.2439 | 0.2442 | −0.0003 | −0.1200 | No | No | No |
z_Age | −0.2021 | −0.2025 | 0.0004 | 0.2000 | No | No | No |
z_comorbcatsum | 0.1365 | 0.1391 | −0.0026 | −1.9100 | No | No | No |
z_agmodup_act | −0.0601 | −0.0590 | −0.0011 | −1.7600 | No | No | No |
groupDiscussion | −0.0932 | −0.0909 | −0.0023 | −2.5000 | No | No | No |
groupE-Mail | 0.0385 | 0.0389 | −0.0004 | −1.1600 | No | No | No |
z_afatsc | −0.3167 | −0.3156 | −0.0011 | −0.3400 | No | No | No |
B = reproduced standardized regression coefficient; boot.B = bootstrapped standardized regression coefficient; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.1883 | −0.1789 | −0.0094 | −4.9900 | No | No | No |
z_agPerWtChange | 0.1172 | 0.1121 | 0.0051 | 4.3600 | No | No | No |
z_Age | −0.3312 | −0.3479 | 0.0167 | 5.0400 | No | No | No |
z_comorbcatsum | 0.0070 | 0.0084 | −0.0014 | −20.0200 | Yes | No | No |
z_agmodup_act | −0.1859 | −0.1655 | −0.0204 | −10.9600 | Yes | No | No |
groupDiscussion | −0.3941 | −0.3733 | −0.0207 | −5.2600 | No | No | No |
groupE-Mail | −0.2619 | −0.2560 | −0.0059 | −2.2600 | No | No | No |
z_afatsc | −0.4405 | −0.4340 | −0.0065 | −1.4700 | No | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.2241 | 0.2075 | 0.0166 | 7.4000 | No | No | No |
z_agPerWtChange | 0.3706 | 0.3949 | −0.0243 | −6.5600 | No | No | No |
z_Age | −0.0730 | −0.0476 | −0.0255 | −34.8700 | Yes | No | No |
z_comorbcatsum | 0.2659 | 0.2625 | 0.0034 | 1.2800 | No | No | No |
z_agmodup_act | 0.0657 | 0.0791 | −0.0134 | −20.3800 | Yes | No | No |
groupDiscussion | 0.2076 | 0.1983 | 0.0093 | 4.4800 | No | No | No |
groupE-Mail | 0.3388 | 0.3465 | −0.0077 | −2.2800 | No | No | No |
z_afatsc | −0.1929 | −0.2101 | 0.0171 | 8.8800 | No | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.4124 | 0.3864 | −0.0260 | −6.3000 | No | No | No |
z_agPerWtChange | 0.2534 | 0.2829 | 0.0294 | 11.6100 | Yes | No | No |
z_Age | 0.2582 | 0.3004 | 0.0422 | 16.3300 | Yes | No | No |
z_comorbcatsum | 0.2589 | 0.2541 | −0.0048 | −1.8500 | No | No | No |
z_agmodup_act | 0.2516 | 0.2446 | −0.0070 | −2.7700 | No | No | No |
groupDiscussion | 0.6017 | 0.5717 | −0.0300 | −4.9900 | No | No | No |
groupE-Mail | 0.6007 | 0.6025 | 0.0018 | 0.3000 | No | No | No |
z_afatsc | 0.2475 | 0.2239 | −0.0236 | −9.5300 | No | No | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range (boot.Range - Range); %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Intercept | 0.8641 | 0.8822 | −0.0181 | Remains non-sig, B same direction |
z_agPerWtChange | <0.001 | <0.001 | −0.0004 | Remains sig, B same direction |
z_Age | 0.0023 | 0.0078 | −0.0055 | Remains sig, B same direction |
z_comorbcatsum | 0.0389 | 0.0322 | 0.0067 | Remains sig, B same direction |
z_agmodup_act | 0.3474 | 0.3363 | 0.0111 | Remains non-sig, B same direction |
groupDiscussion | 0.5420 | 0.5310 | 0.0110 | Remains non-sig, B same direction |
groupE-Mail | 0.8009 | 0.8019 | −0.0009 | Remains non-sig, B same direction |
z_afatsc | <0.001 | <0.001 | 0.0000 | Remains sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = change in p-value (p-value - boot.p-value); SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. | ||||
Check the distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Conclusions based on the bootstrapped model
This model was inferentially reproducible. Although some statistics changed by 10% or more, these differences were not meaningful because every absolute change in the standardized coefficients and their confidence limits was less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.