Paper 30: Smoking and other determinants of bone turnover

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

March 15, 2026

References

Jorde R, Stunes AK, Kubiak J, Grimnes G, Thorsby PM, Syversen U (2019) Smoking and other determinants of bone turnover. PLoS ONE 14 (11): e0225539. https://doi.org/10.1371/journal.pone.0225539

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.n Criterion (BIC). When the Gaussian distribution was not appropriate, alternative distributions were considered, and model fit was evaluated using AIC and BIC.

Results from the reproduction of the Jorde et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This paper is an exploratory analysis of factors associated with bone turnover (bone resorption and formation). Stepwise linear regression was used to explore turnover biomarker markers. The authors checked for normality of continuous dependent variables. Linearity is not discussed, but scatterplots are shown. Tables were labelled with the tests used. Regression coefficients are not interpreted, conclusions are made in terms of direction and statistical significance, and there is no discussion of false positives with multiple outcomes. Collinearity was not mentioned.

Data availability and software used

Data were provided in the Supporting Information as a wide-format SPSS (.sav) file containing an embedded data dictionary. Regression analyses were performed using SPSS.

Regression sample

Seven linear regression models were fitted to evaluate predictors of bone mineral density and related biomarkers. Sex, age, BMI, and smoking status were included a priori in all models, while metabolic and biochemical markers were evaluated as candidate covariates using stepwise regression. Three outcomes were randomly selected for further evaluation: log-transformed cross-linked telopeptide of type 1 collagen (lg-CTX-1), tumour necrosis factor-α (TNF-α), and log-transformed osteoprotegerin (lg-OPG).

Computational reproducibility results

This paper were mostly computationally reproduced. The authors reported adjusted R² but when compared this corresponded to the unadjusted R². There were minor discrepancies in the second and third models attributable to typographic errors in the sex variable related to incorrect decimal point placement. The paper was straightforward to reproduce, as the data were well formatted in SPSS and required no additional data cleaning or reformatting.

Inferential reproducibility results

The models assessed for inferential reproducibility were reproducible overall. Only minor violations of model assumptions were identified, including homoscedasticity, linearity, and normality. Statistically significant quadratic terms were observed for the lg-CTX-1 and lg-OPG models; however, diagnostic plots and model fit statistics did not provide substantive support for meaningful non-linear relationships. The study design appeared independent; however, the lg-CTX-1 model showed a significant Durbin–Watson test. Further inspection suggested that this was likely due to post-randomisation sorting of control and intervention cases rather than true autocorrelation. All three models had statistically significant Shapiro–Wilk tests for residual normality; however, Q–Q plots and the Kolmogorov–Smirnov test indicated approximately normally distributed residuals. Mild heteroscedasticity was observed for the TNF-α and lg-OPG models. Bootstrapped comparisons showed that although some statistics differed by 10% or more, these changes were not meaningful, with differences in standardized regression coefficients remaining below 0.1. The direction of effects and statistical significance were consistent between the reproduced and bootstrapped models.

Model 1

Model results for lg.ctx

Term

B

SE

Lower

Upper

t

p-value

Intercept

sex:

Male – Female

−0.006

−0.045

0.033

0.764

age.years

0.001

0.000

0.003

0.126

BMI.base.kg.m2

−0.008

−0.011

−0.004

<0.001

smoking.status:

Smoker – Non-smoker

−0.044

−0.085

−0.004

0.032

Creatinine.umol.L.base

0.003

0.001

0.004

0.002

PTH.pmol.L.base

0.010

0.002

0.018

0.020

Calcium.mmol.L.base

0.264

0.041

0.486

0.020

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for lg.ctx

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.134

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for lg.ctx

Term

SS

DF

MS

F

p-value

sex

age.years

BMI.base.kg.m2

smoking.status

Creatinine.umol.L.base

PTH.pmol.L.base

Calcium.mmol.L.base

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for lg.ctx

Term

B

SE

Lower

Upper

t

p-value

Intercept

−1.156

0.263

−1.674

−0.639

−4.393

<0.001

sex:

Male – Female

−0.006

0.020

−0.045

0.033

−0.300

0.7644

age.years

0.001

0.001

−0.000

0.003

1.535

0.1257

BMI.base.kg.m2

−0.008

0.002

−0.011

−0.004

−4.553

<0.001

smoking.status:

Smoker – Non-smoker

−0.044

0.021

−0.085

−0.004

−2.155

0.0318

Creatinine.umol.L.base

0.003

0.001

0.001

0.004

3.058

0.0024

PTH.pmol.L.base

0.010

0.004

0.002

0.018

2.344

0.0196

Calcium.mmol.L.base

0.264

0.113

0.041

0.486

2.330

0.0203

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for lg.ctx

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.365

0.134

0.118

−319.955

0.159

8.721

7

396

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for lg.ctx

Term

SS

DF

MS

F

p-value

sex

0.002

1

0.002

0.090

0.7644

age.years

0.061

1

0.061

2.355

0.1257

BMI.base.kg.m2

0.536

1

0.536

20.731

<0.001

smoking.status

0.120

1

0.120

4.642

0.0318

Creatinine.umol.L.base

0.242

1

0.242

9.354

0.0024

PTH.pmol.L.base

0.142

1

0.142

5.495

0.0196

Calcium.mmol.L.base

0.140

1

0.140

5.427

0.0203

Residuals

10.247

396

0.026

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

sex

age.years

−0.536

0.5923

No linearity violation

BMI.base.kg.m2

1.893

0.0590

No linearity violation

smoking.status

Creatinine.umol.L.base

−0.445

0.6566

No linearity violation

PTH.pmol.L.base

−0.589

0.5559

No linearity violation

Calcium.mmol.L.base

−1.977

0.0488

Linearity violation

Tukey test

1.145

0.2521

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

Although the quadratic term for baseline calcium (mmol/L) was statistically significant, there was no clear evidence of curvature in the residual plots. In addition, no systematic patterns were observed in the fitted-versus-residual plots.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

10.517

0.1611

7

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

lg.ctx

404

−0.463

0.171

−0.469

−0.959

0.534

0.328

2.288

sex*

404

1.525

0.500

2.000

1.000

2.000

−0.099

−1.995

age.years

404

51.817

8.666

50.000

40.000

79.000

0.741

−0.103

BMI.base.kg.m2

404

27.759

4.909

27.035

18.456

48.847

0.919

1.430

smoking.status*

404

1.213

0.410

1.000

1.000

2.000

1.398

−0.047

Creatinine.umol.L.base

404

71.252

12.387

70.000

41.000

120.000

0.509

0.399

PTH.pmol.L.base

404

6.731

2.020

6.400

3.100

17.800

1.362

3.289

Calcium.mmol.L.base

404

2.273

0.073

2.270

2.060

2.530

0.139

0.169

.fitted

404

−0.463

0.063

−0.458

−0.699

−0.265

−0.303

0.763

.resid

404

−0.000

0.159

0.006

−0.462

0.906

0.333

2.094

.leverage

404

0.020

0.012

0.017

0.006

0.107

2.490

10.176

.sigma

404

0.161

0.000

0.161

0.154

0.161

−9.912

144.001

.cooksd

404

0.003

0.006

0.001

0.000

0.085

8.797

119.003

.std.resid

404

−0.000

1.002

0.036

−2.885

5.691

0.332

2.086

dfb.1_

404

−0.000

0.050

0.000

−0.252

0.196

−0.408

4.496

dfb.sxMl

404

−0.000

0.053

0.000

−0.437

0.191

−1.164

12.223

dfb.ag.y

404

0.000

0.050

0.000

−0.276

0.242

−0.322

5.244

dfb.BMI.

404

0.000

0.054

−0.001

−0.421

0.219

−1.142

13.187

dfb.sm.S

404

−0.000

0.052

−0.000

−0.243

0.210

−0.315

5.248

dfb.Cr..L.

404

0.000

0.048

0.000

−0.209

0.281

0.263

5.217

dfb.PTH.

404

−0.000

0.054

0.000

−0.314

0.269

−0.789

10.220

dfb.Cl..L.

404

−0.000

0.051

0.000

−0.226

0.298

0.637

5.739

dffit

404

−0.001

0.146

0.004

−0.415

0.858

0.314

2.822

cov.r

404

1.021

0.038

1.028

0.526

1.115

−6.309

73.681

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.

DFBETA threshold

DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentised residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance) of the estimated regression coefficients if the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 indicate inflated variances and reduced precision (wider confidence intervals), whereas values above 1 indicate deflated variances and increased precision (narrower confidence intervals). A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.sxMl

dfb.ag.y

dfb.BMI.

dfb.sm.S

dfb.Cr..L.

dfb.PTH.

dfb.Cl..L.

dffit

cov.r

90

2.952

0.008

0.008

−0.022

−0.134

0.030

0.097

−0.053

0.014

0.054

0.001

0.259

0.864

246

1.346

0.070

0.017

0.001

0.008

−0.079

0.069

0.203

0.108

0.269

−0.054

0.369

1.058

185

−1.092

0.107

0.018

0.196

0.004

0.018

−0.018

0.013

0.080

−0.313

−0.187

−0.377

1.115

267

−1.704

0.056

0.021

0.161

−0.036

0.116

0.036

−0.213

−0.061

−0.314

−0.143

−0.415

1.019

130

5.932

0.021

0.085

−0.211

−0.437

−0.075

−0.421

−0.137

0.281

−0.249

0.298

0.858

0.526

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points

Two observations had studentized residuals near or above 3. Both points had low leverage and small Cook’s distance, with DFBETAS and DFFITS within conventional ranges. However, their COVRATIO values were substantially < 1, suggesting little effect on point estimates but potential inflation of standard errors (wider confidence intervals).

Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.030

0.8513

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.982

<0.001

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • QQ-plot looks roughly normal.
Assessing collinearity with VIF

Term

VIF

Tolerance

sex

1.545

0.647

age.years

1.034

0.967

BMI.base.kg.m2

1.072

0.933

smoking.status

1.114

0.898

Creatinine.umol.L.base

1.636

0.611

PTH.pmol.L.base

1.094

0.914

Calcium.mmol.L.base

1.054

0.949

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

0.367

1.261

<0.001

Independence results
  • The Durbin–Watson test suggests there are auto-correlation issues.
  • The Durbin–Watson test was used to assess first-order autocorrelation, noting that the test assumes observations are ordered sequentially. Apparent violations may therefore arise if the dataset has been sorted after randomisation rather than reflecting true serial correlation. In this dataset, observations appear in blocks of approximately 22 control cases followed by intervention cases. This pattern may reflect post-randomisation sorting of the file, but could also indicate issues with the randomisation process itself. Consequently, results from the Durbin–Watson test were interpreted with caution.
  • The study design seems to be independent.
Assumption conclusions

Visual inspection of Q–Q plots and results from the Kolmogorov–Smirnov test suggested that residuals were approximately normally distributed, although the Shapiro–Wilk test indicated some departure from normality. The study design appeared independent; however, the model showed a statistically significant Durbin–Watson test. Further inspection suggested that this was likely due to block ordering of control and intervention cases resulting from post-randomisation sorting of the dataset rather than true autocorrelation.

No substantial deviations from homoscedasticity were observed. Outlier diagnostics indicated that point estimates were unlikely to be materially influenced by individual observations; however, confidence-interval widths may be affected and warrant further investigation. Minor departures from linearity were detected by formal significance tests, but no clear curvature was evident in the residual plots. Sensitivity analyses comparing linear and non-linear model specifications are therefore recommended.

Forest plot showing original and reproduced coefficients and 95% confidence intervals for lg.ctx

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

−1.1563

sex:

Male – Female

−0.006

−0.0060

0.0000

Reproduced

age.years

0.001

0.0014

0.0004

Reproduced

BMI.base.kg.m2

−0.008

−0.0077

0.0003

Reproduced

smoking.status:

Smoker – Non-smoker

−0.044

−0.0445

−0.0005

Reproduced

Creatinine.umol.L.base

0.003

0.0025

−0.0005

Reproduced

PTH.pmol.L.base

0.010

0.0097

−0.0003

Reproduced

Calcium.mmol.L.base

0.264

0.2637

−0.0003

Reproduced

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in lower 95% confidence intervals for coefficients

term

O_lower

R_lower

Change.lci

Reproduce.lower

Intercept

−1.6737

sex:

Male – Female

−0.045

−0.0451

−0.0001

Reproduced

age.years

0.000

−0.0004

−0.0004

Reproduced

BMI.base.kg.m2

−0.011

−0.0110

0.0000

Reproduced

smoking.status:

Smoker – Non-smoker

−0.085

−0.0850

0.0000

Reproduced

Creatinine.umol.L.base

0.001

0.0009

−0.0001

Reproduced

PTH.pmol.L.base

0.002

0.0016

−0.0004

Reproduced

Calcium.mmol.L.base

0.041

0.0412

0.0002

Reproduced

O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced.

Change in upper 95% confidence intervals for coefficients

term

O_upper

R_upper

Change.uci

Reproduce.upper

Intercept

−0.6389

sex:

Male – Female

0.033

0.0332

0.0002

Reproduced

age.years

0.003

0.0033

0.0003

Reproduced

BMI.base.kg.m2

−0.004

−0.0044

−0.0004

Reproduced

smoking.status:

Smoker – Non-smoker

−0.004

−0.0039

0.0001

Reproduced

Creatinine.umol.L.base

0.004

0.0042

0.0002

Reproduced

PTH.pmol.L.base

0.018

0.0179

−0.0001

Reproduced

Calcium.mmol.L.base

0.486

0.4863

0.0003

Reproduced

O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced.

Change in adjusted R2

O_R2Adj

R_R2Adj

Change.R2Adj

Reproduce.R2Adj

0.134

0.1182

−0.0158

Not Reproduced

O_R2Adj = original R2 Adjusted; R_R2Adj = reproduced R2 Adjusted; Change.R2Adj = change in R2Adj (R2Adj - O_R2Adj); Reproduce.R2Adj = R2 Adjusted reproduced.

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

sex:

Male – Female

0.764

0.7644

0.0004

Reproduced

Remains non-sig, B same direction

age.years

0.126

0.1257

−0.0003

Reproduced

Remains non-sig, B same direction

BMI.base.kg.m2

<0.001

<0.001

0.0000

Reproduced

Remains sig, B same direction

smoking.status:

Smoker – Non-smoker

0.032

0.0318

−0.0002

Reproduced

Remains sig, B same direction

Creatinine.umol.L.base

0.002

0.0024

0.0004

Reproduced

Remains sig, B same direction

PTH.pmol.L.base

0.020

0.0196

−0.0004

Reproduced

Remains sig, B same direction

Calcium.mmol.L.base

0.020

0.0203

0.0003

Reproduced

Remains sig, B same direction

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman plot showing differences between original and reproduced p-values for lg.ctx

Results for p-values

The mean difference for p-values in this model was close to zero as they were all reproduced.

Conclusion computational reproducibility

This model was mostly computationally reproduced. Regression coefficients and confidence intervals were reproduced. However, the authors reported adjusted R² but when compared this corresponded to the unadjusted R². P-values were reproduced and had the same interpretation, and regression coefficients did not change direction.

Methods

The model was successfully reproduced; however, there were indications that the residuals may not be normally distributed. Diagnostic checks suggested a minor deviation from linearity for one variable. A quadratic (squared) term was fitted, and AIC and BIC were used to compare model fit with the linear specification. If the linear model was retained, to further verify the findings, non-parametric BCa bootstrapping (10,000 resamples) was used to obtain standardised coefficients and 95% confidence intervals (CIs), and CI widths were examined. Percentage and absolute changes in estimates and confidence-interval bounds relative to the linear model were summarised using thresholds of 10% change and standardised coefficient differences of <0.10 and <0.20. Coefficient directions were assessed for concordance, and the stability of statistical significance was evaluated.

Results

Comparison of linear and squared model
Characteristic
Linear
Squared Calcium
Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 0.0706 -0.0822, 0.2233 0.364 0.1396 -0.0273, 0.3066 0.101
sex





    Female

    Male -0.0348 -0.2630, 0.1934 0.764 -0.0510 -0.2790, 0.1769 0.660
z_age.years 0.0728 -0.0205, 0.1661 0.126 0.0688 -0.0242, 0.1618 0.147
z_BMI.base.kg.m2 -0.2203 -0.3154, -0.1252 <0.001 -0.2206 -0.3153, -0.1258 <0.001
smoking.status





    Non-smoker

    Smoker -0.2591 -0.4955, -0.0227 0.032 -0.2443 -0.4803, -0.0083 0.042
z_Creatinine.umol.L.base 0.1825 0.0652, 0.2997 0.002 0.1768 0.0598, 0.2938 0.003
z_PTH.pmol.L.base 0.1151 0.0186, 0.2117 0.020 0.1271 0.0302, 0.2240 0.010
z_Calcium.mmol.L.base 0.1121 0.0175, 0.2066 0.020 0.1230 0.0282, 0.2179 0.011
z_Calcium.mmol.L.base2


-0.0642 -0.1281, -0.0004 0.049
1 CI = Confidence Interval
Comparison of linear and squared model fit

Model_type

R2adj

sigma

df

logLik

AIC

BIC

deviance

df.residual

nobs

Linear

0.118

0.93747

7

−543.13

1,104.3

1,140.3

348.03

396

404

Squared Calcium

0.125

0.93405

8

−541.14

1,102.3

1,142.3

344.62

395

404

Linearity conclusion

After re-running the models and re-evaluating assumptions, statistically significant deviations from linearity were identified. However, these did not improve model fit, as indicated by similar AIC and Adjusted R2 values. Visual inspection suggested that the apparent non-linearity arose from sparse data at the higher end of the predictor ranges rather than a meaningful improvement in model performance.

Bootstrapped results

As the inclusion of non-linear terms did not improve model fit, the original model was retained. Further model assessment was conducted using Non-parametric BCa bootstrapping with 10,000 resamples.

Change in regression coefficients

Term

B

boot.B

B_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.0706

0.0718

−0.0013

−1.8200

No

No

No

sexMale

−0.0348

−0.0360

0.0012

3.4600

No

No

No

z_age.years

0.0728

0.0729

−0.0001

−0.1400

No

No

No

z_BMI.base.kg.m2

−0.2203

−0.2216

0.0013

0.6000

No

No

No

smoking.statusSmoker

−0.2591

−0.2611

0.0020

0.7800

No

No

No

z_Creatinine.umol.L.base

0.1825

0.1820

0.0004

0.2400

No

No

No

z_PTH.pmol.L.base

0.1151

0.1152

−0.0001

−0.0700

No

No

No

z_Calcium.mmol.L.base

0.1121

0.1133

−0.0012

−1.0900

No

No

No

B = standardized regression coefficient reproduced B; boot.B = boostrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in lower 95% confidence interval

Term

Lower

boot.Lower

Lower_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.0822

−0.0994

0.0173

21.0300

Yes

No

No

sexMale

−0.2630

−0.2823

0.0193

7.3400

No

No

No

z_age.years

−0.0205

−0.0172

−0.0033

−15.8900

Yes

No

No

z_BMI.base.kg.m2

−0.3154

−0.3219

0.0065

2.0700

No

No

No

smoking.statusSmoker

−0.4955

−0.5006

0.0052

1.0400

No

No

No

z_Creatinine.umol.L.base

0.0652

0.0720

−0.0069

−10.5200

Yes

No

No

z_PTH.pmol.L.base

0.0186

0.0161

0.0025

13.3300

Yes

No

No

z_Calcium.mmol.L.base

0.0175

0.0194

−0.0019

−11.0300

Yes

No

No

Lower = standardized reproduced lower CI; boot.Lower = boostrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in upper 95% confidence interval

Term

Upper

boot.Upper

Upper_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.2233

0.2532

−0.0299

−13.3900

Yes

No

No

sexMale

0.1934

0.1961

−0.0027

−1.4000

No

No

No

z_age.years

0.1661

0.1636

0.0025

1.4900

No

No

No

z_BMI.base.kg.m2

−0.1252

−0.1250

−0.0002

−0.1200

No

No

No

smoking.statusSmoker

−0.0227

−0.0193

−0.0034

−14.8700

Yes

No

No

z_Creatinine.umol.L.base

0.2997

0.2948

0.0050

1.6600

No

No

No

z_PTH.pmol.L.base

0.2117

0.2180

−0.0064

−3.0200

No

No

No

z_Calcium.mmol.L.base

0.2066

0.2114

−0.0048

−2.3200

No

No

No

Upper = standardized reproduced upper CI; boot.Upper = boostrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in Range of 95% confidence interval

Term

Range

boot.Range

Range_Diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.3054

0.3526

0.0472

15.4500

Yes

No

No

sexMale

0.4564

0.4785

0.0220

4.8300

No

No

No

z_age.years

0.1865

0.1808

−0.0057

−3.0700

No

No

No

z_BMI.base.kg.m2

0.1902

0.1969

0.0067

3.5200

No

No

No

smoking.statusSmoker

0.4728

0.4813

0.0085

1.8000

No

No

No

z_Creatinine.umol.L.base

0.2346

0.2227

−0.0118

−5.0400

No

No

No

z_PTH.pmol.L.base

0.1931

0.2019

0.0089

4.5900

No

No

No

z_Calcium.mmol.L.base

0.1891

0.1920

0.0029

1.5100

No

No

No

Range = standardized reproduced CI range; boot.B = boostrapped standardized reproduced CI range; Range_diff = change in CI Range ; %_change = percentage difference, Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in p-value significance and regression coefficient direction

Term

p-value

boot.p-value

changep

SigChangeDirection

Intercept

0.3643

0.4254

−0.0611

Remains non-sig, B same direction

sexMale

0.7644

0.7690

−0.0045

Remains non-sig, B same direction

z_age.years

0.1257

0.1128

0.0129

Remains non-sig, B same direction

z_BMI.base.kg.m2

<0.001

<0.001

−0.0000

Remains sig, B same direction

smoking.statusSmoker

0.0318

0.0349

−0.0031

Remains sig, B same direction

z_Creatinine.umol.L.base

0.0024

0.0013

0.0011

Remains sig, B same direction

z_PTH.pmol.L.base

0.0196

0.0257

−0.0061

Remains sig, B same direction

z_Calcium.mmol.L.base

0.0203

0.0184

0.0019

Remains sig, B same direction

p-value = standardized reproduced p-value; boot.p-value = boostrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model.

Check the distribution of bootstrap estimates

The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.

Conclusions based on the bootstrapped model

This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.

Model 2

Model results for TNF.pg.ml.base

Term

B

SE

Lower

Upper

t

p-value

Intercept

sex:

Male – Female

0.035

0.144

0.465

<0.001

age.years

0.001

−0.008

0.010

0.837

BMI.base.kg.m2

0.014

−0.006

0.034

0.176

smoking.status:

Smoker – Non-smoker

0.085

−0.102

0.272

0.370

HOMA.IR.base

0.043

0.005

0.081

0.027

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for TNF.pg.ml.base

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.095

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for TNF.pg.ml.base

Term

SS

DF

MS

F

p-value

sex

age.years

BMI.base.kg.m2

smoking.status

HOMA.IR.base

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results TNF.pg.ml.base

Term

B

SE

Lower

Upper

t

p-value

Intercept

1.640

0.371

0.911

2.369

4.422

<0.001

sex:

Male – Female

0.305

0.082

0.144

0.465

3.736

<0.001

age.years

0.001

0.005

−0.008

0.010

0.206

0.8366

BMI.base.kg.m2

0.014

0.010

−0.006

0.034

1.354

0.1764

smoking.status:

Smoker – Non-smoker

0.085

0.095

−0.102

0.272

0.897

0.3704

HOMA.IR.base

0.043

0.019

0.005

0.081

2.225

0.0266

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Model fit for TNF.pg.ml.base

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.308

0.095

0.084

954.798

0.773

8.388

5

399

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for TNF.pg.ml.base

Term

SS

DF

MS

F

p-value

sex

8.465

1

8.465

13.956

<0.001

age.years

0.026

1

0.026

0.043

0.8366

BMI.base.kg.m2

1.112

1

1.112

1.834

0.1764

smoking.status

0.488

1

0.488

0.804

0.3704

HOMA.IR.base

3.003

1

3.003

4.952

0.0266

Residuals

242.008

399

0.607

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

sex

age.years

−1.897

0.0586

No linearity violation

BMI.base.kg.m2

−1.459

0.1454

No linearity violation

smoking.status

HOMA.IR.base

−1.512

0.1314

No linearity violation

Tukey test

−1.333

0.1826

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

No linearity violation was observed in either plots or tests.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

9.455

0.0922

5

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • Some heteroscedasticity is present in plots, and a sensitivity analysis using weighted or robust regression or wild bootstrapping is recommended.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

TNF.pg.ml.base

405

2.400

0.814

2.360

0.030

5.320

0.320

0.484

sex*

405

1.523

0.500

2.000

1.000

2.000

−0.094

−1.996

age.years

405

51.840

8.667

50.000

40.000

79.000

0.734

−0.116

BMI.base.kg.m2

405

27.774

4.912

27.053

18.456

48.847

0.910

1.403

smoking.status*

405

1.212

0.409

1.000

1.000

2.000

1.402

−0.036

HOMA.IR.base

405

3.476

2.701

2.738

0.444

21.207

2.469

8.915

.fitted

405

2.400

0.251

2.401

1.997

3.446

0.743

1.156

.resid

405

−0.000

0.774

−0.037

−2.410

2.528

0.223

0.435

.leverage

405

0.015

0.011

0.012

0.006

0.128

4.613

33.271

.sigma

405

0.779

0.002

0.779

0.769

0.780

−3.066

11.679

.cooksd

405

0.003

0.006

0.001

0.000

0.080

7.017

78.842

.std.resid

405

−0.000

1.002

−0.048

−3.116

3.271

0.224

0.431

dfb.1_

405

0.000

0.047

0.001

−0.228

0.178

−0.348

3.974

dfb.sxMl

405

−0.000

0.050

0.001

−0.193

0.197

−0.142

1.581

dfb.ag.y

405

−0.000

0.054

0.001

−0.276

0.291

−0.243

6.193

dfb.BMI.

405

0.000

0.048

−0.001

−0.216

0.284

0.519

6.029

dfb.sm.S

405

0.000

0.051

0.000

−0.197

0.278

0.867

6.692

dfb.HOMA

405

−0.000

0.055

−0.000

−0.259

0.552

2.120

29.521

dffit

405

−0.003

0.128

−0.005

−0.431

0.696

0.433

2.588

cov.r

405

1.015

0.026

1.021

0.876

1.155

−1.774

9.711

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.

DFBETA threshold

DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance) of the estimated regression coefficients if the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 indicate inflated variances and reduced precision (wider confidence intervals), whereas values above 1 indicate deflated variances and increased precision (narrower confidence intervals). A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.sxMl

dfb.ag.y

dfb.BMI.

dfb.sm.S

dfb.HOMA

dffit

cov.r

363

−0.709

0.128

0.012

−0.021

0.048

−0.021

0.090

0.006

−0.248

−0.271

1.155

318

3.154

0.009

0.015

0.121

0.169

−0.027

−0.124

−0.099

−0.039

0.305

0.884

217

3.312

0.016

0.028

0.150

0.071

−0.207

−0.068

−0.061

0.255

0.419

0.876

213

−1.491

0.077

0.031

0.025

0.026

0.108

−0.076

0.004

−0.259

−0.431

1.064

236

2.304

0.084

0.080

0.127

−0.081

−0.191

−0.129

0.278

0.552

0.696

1.023

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points

Two observations had studentized residuals above 3. Both points had low leverage and small Cook’s distance, with DFBETAS and DFFITS within conventional ranges. However, their COVRATIO values were substantially < 1, suggesting little effect on point estimates but potential inflation of standard errors (wider confidence intervals).

Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.044

0.4085

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.993

0.0405

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • QQ-plot looks roughly normal.
Assessing collinearity with VIF

Term

VIF

Tolerance

sex

1.109

0.901

age.years

1.027

0.974

BMI.base.kg.m2

1.680

0.595

smoking.status

1.010

0.990

HOMA.IR.base

1.805

0.554

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

0.007

1.983

0.7860

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

The model was found to meet the assumptions of linearity, normality, and independence with no substantial outliers identified. While the model was not statistically heteroscedastic, it visually displayed signs of mild heteroscedasticity, and could be examined further.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for TNF.pg.ml.base

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

1.6402

sex:

Male – Female

0.035

0.3049

0.2699

Not Reproduced

age.years

0.001

0.0009

−0.0001

Reproduced

BMI.base.kg.m2

0.014

0.0138

−0.0002

Reproduced

smoking.status:

Smoker – Non-smoker

0.085

0.0853

0.0003

Reproduced

HOMA.IR.base

0.043

0.0429

−0.0001

Reproduced

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in lower 95% confidence intervals for coefficients

term

O_lower

R_lower

Change.lci

Reproduce.lower

Intercept

0.9111

sex:

Male – Female

0.144

0.1444

0.0004

Reproduced

age.years

−0.008

−0.0080

0.0000

Reproduced

BMI.base.kg.m2

−0.006

−0.0063

−0.0003

Reproduced

smoking.status:

Smoker – Non-smoker

−0.102

−0.1017

0.0003

Reproduced

HOMA.IR.base

0.005

0.0050

0.0000

Reproduced

O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced.

Change in upper 95% confidence intervals for coefficients

term

O_upper

R_upper

Change.uci

Reproduce.upper

Intercept

2.3693

sex:

Male – Female

0.465

0.4653

0.0003

Reproduced

age.years

0.010

0.0098

−0.0002

Reproduced

BMI.base.kg.m2

0.034

0.0339

−0.0001

Reproduced

smoking.status:

Smoker – Non-smoker

0.272

0.2722

0.0002

Reproduced

HOMA.IR.base

0.081

0.0808

−0.0002

Reproduced

O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced.

Change in adjusted R2

O_R2Adj

R_R2Adj

Change.R2Adj

Reproduce.R2Adj

0.095

0.0838

−0.0112

Not Reproduced

O_R2Adj = original R2 Adjusted; R_R2Adj = reproduced R2 Adjusted; Change.R2Adj = change in R2Adj (R2Adj - O_R2Adj); Reproduce.R2Adj = R2 Adjusted reproduced.

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

sex:

Male – Female

<0.001

<0.001

0.0000

Reproduced

Remains sig, B same direction

age.years

0.837

0.8366

−0.0004

Reproduced

Remains non-sig, B same direction

BMI.base.kg.m2

0.176

0.1764

0.0004

Reproduced

Remains non-sig, B same direction

smoking.status:

Smoker – Non-smoker

0.370

0.3704

0.0004

Reproduced

Remains non-sig, B same direction

HOMA.IR.base

0.027

0.0266

−0.0004

Reproduced

Remains sig, B same direction

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman plot showing differences between original and reproduced p-values for TNF.pg.ml.base

Results for p-values

The mean difference for p-values in this model was close to zero as they were all reproduced.

Conclusion computational reproducibility

This model was mostly computationally reproduced with a typographic error in the sex coefficient. There was also a reporting error for adjusted R² which corresponded to the unadjusted R². P-values were reproduced and had the same interpretation, and regression coefficients did not change direction.

Methods

The model was successfully reproduced; however, it may exhibit slight heteroscedasticity. To further verify the findings, bootstrapped standardized regression coefficients and their 95% confidence intervals were examined. Percentage and absolute changes in estimates and confidence-interval bounds relative to the linear model were summarised using thresholds of 10% change and standardised coefficient differences of <0.10 and <0.20. Coefficient directions were assessed for concordance, and the stability of statistical significance was evaluated. Wild bootstrapping was used to account for potential heteroscedasticity.

Results

Bootstrapped results

Wild bootstrapping was performed with 10,000 iterations.

Change in regression coefficients

Term

B

boot.B

B_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.2198

−0.2220

0.0022

0.9900

No

No

No

sexMale

0.3751

0.3780

−0.0029

−0.7600

No

No

No

z_age.years

0.0100

0.0108

−0.0008

−8.0200

No

No

No

z_BMI.base.kg.m2

0.0837

0.0841

−0.0004

−0.4700

No

No

No

smoking.statusSmoker

0.1049

0.1051

−0.0002

−0.2300

No

No

No

z_HOMA.IR.base

0.1424

0.1407

0.0017

1.1800

No

No

No

B = standardized regression coefficient reproduced B; boot.B = boostrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in lower 95% confidence interval

Term

Lower

boot.Lower

Lower_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.3661

−0.3637

−0.0025

−0.6800

No

No

No

sexMale

0.1777

0.1815

−0.0038

−2.1400

No

No

No

z_age.years

−0.0849

−0.0894

0.0045

5.3200

No

No

No

z_BMI.base.kg.m2

−0.0378

−0.0322

−0.0055

−14.6800

Yes

No

No

smoking.statusSmoker

−0.1251

−0.1209

−0.0042

−3.3400

No

No

No

z_HOMA.IR.base

0.0166

0.0119

0.0047

28.4800

Yes

No

No

Lower = standardized reproduced lower CI; boot.Lower = boostrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in upper 95% confidence interval

Term

Upper

boot.Upper

Upper_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.0735

−0.0790

0.0055

7.4600

No

No

No

sexMale

0.5725

0.5715

0.0011

0.1800

No

No

No

z_age.years

0.1048

0.1084

−0.0036

−3.4600

No

No

No

z_BMI.base.kg.m2

0.2052

0.1986

0.0065

3.1800

No

No

No

smoking.statusSmoker

0.3349

0.3313

0.0035

1.0600

No

No

No

z_HOMA.IR.base

0.2683

0.2699

−0.0017

−0.6200

No

No

No

Upper = standardized reproduced upper CI; boot.Upper = boostrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in Range of 95% confidence interval

Term

Range

boot.Range

Range_Diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.2926

0.2847

−0.0080

−2.7200

No

No

No

sexMale

0.3948

0.3899

−0.0049

−1.2300

No

No

No

z_age.years

0.1897

0.1979

0.0081

4.2900

No

No

No

z_BMI.base.kg.m2

0.2430

0.2309

−0.0121

−4.9700

No

No

No

smoking.statusSmoker

0.4600

0.4523

−0.0077

−1.6800

No

No

No

z_HOMA.IR.base

0.2517

0.2580

0.0064

2.5300

No

No

No

Range = standardized reproduced CI range; boot.B = boostrapped standardized reproduced CI range; Range_diff = change in CI Range ; %_change = percentage difference, percentage changes were truncated at ±1000%, Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in p-value significance and regression coefficient direction

Term

p-value

boot.p-value

changep

SigChangeDirection

Intercept

0.0033

0.0022

0.0011

Remains sig, B same direction

sexMale

<0.001

<0.001

0.0001

Remains sig, B same direction

z_age.years

0.8366

0.8307

0.0059

Remains non-sig, B same direction

z_BMI.base.kg.m2

0.1764

0.1523

0.0241

Remains non-sig, B same direction

smoking.statusSmoker

0.3704

0.3628

0.0076

Remains non-sig, B same direction

z_HOMA.IR.base

0.0266

0.0348

−0.0082

Remains sig, B same direction

p-value = standardized reproduced p-value; boot.p-value = boostrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model.

Check the distribution of bootstrap estimates

The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.

Conclusions based on bootstrapped model

This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.

Model 3

Model results for lg.opg

Term

B

SE

Lower

Upper

t

p-value

Intercept

sex:

Male – Female

−0.012

−0.035

0.001

0.286

age.years

0.006

0.005

0.007

<0.001

BMI.base.kg.m2

−0.002

−0.005

0.001

0.237

smoking.status:

Smoker – Non-smoker

0.032

0.005

0.058

0.018

HOMA.IR.base

0.007

0.002

0.013

0.006

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit Statistics lg.opg

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.216

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA Table for lg.opg

Term

SS

DF

MS

F

p-value

sex

age.years

BMI.base.kg.m2

smoking.status

HOMA.IR.base

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for lg.opg

Term

B

SE

Lower

Upper

t

p-value

Intercept

2.201

0.052

2.099

2.304

42.204

<0.001

sex:

Male – Female

−0.012

0.011

−0.035

0.010

−1.069

0.2858

age.years

0.006

0.001

0.005

0.007

9.330

<0.001

BMI.base.kg.m2

−0.002

0.001

−0.005

0.001

−1.183

0.2375

smoking.status:

Smoker – Non-smoker

0.032

0.013

0.005

0.058

2.376

0.0180

HOMA.IR.base

0.007

0.003

0.002

0.013

2.740

0.0064

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for lg.opg

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.464

0.216

0.206

−634.130

0.109

21.923

5

399

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA Table for lg.opg

Term

SS

DF

MS

F

p-value

sex

0.014

1

0.014

1.142

0.2858

age.years

1.044

1

1.044

87.040

<0.001

BMI.base.kg.m2

0.017

1

0.017

1.400

0.2375

smoking.status

0.068

1

0.068

5.647

0.0180

HOMA.IR.base

0.090

1

0.090

7.506

0.0064

Residuals

4.786

399

0.012

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

sex

age.years

0.833

0.4055

No linearity violation

BMI.base.kg.m2

1.464

0.1439

No linearity violation

smoking.status

HOMA.IR.base

2.020

0.0441

Linearity violation

Tukey test

0.590

0.5549

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

Although the quadratic term for baseline Homa was statistically significant, there was no clear evidence of curvature in the residual plots, with spares data at the higher end of the scale. No systematic patterns were observed in the fitted-versus-residual plots.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

6.634

0.2493

5

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • Some heteroscedasticity is present in plots, and a sensitivity analysis using weighted or robust regression or wild bootstrapping is recommended.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

lg.opg

405

2.488

0.123

2.485

2.134

2.925

0.123

0.468

sex*

405

1.523

0.500

2.000

1.000

2.000

−0.094

−1.996

age.years

405

51.840

8.667

50.000

40.000

79.000

0.734

−0.116

BMI.base.kg.m2

405

27.774

4.912

27.053

18.456

48.847

0.910

1.403

smoking.status*

405

1.212

0.409

1.000

1.000

2.000

1.402

−0.036

HOMA.IR.base

405

3.476

2.701

2.738

0.444

21.207

2.469

8.915

.fitted

405

2.488

0.057

2.477

2.389

2.668

0.727

0.034

.resid

405

−0.000

0.109

0.001

−0.312

0.409

0.213

0.632

.leverage

405

0.015

0.011

0.012

0.006

0.128

4.613

33.271

.sigma

405

0.110

0.000

0.110

0.108

0.110

−3.528

16.688

.cooksd

405

0.003

0.005

0.001

0.000

0.051

4.699

28.229

.std.resid

405

0.001

1.001

0.012

−2.876

3.754

0.218

0.634

dfb.1_

405

0.000

0.054

−0.000

−0.342

0.226

−0.573

7.539

dfb.sxMl

405

−0.000

0.049

−0.001

−0.174

0.178

−0.114

1.126

dfb.ag.y

405

−0.000

0.056

0.001

−0.202

0.398

1.112

10.890

dfb.BMI.

405

−0.000

0.047

−0.000

−0.220

0.237

0.472

5.393

dfb.sm.S

405

0.000

0.054

−0.001

−0.263

0.319

0.957

8.677

dfb.HOMA

405

0.000

0.050

−0.000

−0.215

0.440

2.149

22.198

dffit

405

0.003

0.124

0.001

−0.384

0.555

0.718

2.177

cov.r

405

1.015

0.027

1.021

0.826

1.160

−1.786

11.191

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.

DFBETA threshold

DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance) of the estimated regression coefficients if the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 indicate inflated variances and reduced precision (wider confidence intervals), whereas values above 1 indicate deflated variances and increased precision (narrower confidence intervals). A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.sxMl

dfb.ag.y

dfb.BMI.

dfb.sm.S

dfb.HOMA

dffit

cov.r

363

0.473

0.128

0.005

0.014

−0.032

0.014

−0.060

−0.004

0.165

0.181

1.160

19

3.817

0.009

0.022

0.163

−0.146

−0.176

0.013

−0.094

−0.107

0.369

0.826

13

3.215

0.019

0.032

0.196

−0.166

−0.202

−0.070

0.319

0.023

0.444

0.887

355

3.060

0.023

0.036

−0.342

0.175

0.398

0.155

−0.103

−0.153

0.473

0.904

236

1.838

0.084

0.051

0.101

−0.064

−0.152

−0.103

0.222

0.440

0.555

1.053

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points

Three observations had studentized residuals > 3. these observations had low leverage and small Cook’s distance, with DFBETAS and DFFITS within conventional ranges. However, their COVRATIO values were substantially < 1, suggesting little effect on point estimates but potential inflation of standard errors (wider confidence intervals).

Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.027

0.9333

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.992

0.0279

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • QQ-plot looks roughly normal.
Assessing collinearity with VIF

Term

VIF

Tolerance

sex

1.109

0.901

age.years

1.027

0.974

BMI.base.kg.m2

1.680

0.595

smoking.status

1.010

0.990

HOMA.IR.base

1.805

0.554

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

−0.022

2.037

0.7780

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

The model met the assumptions of normality and independence. However, residual plots and diagnostic tests indicated potential issues with linearity, suggesting that a non-linear relationship may not have been adequately captured, and residuals may be mildly heteroscedastic.

Forest plot showing original and reproduced coefficients and 95% confidence intervals for lg.opg

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

2.2012

sex:

Male – Female

−0.012

−0.0123

−0.0003

Reproduced

age.years

0.006

0.0059

−0.0001

Reproduced

BMI.base.kg.m2

−0.002

−0.0017

0.0003

Reproduced

smoking.status:

Smoker – Non-smoker

0.032

0.0318

−0.0002

Reproduced

HOMA.IR.base

0.007

0.0074

0.0004

Reproduced

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in lower 95% confidence intervals for coefficients

term

O_lower

R_lower

Change.lci

Reproduce.lower

Intercept

2.0987

sex:

Male – Female

−0.035

−0.0348

0.0002

Reproduced

age.years

0.005

0.0047

−0.0003

Reproduced

BMI.base.kg.m2

−0.005

−0.0045

0.0005

Reproduced

smoking.status:

Smoker – Non-smoker

0.005

0.0055

0.0005

Reproduced

HOMA.IR.base

0.002

0.0021

0.0001

Reproduced

O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced.

Change in upper 95% confidence intervals for coefficients

term

O_upper

R_upper

Change.uci

Reproduce.upper

Intercept

2.3038

sex:

Male – Female

0.001

0.0103

0.0093

Not Reproduced

age.years

0.007

0.0072

0.0002

Reproduced

BMI.base.kg.m2

0.001

0.0011

0.0001

Reproduced

smoking.status:

Smoker – Non-smoker

0.058

0.0581

0.0001

Reproduced

HOMA.IR.base

0.013

0.0128

−0.0002

Reproduced

O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced.

Change in adjusted R2

O_R2Adj

R_R2Adj

Change.R2Adj

Reproduce.R2Adj

0.216

0.2057

−0.0103

Not Reproduced

O_R2Adj = original R2 Adjusted; R_R2Adj = reproduced R2 Adjusted; Change.R2Adj = change in R2Adj (R2Adj - O_R2Adj); Reproduce.R2Adj = R2 Adjusted reproduced.

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

sex:

Male – Female

0.286

0.2858

−0.0002

Reproduced

Remains non-sig, B same direction

age.years

<0.001

<0.001

0.0000

Reproduced

Remains sig, B same direction

BMI.base.kg.m2

0.237

0.2375

0.0005

Reproduced

Remains non-sig, B same direction

smoking.status:

Smoker – Non-smoker

0.018

0.0180

0.0000

Reproduced

Remains sig, B same direction

HOMA.IR.base

0.006

0.0064

0.0004

Reproduced

Remains sig, B same direction

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman Plot showing differences between original and reproduced p-values for lg.opg

Results for p-values

The mean difference for p-values in this model was close to zero as they were all reproduced.

Conclusion computational reproducibility

This model was mostly computationally reproducible, with a typographic error in the sex upper 95% confidence interval. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction. There was also a reporting error for adjusted R² which corresponded to the unadjusted R².

Methods

The model was successfully reproduced; however, there were indications that the residuals may not be normally distributed and may exhibit slight heteroscedasticity. Diagnostic checks suggested a minor deviation from linearity for one variable. A quadratic (squared) term was fitted, and AIC and BIC were used to compare model fit with the linear specification. If the linear model was retained, to further verify the findings, bootstrapped standardized regression coefficients and their 95% confidence intervals were examined. Percentage and absolute changes in estimates and confidence-interval bounds relative to the linear model were summarised using thresholds of 10% change and standardised coefficient differences of <0.10 and <0.20. Coefficient directions were assessed for concordance, and the stability of statistical significance was evaluated. Wild bootstrapping was used to account for potential heteroscedasticity.

Results

Comparison of linear and squared model
Characteristic
Linear
Squared HOMA
Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) -0.0034 -0.1397, 0.1328 0.960 -0.0551 -0.1998, 0.0896 0.455
sex





    Female

    Male -0.0999 -0.2837, 0.0839 0.286 -0.0770 -0.2614, 0.1075 0.412
z_age.years 0.4191 0.3308, 0.5075 <0.001 0.4262 0.3379, 0.5144 <0.001
z_BMI.base.kg.m2 -0.0681 -0.1812, 0.0450 0.237 -0.0421 -0.1576, 0.0733 0.474
smoking.status





    Non-smoker

    Smoker 0.2589 0.0447, 0.4730 0.018 0.2496 0.0361, 0.4631 0.022
z_HOMA.IR.base 0.1633 0.0461, 0.2804 0.006 0.0397 -0.1279, 0.2073 0.642
z_HOMA.IR.base2


0.0419 0.0011, 0.0827 0.044
1 CI = Confidence Interval
Comparison of linear and squared model fit

Model_type

R2adj

sigma

df

logLik

AIC

BIC

deviance

df.residual

nobs

Linear

0.206

0.89221

5

−525.45

1,064.9

1,092.9

317.62

399

405

Squared HOMA

0.212

0.88878

6

−523.39

1,062.8

1,094.8

314.39

398

405

Linearity conclusion

After re-running the models and re-evaluating assumptions, statistically significant deviations from linearity were identified. However, these did not improve model fit, as indicated by similar AIC and Adjusted R2 values. Visual inspection suggested that the apparent non-linearity arose from sparse data at the higher end of the predictor ranges rather than a meaningful improvement in model performance.

Bootstrapped results

As the inclusion of non-linear terms did not improve model fit, the original model was retained. Further model assessment was conducted using Wild bootstrapping with 10,000 iterations.

Change in regression coefficients

Term

B

boot.B

B_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.0034

−0.0036

0.0001

3.9700

No

No

No

sexMale

−0.0999

−0.1004

0.0005

0.4800

No

No

No

z_age.years

0.4191

0.4188

0.0003

0.0700

No

No

No

z_BMI.base.kg.m2

−0.0681

−0.0683

0.0002

0.3300

No

No

No

smoking.statusSmoker

0.2589

0.2603

−0.0014

−0.5600

No

No

No

z_HOMA.IR.base

0.1633

0.1634

−0.0001

−0.0900

No

No

No

B = standardized regression coefficient reproduced B; boot.B = boostrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in lower 95% confidence interval

Term

Lower

boot.Lower

Lower_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.1397

−0.1300

−0.0097

−6.9500

No

No

No

sexMale

−0.2837

−0.2742

−0.0096

−3.3700

No

No

No

z_age.years

0.3308

0.3239

0.0069

2.0800

No

No

No

z_BMI.base.kg.m2

−0.1812

−0.1729

−0.0082

−4.5500

No

No

No

smoking.statusSmoker

0.0447

0.0353

0.0094

21.0300

Yes

No

No

z_HOMA.IR.base

0.0461

0.0520

−0.0059

−12.7200

Yes

No

No

Lower = standardized reproduced lower CI; boot.Lower = boostrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in upper 95% confidence interval

Term

Upper

boot.Upper

Upper_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.1328

0.1184

0.0144

10.8300

Yes

No

No

sexMale

0.0839

0.0731

0.0107

12.8100

Yes

No

No

z_age.years

0.5075

0.5166

−0.0091

−1.8000

No

No

No

z_BMI.base.kg.m2

0.0450

0.0372

0.0078

17.3200

Yes

No

No

smoking.statusSmoker

0.4730

0.4841

−0.0111

−2.3500

No

No

No

z_HOMA.IR.base

0.2804

0.2752

0.0053

1.8800

No

No

No

Upper = standardized reproduced upper CI; boot.Upper = boostrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in Range of 95% confidence interval

Term

Range

boot.Range

Range_Diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.2725

0.2484

−0.0241

−8.8400

No

No

No

sexMale

0.3676

0.3473

−0.0203

−5.5300

No

No

No

z_age.years

0.1766

0.1927

0.0160

9.0700

No

No

No

z_BMI.base.kg.m2

0.2262

0.2102

−0.0161

−7.1000

No

No

No

smoking.statusSmoker

0.4283

0.4488

0.0205

4.7900

No

No

No

z_HOMA.IR.base

0.2343

0.2232

−0.0111

−4.7500

No

No

No

Range = standardized reproduced CI range; boot.B = boostrapped standardized reproduced CI range; Range_diff = change in CI Range ; %_change = percentage difference, percentage changes were truncated at ±1000%, Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in p-value significance and regression coefficient direction

Term

p-value

boot.p-value

changep

SigChangeDirection

Intercept

0.9604

0.9553

0.0051

Remains non-sig, B same direction

sexMale

0.2858

0.2581

0.0278

Remains non-sig, B same direction

z_age.years

<0.001

<0.001

0.0000

Remains sig, B same direction

z_BMI.base.kg.m2

0.2375

0.2047

0.0328

Remains non-sig, B same direction

smoking.statusSmoker

0.0180

0.0238

−0.0058

Remains sig, B same direction

z_HOMA.IR.base

0.0064

0.0040

0.0024

Remains sig, B same direction

p-value = standardized reproduced p-value; boot.p-value = boostrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model.

Check the distribution of bootstrapped estimates

The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.

Conclusions based on the bootstrapped model

This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.