Paper 8: Linking childhood emotional abuse and depressive symptoms: The role of emotion dysregulation and interpersonal problems

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

March 28, 2026

References

Christ C, de Waal MM, Dekker JJM, van Kuijk I, van Schaik DJF, Kikkert MJ, et al. (2019) Linking childhood emotional abuse and depressive symptoms: The role of emotion dysregulation and interpersonal problems. PLoS ONE 14(2): e0211882. https://doi.org/10.1371/journal.pone.0211882

de Waal, Marleen (2019). Linking childhood emotional abuse and depressive symptoms. figshare. Dataset. https://doi.org/10.6084/m9.figshare.7636511.v3

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability, and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.

Results from the reproduction of the Christ et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This study examines if childhood emotional abuse (CEA), physical abuse (CPA), and sexual abuse (CSA) were independently associated with depressive symptoms, emotion dysregulation, and interpersonal problems. The paper was clear and easy to read. The authors checked the assumptions of the linear regression but did not provide details. The authors discussed independent variables in terms of significant associations rather than interpreting the effect sizes.

Data availability and software used

The authors provided data in a well-structured Excel (wide) file, but no accompanying data dictionary. The dataset was hosted on Figshare, consistent with the data availability statement reported in the paper. SPSS was used for analyses of linear regression models.

Regression sample

This paper reports at least 16 linear regressions (excluding the mediation models), including both univariate and multivariable analyses. Multivariable modelling was performed for three main outcomes: depressive symptoms, emotional dysregulation, and interpersonal problems. There were three outcomes clearly identified as the primary focus, hence we did not need to select three models for our reproducibility exercise.

Computational reproducibility results

The models assessed in this paper were mostly computationally reproducible, with minor errors identified. Some estimates had incorrect rounding, and in the emotional dysregulation model, the variable childhood sexual abuse (CSA) had a lower 95% confidence interval that was missing a negative sign. The authors reported R2 rather than adjusted R2.

Inferential reproducibility results

The three models assessed in this paper were inferentially reproducible. The main modelling issue was the skewed distribution of the independent variables, with most participants reporting minimum scores and sparse data at the upper ends of the scales. Mild assumption violations were observed in the residuals. Upon reproducing the models, residual plots for all three models indicated mild heteroscedasticity, and model 1 also showed mild non-normality. The third model exhibited potential issues with linearity and outliers. After refitting this model to include quadratic and cubic terms, the quadratic terms were statistically significant. However, these differences did not translate into a substantially improved model fit according to AIC, BIC, or adjusted R², and were likely attributable to sparse data at the upper end of the predictor scales. All three models were bootstrapped and produced results consistent with the reproduced models, with differences in standardized regression coefficients less than 0.1. The direction and statistical significance of the regression coefficients remained consistent across both the reproduced and bootstrapped models.

Model 1

Model results for depressive symptoms

Term

B

SE

Lower

Upper

t

p-value

Intercept

CEA

0.42

0.26

0.57

<0.001

CPA

−0.03

−0.32

0.27

0.866

CSA

−0.11

−0.29

0.08

0.254

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for depressive symptoms

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.11

11.34

3

272

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for depressive symptoms

Term

SS

DF

MS

F

p-value

CEA

CPA

CSA

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for depressive symptoms

Term

B

SE

Lower

Upper

t

p-value

Intercept

2.894

0.823

1.273

4.515

3.515

<0.001

CEA

0.415

0.079

0.259

0.572

5.229

<0.001

CPA

−0.025

0.147

−0.315

0.265

−0.169

0.8657

CSA

−0.107

0.094

−0.292

0.077

−1.142

0.2544

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for depressive symptoms

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.333

0.111

0.101

1,521.963

3.744

11.342

3

272

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for depressive symptoms

Term

SS

DF

MS

F

p-value

CEA

388.917

1

388.917

27.345

<0.001

CPA

0.408

1

0.408

0.029

0.8657

CSA

18.550

1

18.550

1.304

0.2544

Residuals

3,868.609

272

14.223

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

CEA

−1.187

0.2364

No linearity violation

CPA

−0.454

0.6503

No linearity violation

CSA

0.137

0.8911

No linearity violation

Tukey test

−1.179

0.2382

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results
  • No linearity violation was observed in either plots or tests.
Testing for homoscedasticity

Statistic

p-value

Parameter

Method

7.805

0.0502

3

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • Some heteroscedasticity is present in plots, and a sensitivity analysis using weighted or robust regression or wild bootstrapping is recommended.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

depressive symptoms

276

5.373

3.978

4.000

0.000

22.000

1.369

2.143

CEA

276

7.783

3.430

7.000

5.000

22.000

1.731

3.052

CPA

276

5.649

1.786

5.000

5.000

18.000

4.005

18.635

CSA

276

5.716

2.624

5.000

5.000

21.250

4.384

19.670

.fitted

276

5.373

1.327

5.116

2.571

11.196

1.563

2.602

.resid

276

−0.000

3.751

−0.726

−7.239

16.993

1.235

2.401

.leverage

276

0.014

0.027

0.006

0.004

0.189

4.240

19.368

.sigma

276

3.771

0.015

3.776

3.634

3.778

−5.580

39.543

.cooksd

276

0.005

0.013

0.001

0.000

0.120

5.271

33.318

.std.resid

276

−0.000

1.003

−0.193

−2.015

4.516

1.216

2.343

dfb.1_

276

0.000

0.067

−0.004

−0.603

0.415

−1.709

30.357

dfb.CEA

276

−0.000

0.072

0.000

−0.441

0.395

−0.403

13.681

dfb.CPA

276

0.000

0.075

−0.001

−0.555

0.372

−0.563

17.354

dfb.CSA

276

0.000

0.064

0.000

−0.505

0.491

0.773

32.357

dffit

276

−0.003

0.136

−0.016

−0.647

0.694

0.302

6.631

cov.r

276

1.016

0.043

1.016

0.746

1.249

0.198

16.367

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.

DFBETA threshold

DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.CEA

dfb.CPA

dfb.CSA

dffit

cov.r

226

0.125

0.170

0.001

−0.012

−0.021

−0.006

0.055

0.057

1.223

120

−0.337

0.189

0.007

0.144

0.007

−0.129

−0.040

−0.163

1.249

132

4.687

0.005

0.023

0.139

−0.043

−0.092

0.097

0.317

0.746

13

4.112

0.006

0.024

0.143

0.172

−0.143

−0.092

0.319

0.801

198

−2.026

0.092

0.103

0.415

−0.028

−0.555

0.203

−0.647

1.053

265

1.549

0.167

0.120

−0.603

−0.005

0.362

0.413

0.694

1.177

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points
  • Two observations had studentized residuals > 3. Both had low leverage and small Cook’s distance, with DFBETAS and DFFITS within conventional ranges. The COVRATIO indicated observations that may affect confidence intervals widths.
Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.106

0.0043

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.927

<0.001

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • QQ-plot indicates the residuals are not normally distributed.
Assessing collinearity with VIF

Term

VIF

Tolerance

CEA

1.435

0.697

CPA

1.337

0.748

CSA

1.170

0.855

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

0.028

1.939

0.6280

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

No meaningful departures from the assumptions of linearity and independence were observed. While the Breusch-Pagan test was not statistically significant, the residuals vs. fitted plot showed a funnel shape, suggesting heteroscedasticity and a sensitivity analysis using weighted or robust regression is recommended. Normality tests and the Q-Q plot indicated the residuals may not be normally distributed. Outlier diagnostics indicated that point estimates were unlikely to be substantially affected by influential points, but confidence-interval width could be affected and should be further investigated.

Forest plot showing original and reproduced coefficients and 95% confidence intervals for depressive symptoms

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

2.8939

CEA

0.42

0.4153

−0.0047

Reproduced

CPA

−0.03

−0.0249

0.0051

Incorrect Rounding

CSA

−0.11

−0.1071

0.0029

Reproduced

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in lower 95% confidence intervals for coefficients

term

O_lower

R_lower

Change.lci

Reproduce.lower

Intercept

1.2730

CEA

0.26

0.2589

−0.0011

Reproduced

CPA

−0.32

−0.3149

0.0051

Incorrect Rounding

CSA

−0.29

−0.2916

−0.0016

Reproduced

O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced.

Change in upper 95% confidence intervals for coefficients

term

O_upper

R_upper

Change.uci

Reproduce.upper

Intercept

4.5147

CEA

0.57

0.5716

0.0016

Reproduced

CPA

0.27

0.2650

−0.0050

Incorrect Rounding

CSA

0.08

0.0775

−0.0025

Reproduced

O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced.

Change in R2

O_R2

R_R2

Change.R2

Reproduce.R2

0.110

0.1112

0.0012

Reproduced

O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in global F

Term

O_F

R_F

Change.F

Reproduce.F

Intercept

11.34

11.3422

0.0022

Reproduced

O_F = original global F; R_F = reproduced global F; Change.F = change in R_F - O_F; Reproduce.F = Global F reproduced.

Change in degrees of freedom

O_DF1

R_DF1

Change.DF1

Reproduce.DF1

O_DF2

R_DF2

Change.DF2

Reproduce.DF2

3

3

0

Reproduced

272

272

0

Reproduced

O_DF1 = original degrees of freedom for the model; R_DF1 = reproduced degrees of freedom for the model; Change.DF1 = change in R_DF1 - O_DF1; Reproduce.DF1 = reproduced degrees of freedom for the model (R_DF1 = O_DF1); O_DF2 = original degrees of freedom for the residuals; R_DF2 = reproduced degrees of freedom for the residuals; Change.DF2 = change in R_DF2 - O_DF2; Reproduce.DF2 = reproduced degrees of freedom for the residuals (R_DF2 = O_DF2).

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

CEA

<0.001

<0.001

0.0000

Reproduced

Remains sig, B same direction

CPA

0.866

0.8657

−0.0003

Reproduced

Remains non-sig, B same direction

CSA

0.254

0.2544

0.0004

Reproduced

Remains non-sig, B same direction

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman plot showing differences between original and reproduced p-values for depressive symptoms

Results for p-values
  • The p-values in this model were reproduced with mean differences for p-values close to zero.

Conclusion computational reproducibility

This model was mostly computationally reproducible, with minor rounding errors. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction.

Methods

The model was successfully reproduced; however, there were indications that the residuals may not be normally distributed and may exhibit slight heteroscedasticity. To further verify the findings, bootstrapped standardized regression coefficients and their 95% confidence intervals were examined. Percentage and absolute changes in estimates and confidence-interval bounds relative to the linear model were summarised using thresholds of 10% change and standardized coefficient differences of <0.10 and <0.20. Coefficient direction and statistical significance were assessed for consistency. Wild bootstrapping was used to account for potential heteroscedasticity.

Results

Bootstrapped results

Wild bootstrapping was performed with 10,000 iterations.

Change in regression coefficients

Term

B

boot.B

B_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.0000

0.0000

−0.0000

−1,000.0000

Yes

No

No

z_CEA

0.3580

0.3585

−0.0005

−0.1500

No

No

No

z_CPA

−0.0112

−0.0101

−0.0011

−9.3800

No

No

No

z_CSA

−0.0706

−0.0710

0.0004

0.6300

No

No

No

B = standardized regression coefficient reproduced B; boot.B = boostrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in lower 95% confidence interval

Term

Lower

boot.Lower

Lower_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.1123

−0.1110

−0.0013

−1.1700

No

No

No

z_CEA

0.2232

0.2024

0.0208

9.3100

No

No

No

z_CPA

−0.1413

−0.1614

0.0200

14.1700

Yes

No

No

z_CSA

−0.1923

−0.1867

−0.0056

−2.9200

No

No

No

Lower = standardized reproduced lower CI; boot.Lower = boostrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in upper 95% confidence interval

Term

Upper

boot.Upper

Upper_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.1123

0.1120

0.0003

0.2600

No

No

No

z_CEA

0.4928

0.5149

−0.0221

−4.4800

No

No

No

z_CPA

0.1189

0.1401

−0.0212

−17.8100

Yes

No

No

z_CSA

0.0511

0.0435

0.0076

14.8000

Yes

No

No

Upper = standardized reproduced upper CI; boot.Upper = boostrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in Range of 95% confidence interval

Term

Range

boot.Range

Range_Diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.2247

0.2231

−0.0016

−0.7200

No

No

No

z_CEA

0.2696

0.3125

0.0429

15.9000

Yes

No

No

z_CPA

0.2603

0.3015

0.0412

15.8400

Yes

No

No

z_CSA

0.2434

0.2302

−0.0132

−5.4200

No

No

No

Range = standardized reproduced CI range; boot.B = boostrapped standardized reproduced CI range; Range_diff = change in CI Range ; %_change = percentage difference, Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in p-value significance and regression coefficient direction

Term

p-value

boot.p-value

changep

SigChangeDirection

Intercept

1.0000

1.0000

0.0000

Remains non-sig, B changes direction

z_CEA

<0.001

<0.001

−0.0000

Remains sig, B same direction

z_CPA

0.8657

0.8946

−0.0289

Remains non-sig, B same direction

z_CSA

0.2544

0.2308

0.0236

Remains non-sig, B same direction

p-value = standardized reproduced p-value; boot.p-value = boostrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model.

Check the distribution of bootstrap estimates

The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.

Conclusions based on the bootstrapped model

This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful, with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.

Model 2

Model results for emotion dysregulation

Term

B

SE

Lower

Upper

t

p-value

Intercept

CEA

1.33

0.52

2.14

0.001

CPA

−0.32

−1.82

1.19

0.681

CSA

−0.37

1.33

0.59

0.452

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for emotion dysregulation

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.04

3.96

3

272

0.009

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for emotion dysregulation

Term

SS

DF

MS

F

p-value

CEA

CPA

CSA

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results emotion dysregulation

Term

B

SE

Lower

Upper

t

p-value

Intercept

78.411

4.272

70.001

86.821

18.355

<0.001

CEA

1.328

0.412

0.516

2.139

3.222

0.0014

CPA

−0.315

0.764

−1.819

1.190

−0.412

0.6809

CSA

−0.366

0.486

−1.324

0.591

−0.753

0.4522

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Model fit for emotion dysregulation

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.204

0.042

0.031

2,430.831

19.426

3.955

3

272

0.0087

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for emotion dysregulation

Term

SS

DF

MS

F

p-value

CEA

3,974.533

1

3,974.533

10.379

0.0014

CPA

64.886

1

64.886

0.169

0.6809

CSA

217.067

1

217.067

0.567

0.4522

Residuals

104,156.682

272

382.929

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

CEA

−1.595

0.1119

No linearity violation

CPA

−1.671

0.0959

No linearity violation

CSA

−0.846

0.3984

No linearity violation

Tukey test

−1.752

0.0798

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

No linearity violation was observed in either plots or tests.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

3.330

0.3434

3

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • Some heteroscedasticity is present in plots, and a sensitivity analysis using weighted or robust regression or wild bootstrapping is recommended.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

emotion dysregulation

276

84.873

19.882

84.000

44.000

139.000

0.312

−0.581

CEA

276

7.783

3.430

7.000

5.000

22.000

1.731

3.052

CPA

276

5.649

1.786

5.000

5.000

18.000

4.005

18.635

CSA

276

5.716

2.624

5.000

5.000

21.250

4.384

19.670

.fitted

276

84.873

4.065

83.986

75.694

102.012

1.535

2.658

.resid

276

0.000

19.462

−0.307

−41.049

55.158

0.304

−0.478

.leverage

276

0.014

0.027

0.006

0.004

0.189

4.240

19.368

.sigma

276

19.568

0.045

19.584

19.315

19.605

−2.385

7.320

.cooksd

276

0.005

0.019

0.001

0.000

0.183

7.361

58.733

.std.resid

276

−0.001

1.004

−0.016

−2.114

2.825

0.293

−0.475

dfb.1_

276

0.000

0.083

−0.003

−0.747

0.727

0.166

45.264

dfb.CEA

276

−0.000

0.063

0.000

−0.490

0.181

−2.185

14.292

dfb.CPA

276

−0.000

0.077

−0.001

−0.649

0.449

−1.937

27.561

dfb.CSA

276

−0.000

0.077

0.000

−0.808

0.512

−2.469

54.895

dffit

276

−0.005

0.144

−0.002

−0.823

0.860

−0.714

12.074

cov.r

276

1.016

0.035

1.015

0.905

1.222

2.554

13.269

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if the ith observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 to 1 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept), and n is the number of observations. In practice, this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.

DFBETA threshold

DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each model parameter. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.CEA

dfb.CPA

dfb.CSA

dffit

cov.r

226

−0.207

0.170

0.002

0.020

0.035

0.009

−0.092

−0.094

1.222

132

2.862

0.005

0.009

0.085

−0.026

−0.056

0.059

0.193

0.905

183

2.662

0.006

0.011

0.121

−0.113

0.006

0.004

0.207

0.921

120

−1.697

0.189

0.167

0.727

0.038

−0.649

−0.201

−0.820

1.200

63

−1.952

0.151

0.168

0.210

0.151

0.166

−0.808

−0.823

1.131

265

1.918

0.167

0.183

−0.747

−0.007

0.449

0.512

0.860

1.155

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points
  • Although three observations exhibited relatively higher Cook’s distances, all remained below 0.5. DFBETAS and DFFITS were within acceptable limits, suggesting minimal influence on model estimates. Several observations had COVRATIO values > 1, implying that their removal might reduce standard errors and narrow confidence intervals.
Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.055

0.3703

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.985

0.0061

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • QQ-plot looks roughly normal.
Assessing collinearity with VIF

Term

VIF

Tolerance

CEA

1.435

0.697

CPA

1.337

0.748

CSA

1.170

0.855

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

−0.047

2.085

0.4680

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

No meaningful departures from the assumptions of linearity, normality, or independence were observed. No substantial outliers were identified, although a few points may influence confidence intervals. While the model was not statistically heteroscedastic, visual inspection suggested mild heteroscedasticity that may warrant further examination.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for emotion dysregulation

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

78.4109

CEA

1.33

1.3276

−0.0024

Reproduced

CPA

−0.32

−0.3145

0.0055

Incorrect Rounding

CSA

−0.37

−0.3662

0.0038

Reproduced

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in lower 95% confidence intervals for coefficients

term

O_lower

R_lower

Change.lci

Reproduce.lower

Intercept

70.0006

CEA

0.52

0.5163

−0.0037

Reproduced

CPA

−1.82

−1.8189

0.0011

Reproduced

CSA

1.33

−1.3238

−2.6538

Not Reproduced

O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced.

Change in upper 95% confidence intervals for coefficients

term

O_upper

R_upper

Change.uci

Reproduce.upper

Intercept

86.8212

CEA

2.14

2.1389

−0.0011

Reproduced

CPA

1.19

1.1898

−0.0002

Reproduced

CSA

0.59

0.5914

0.0014

Reproduced

O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced.

Change in R2

O_R2

R_R2

Change.R2

Reproduce.R2

0.040

0.0418

0.0018

Reproduced

O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in global F

Term

O_F

R_F

Change.F

Reproduce.F

Intercept

3.96

3.9554

−0.0046

Reproduced

O_F = original global F; R_F = reproduced global F; Change.F = change in R_F - O_F; Reproduce.F = Global F reproduced.

Change in degrees of freedom

O_DF1

R_DF1

Change.DF1

Reproduce.DF1

O_DF2

R_DF2

Change.DF2

Reproduce.DF2

3

3

0

Reproduced

272

272

0

Reproduced

O_DF1 = original degrees of freedom for the model; R_DF1 = reproduced degrees of freedom for the model; Change.DF1 = change in R_DF1 - O_DF1; Reproduce.DF1 = reproduced degrees of freedom for the model (R_DF1 = O_DF1); O_DF2 = original degrees of freedom for the residuals; R_DF2 = reproduced degrees of freedom for the residuals; Change.DF2 = change in R_DF2 - O_DF2; Reproduce.DF2 = reproduced degrees of freedom for the residuals (R_DF2 = O_DF2).

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

CEA

0.001

0.0014

0.0004

Reproduced

Remains sig, B same direction

CPA

0.681

0.6809

−0.0001

Reproduced

Remains non-sig, B same direction

CSA

0.452

0.4522

0.0002

Reproduced

Remains non-sig, B same direction

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman plot showing differences between original and reproduced p-values for emotion dysregulation

Results for p-values
  • The p-values in this model were reproduced with mean differences for p-values close to zero.

Conclusion computational reproducibility

This model was mostly computationally reproducible, with minor rounding errors. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction. With one error identified for the lower confidence interval of CSA, which was missing a negative sign.

Methods

The model was successfully reproduced; however, it may exhibit slight heteroscedasticity. To further verify the findings, bootstrapped standardized regression coefficients and their 95% confidence intervals were examined. Percentage and absolute changes in estimates and confidence-interval bounds relative to the linear model were summarised using thresholds of 10% change and standardized coefficient differences of <0.10 and <0.20. Coefficient direction and statistical significance were assessed for consistency. Wild bootstrapping was used to account for potential heteroscedasticity.

Results

Bootstrapped results

Wild bootstrapping was performed with 10,000 iterations.

Change in regression coefficients

Term

B

boot.B

B_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.0000

0.0000

−0.0000

−1,000.0000

Yes

No

No

z_CEA

0.2290

0.2289

0.0001

0.0500

No

No

No

z_CPA

−0.0283

−0.0278

−0.0004

−1.4800

No

No

No

z_CSA

−0.0483

−0.0488

0.0005

0.9500

No

No

No

B = standardized regression coefficient reproduced B; boot.B = boostrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in lower 95% confidence interval

Term

Lower

boot.Lower

Lower_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.1166

−0.1161

−0.0005

−0.4600

No

No

No

z_CEA

0.0891

0.0901

−0.0011

−1.2000

No

No

No

z_CPA

−0.1634

−0.1780

0.0147

8.9700

No

No

No

z_CSA

−0.1747

−0.1895

0.0148

8.4900

No

No

No

Lower = standardized reproduced lower CI; boot.Lower = boostrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in upper 95% confidence interval

Term

Upper

boot.Upper

Upper_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.1166

0.1171

−0.0004

−0.3800

No

No

No

z_CEA

0.3690

0.3705

−0.0016

−0.4300

No

No

No

z_CPA

0.1069

0.1256

−0.0187

−17.4900

Yes

No

No

z_CSA

0.0780

0.0921

−0.0141

−18.0500

Yes

No

No

Upper = standardized reproduced upper CI; boot.Upper = boostrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in Range of 95% confidence interval

Term

Range

boot.Range

Range_Diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.2333

0.2332

−0.0001

−0.0400

No

No

No

z_CEA

0.2799

0.2804

0.0005

0.1800

No

No

No

z_CPA

0.2702

0.3036

0.0333

12.3400

Yes

No

No

z_CSA

0.2527

0.2817

0.0289

11.4400

Yes

No

No

Range = standardized reproduced CI range; boot.B = boostrapped standardized reproduced CI range; Range_diff = change in CI Range ; %_change = percentage difference, percentage changes were truncated at ±1000%, Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in p-value significance and regression coefficient direction

Term

p-value

boot.p-value

changep

SigChangeDirection

Intercept

1.0000

0.9999

0.0001

Remains non-sig, B same direction

z_CEA

0.0014

0.0014

0.0000

Remains sig, B same direction

z_CPA

0.6809

0.7226

−0.0417

Remains non-sig, B same direction

z_CSA

0.4522

0.4975

−0.0453

Remains non-sig, B same direction

p-value = standardized reproduced p-value; boot.p-value = boostrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model.

Check distribution of bootstrap estimates

The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.

Conclusions based on the bootstrapped model

This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful, with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.

Model 3

Model results for interpersonal problems

Term

B

SE

Lower

Upper

t

p-value

Intercept

CEA

1.09

0.54

1.65

<0.001

CPA

0.23

−0.80

1.26

0.662

CSA

−0.08

−0.74

0.58

0.809

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit Statistics interpersonal problems

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.08

7.73

3

272

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA Table for interpersonal problems

Term

SS

DF

MS

F

p-value

CEA

CPA

CSA

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for interpersonal problems

Term

B

SE

Lower

Upper

t

p-value

Intercept

25.237

2.922

19.484

30.989

8.637

<0.001

CEA

1.091

0.282

0.536

1.646

3.871

<0.001

CPA

0.229

0.523

−0.800

1.258

0.438

0.6619

CSA

−0.080

0.333

−0.735

0.575

−0.241

0.8095

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for interpersonal problems

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.280

0.079

0.068

2,221.147

13.287

7.726

3

272

<0.001

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA Table for interpersonal problems

Term

SS

DF

MS

F

p-value

CEA

2,684.506

1

2,684.506

14.986

<0.001

CPA

34.332

1

34.332

0.192

0.6619

CSA

10.434

1

10.434

0.058

0.8095

Residuals

48,723.942

272

179.132

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

CEA

−2.012

0.0452

Linearity violation

CPA

−2.131

0.0340

Linearity violation

CSA

−0.631

0.5285

No linearity violation

Tukey test

−2.349

0.0188

Linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

Plots and tests show an unaccounted non-linear relationship may be present.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

2.235

0.5252

3

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • Some heteroscedasticity is present in plots, and a sensitivity analysis using weighted or robust regression or wild bootstrapping is recommended.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

interpersonal problems

276

34.562

13.866

34.000

3.000

78.000

0.428

0.083

CEA

276

7.783

3.430

7.000

5.000

22.000

1.731

3.052

CPA

276

5.649

1.786

5.000

5.000

18.000

4.005

18.635

CSA

276

5.716

2.624

5.000

5.000

21.250

4.384

19.670

.fitted

276

34.562

3.886

33.617

30.130

51.585

1.775

3.287

.resid

276

0.000

13.311

−0.571

−33.710

40.652

0.378

0.257

.leverage

276

0.014

0.027

0.006

0.004

0.189

4.240

19.368

.sigma

276

13.384

0.038

13.397

13.178

13.409

−3.007

10.919

.cooksd

276

0.005

0.022

0.001

0.000

0.290

9.594

106.708

.std.resid

276

−0.001

1.004

−0.043

−2.609

3.049

0.351

0.282

dfb.1_

276

0.000

0.066

−0.005

−0.227

0.712

4.980

50.152

dfb.CEA

276

−0.000

0.077

0.000

−0.689

0.339

−3.382

30.765

dfb.CPA

276

−0.000

0.071

0.000

−0.635

0.348

−2.267

29.205

dfb.CSA

276

−0.000

0.080

0.000

−1.069

0.275

−8.002

112.240

dffit

276

−0.005

0.143

−0.003

−1.089

0.356

−2.647

17.430

cov.r

276

1.016

0.037

1.015

0.889

1.222

2.154

12.449

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if the ith observation is removed. Potentially influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 to 1 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when the ith observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept), and n is the number of observations. In practice, this can result in a large number of points identified, a practical cut-off of 1 was used to flag observations with meaningful impact.

DFBETA threshold

DFBETAS quantify the influence of the ith observation on the jth regression coefficient as the change in that coefficient when the observation is omitted, expressed in units of the coefficient’s estimated standard error. There is a DFBETA for each model parameter. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets, this threshold can flag a high number of observations with only minor influence on the model. A practical cut-off of 1 was used to flag observations with meaningful impact.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.CEA

dfb.CPA

dfb.CSA

dffit

cov.r

226

0.235

0.170

0.003

−0.023

−0.040

−0.011

0.105

0.106

1.222

230

3.087

0.006

0.014

0.140

−0.131

0.007

0.004

0.240

0.889

60

3.096

0.007

0.017

−0.048

0.088

0.106

−0.114

0.267

0.890

120

−1.662

0.189

0.160

0.712

0.037

−0.635

−0.197

−0.803

1.202

63

−2.580

0.151

0.290

0.278

0.200

0.219

−1.069

−1.089

1.085

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points
  • Two observations had studentized residuals > 3. Both had low leverage and small Cook’s distance, with DFBETAS and DFFITS within conventional ranges. However, their COVRATIO values were substantially < 1, suggesting little effect on point estimates but potential inflation of standard errors (wider confidence intervals). Separately, some observations showed COVRATIO > 1, indicating deflated standard errors (narrower confidence intervals).
Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.046

0.6059

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.990

0.0556

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk supports residuals being normally distributed.
  • QQ-plot looks roughly normal.
Assessing collinearity with VIF

Term

VIF

Tolerance

CEA

1.435

0.697

CPA

1.337

0.748

CSA

1.170

0.855

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

0.022

1.947

0.6420

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

No meaningful departures from the assumptions of normality, or independence were observed. However, residual plots and diagnostic tests indicated potential issues with linearity, suggesting that a non-linear relationship may not have been adequately captured, and residuals may be mildly heteroscedastic. Additionally, the presence of observations with high DFBETA, DFFIT and COVRatio indicates that further investigation of potential influential points is warranted.

Forest plot showing original and reproduced coefficients and 95% confidence intervals for interpersonal problems

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

25.2367

CEA

1.09

1.0911

0.0011

Reproduced

CPA

0.23

0.2288

−0.0012

Reproduced

CSA

−0.08

−0.0803

−0.0003

Reproduced

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in lower 95% confidence intervals for coefficients

term

O_lower

R_lower

Change.lci

Reproduce.lower

Intercept

19.4844

CEA

0.54

0.5362

−0.0038

Reproduced

CPA

−0.80

−0.8001

−0.0001

Reproduced

CSA

−0.74

−0.7352

0.0048

Reproduced

O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced.

Change in upper 95% confidence intervals for coefficients

term

O_upper

R_upper

Change.uci

Reproduce.upper

Intercept

30.9890

CEA

1.65

1.6459

−0.0041

Reproduced

CPA

1.26

1.2577

−0.0023

Reproduced

CSA

0.58

0.5747

−0.0053

Incorrect Rounding

O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced.

Change in R2

O_R2

R_R2

Change.R2

Reproduce.R2

0.080

0.0785

−0.0015

Reproduced

O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in global F

Term

O_F

R_F

Change.F

Reproduce.F

Intercept

7.73

7.7262

−0.0038

Reproduced

O_F = original global F; R_F = reproduced global F; Change.F = change in R_F - O_F; Reproduce.F = Global F reproduced.

Change in degrees of freedom

O_DF1

R_DF1

Change.DF1

Reproduce.DF1

O_DF2

R_DF2

Change.DF2

Reproduce.DF2

3

3

0

Reproduced

272

272

0

Reproduced

O_DF1 = original degrees of freedom for the model; R_DF1 = reproduced degrees of freedom for the model; Change.DF1 = change in R_DF1 - O_DF1; Reproduce.DF1 = reproduced degrees of freedom for the model (R_DF1 = O_DF1); O_DF2 = original degrees of freedom for the residuals; R_DF2 = reproduced degrees of freedom for the residuals; Change.DF2 = change in R_DF2 - O_DF2; Reproduce.DF2 = reproduced degrees of freedom for the residuals (R_DF2 = O_DF2).

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

CEA

<0.001

<0.001

0.0000

Reproduced

Remains sig, B same direction

CPA

0.662

0.6619

−0.0001

Reproduced

Remains non-sig, B same direction

CSA

0.809

0.8095

0.0005

Reproduced

Remains non-sig, B same direction

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman Plot showing differences between original and reproduced p-values for interpersonal problems

Results for p-values
  • The p-values in this model were reproduced with mean differences for p-values close to zero.

Conclusion computational reproducibility

This model was mostly computationally reproducible, with minor rounding errors. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction.

Methods

The model was successfully reproduced; however, residual diagnostics suggested potential non-linearity and the presence of observations that may influence point estimates or confidence intervals. Evidence of non-linearity was primarily associated with the variables CEA and CPA. To assess the impact of this departure from linearity, non-linear terms were introduced separately for CEA and CPA and the resulting model fit (AIC and BIC) were compared with the original specification. In these models, all variables were standardized. The non-linear predictor, which was centered and scaled prior to creation of squared and cubic terms. Visualisation was performed using scatterplots on the original scale to illustrate the fitted relationships from the linear, quadratic, and cubic models.

When non-linear terms were not supported, robustness of the findings was further assessed by examining bootstrapped standardized regression coefficients and their corresponding 95% confidence intervals. Percentage and absolute changes in estimates and confidence interval bounds relative to the linear model were summarised using thresholds of 10% change and standardized coefficient differences of <0.10 and <0.20. Coefficient direction and statistical significance were assessed for consistency. Wild bootstrapping was used to account for potential heteroscedasticity.

Results

standardised z_IIP32_Total with only linear terms
Characteristic Beta 95% CI1 p-value
(Intercept) 0.00 -0.11, 0.11 >0.999
z_CEA 0.27 0.13, 0.41 <0.001
z_CPA 0.03 -0.10, 0.16 0.662
z_CSA -0.02 -0.14, 0.11 0.809
1 CI = Confidence Interval
Comparison of z_IIP32_Total with non-linear terms for CEA
Characteristic
CEA Quadratic
CEA Cubic
Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 0.0834 -0.0566, 0.2234 0.242 0.1930 0.0158, 0.3702 0.0329
z_CEA 0.3963 0.2121, 0.5805 <0.001 0.3954 0.2121, 0.5786 <0.001
z_CEA_2 -0.0837 -0.1656, -0.0018 0.045 -0.3023 -0.5355, -0.0691 0.0112
z_CPA 0.0514 -0.0821, 0.1849 0.449 0.0535 -0.0794, 0.1863 0.4288
z_CSA 0.0070 -0.1181, 0.1322 0.912 0.0068 -0.1177, 0.1313 0.9145
z_CEA_3


0.0625 0.0000, 0.1250 0.0499
1 CI = Confidence Interval
Comparison of z_IIP32_Total with non-linear terms for CPA
Characteristic
CPA Quadratic
CPA Cubic
Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 0.0576 -0.0679, 0.1831 0.367 0.1163 -0.0519, 0.2845 0.175
z_CEA 0.2633 0.1268, 0.3998 <0.001 0.2644 0.1279, 0.4009 <0.001
z_CPA 0.2628 0.0102, 0.5154 0.042 0.4133 0.0307, 0.7958 0.034
z_CPA_2 -0.0578 -0.1113, -0.0044 0.034 -0.1792 -0.4171, 0.0587 0.139
z_CSA -0.0064 -0.1298, 0.1170 0.918 -0.0101 -0.1336, 0.1135 0.873
z_CPA_3


0.0156 -0.0142, 0.0453 0.303
1 CI = Confidence Interval
Comparison of z_IIP32_Total with non-linear models fit

Model_type

R2adj

sigma

df

logLik

AIC

BIC

deviance

df.residual

nobs

Linear

0.068

0.9652149

3

−379.8407

769.6815

787.7835

253.4060

272

276

CEA Quadratic

0.079

0.9598501

4

−377.7941

767.5883

789.3107

249.6756

271

276

CEA Cubic

0.088

0.9547889

5

−375.8248

765.6496

790.9924

246.1379

270

276

CPA Quadratic

0.080

0.9589936

4

−377.5478

767.0955

788.8179

249.2302

271

276

CPA Cubic

0.081

0.9588824

5

−377.0056

768.0111

793.3540

248.2530

270

276

Linearity conclusion

After re-running the models and re-checking assumptions, statistically significant departures from linearity were detected. Quadratic and cubic terms were considered for CEA and CPA. Although the quadratic terms were statistically significant, AIC and BIC did not indicate improved model fit. Visual inspection suggested that the apparent non-linearity arose from sparse data at the higher end of the predictor ranges rather than a meaningful improvement in model performance.

Bootstrapping results

As the inclusion of non-linear terms did not improve model fit, the original model was retained. Further model assessment was conducted using Wild bootstrapping with 10,000 iterations.

Change in regression coefficients

Term

B

boot.B

B_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.0000

−0.0008

0.0008

1,000.0000

Yes

No

No

z_CEA

0.2699

0.2680

0.0019

0.7000

No

No

No

z_CPA

0.0295

0.0303

−0.0008

−2.7700

No

No

No

z_CSA

−0.0152

−0.0144

−0.0008

−5.0600

No

No

No

B = standardized regression coefficient reproduced B; boot.B = boostrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in lower 95% confidence interval

Term

Lower

boot.Lower

Lower_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

−0.1144

−0.1138

−0.0006

−0.5200

No

No

No

z_CEA

0.1326

0.1055

0.0272

20.4900

Yes

No

No

z_CPA

−0.1030

−0.1087

0.0057

5.5200

No

No

No

z_CSA

−0.1391

−0.1587

0.0196

14.0600

Yes

No

No

Lower = standardized reproduced lower CI; boot.Lower = boostrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in upper 95% confidence interval

Term

Upper

boot.Upper

Upper_diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.1144

0.1102

0.0042

3.6600

No

No

No

z_CEA

0.4071

0.4365

−0.0293

−7.2100

No

No

No

z_CPA

0.1620

0.1710

−0.0091

−5.6000

No

No

No

z_CSA

0.1087

0.1292

−0.0205

−18.8700

Yes

No

No

Upper = standardized reproduced upper CI; boot.Upper = boostrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_change = percentage difference, percentage changes were truncated at ±1000%; Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in Range of 95% confidence interval

Term

Range

boot.Range

Range_Diff

%_Diff

Diff_10%

Diff_0.1

Diff_0.2

Intercept

0.2288

0.2240

−0.0048

−2.0900

No

No

No

z_CEA

0.2745

0.3310

0.0565

20.5900

Yes

No

No

z_CPA

0.2650

0.2798

0.0148

5.5700

No

No

No

z_CSA

0.2478

0.2879

0.0401

16.1700

Yes

No

No

Range = standardized reproduced CI range; boot.B = boostrapped standardized reproduced CI range; Range_diff = change in CI Range ; %_change = percentage difference, percentage changes were truncated at ±1000%, Diff_10% = difference ≥10% ; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively.

Change in p-value significance and regression coefficient direction

Term

p-value

boot.p-value

changep

SigChangeDirection

Intercept

1.0000

0.9890

0.0110

Remains non-sig, B changes direction

z_CEA

<0.001

0.0015

−0.0014

Remains sig, B same direction

z_CPA

0.6619

0.6698

−0.0079

Remains non-sig, B same direction

z_CSA

0.8095

0.8435

−0.0340

Remains non-sig, B same direction

p-value = standardized reproduced p-value; boot.p-value = boostrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model.

Check the distribution of bootstrap estimates

The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.

Conclusions based on the bootstrapped model

This model was inferentially reproducible. While some statistics changed by 10% or more, these differences were not meaningful with a change in standardized regression coefficients of less than 0.1. The direction of effects and statistical significance remained consistent between the reproduced and bootstrapped models.