Paper 22: Post-traumatic stress disorder and self reported outcomes after traumatic brain injury in victims of assault

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

March 15, 2026

References

Bown D, Belli A, Qureshi K, Davies D, Toman E, Upthegrove R (2019) Post-traumatic stress disorder and self-reported outcomes after traumatic brain injury in victims of assault. PLoS ONE 14(2): e0211684. https://doi.org/10.1371/journal.pone.0211684

Bown, Dominic and Belli, Antonio and Davies, David and Toman, Emma (2019) Research data supporting “Post-traumatic stress disorder and self-reported outcomes after traumatic brain injury in victims of assault”. https://doi.org/10.25500/eData.bham.00000293

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.

Results from the reproduction of the Bown et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This study examines the association of assault with self-reported outcomes, including quality of life in patients attending traumatic brain injury clinics. The article is easy to read and uses a mix of non-parametric and linear regression but does not discuss normality, other assumptions, or collinearity. The authors state a cross-sectional design and describe a clinically meaningful difference between groups. Univariate modelling is done with different tests, and the authors talk in terms of significant or borderline variables for inclusion in multivariable models.

Data availability and software used

Data is available from University of Birmingham open access UBRIA eData repository in an Excel in a wide format with no data dictionary. SPSS was used for analysis.

Regression sample

There were eight multivariable model identified in the paper with the main focusing on physical, cognitive and emotional symptoms after traumatic brain injury (TBI). Three linear regressions were randomly sampled these outcomes included Rivermead Post-concussion Questionnaire (RPQ), both the full scale and the subscale RPQ13 and the Impact of Event Scale (IES). All analysis was adjusted for age, ethnicity and extracranial trauma.

Computational reproducibility results

The three models assessed in this paper were not found to be computationally reproducible and were only partially reproduced. A partial data dictionary was provided, with label values for categorical variables and missing data definitions included in the accompanying Excel file. Identification of “extracranial trauma” was unclear; it appeared to correspond to a variable named Polytrauma in the dataset, which was inferred by matching descriptive statistics. Multiple missing value codes were used inconsistently, for example, age was documented as missing if coded 100, but in the dataset, missing values appeared as zero. A filter variable was available to identify included cases, which is good practice.

Although the reproduced sample sizes matched the publication, minor discrepancies in medians and regression results suggested potential differences in the data. These may have arisen from updates to the dataset after results were generated. For instance, the median age for participants experiencing assault was reported as 29.3 (22.1, 36.4), whereas the reproduced median was 29.0 (22.0, 35.0). Similarly, RPQ (full) scores for the same group were reported as 31.0 (13.4, 42.0) but reproduced as 31.0 (14.75, 42.00). These small differences may reflect slight variations in the dataset used or differences in rounding or summary procedures.

Linear regression results were not fully reproduced; coefficients and p-values differed at the decimal level. The authors stated that pairwise deletion was used for missing data or incomplete questionnaires to maximize data usage. It was unclear whether this applied to the regression models. Testing the pairwise option in SPSS yielded results that differed substantially from the reported findings, suggesting that discrepancies between the reproduced and reported analyses were likely due to minor differences in the data rather than analysis strategy. Linear regression results were also difficult to reproduce due to incomplete reporting. Only the regression coefficients and p-values for the primary independent variable were provided, with limited detail on the model specification. While the covariates used for adjustment were listed, it was unclear how categorical variables were handled. For example, Ethnicity, which included levels such as White, Asian, and Other, appeared to have been treated as a continuous variable. In SPSS, the regression function does not automatically handle categorical variables and requires the user to manually create dummy variables.

Inferential reproducibility results

None of the three models assessed in this paper were found to be inferentially reproducible. The primary reasons were violations of model assumptions and issues with data distribution and parameterisation. Notably, ethnicity was treated as a continuous variable, which is inappropriate given that it is a nominal variable. While treating ordinal variables as continuous can be reasonable if the relationship is approximately linear, this does not apply to nominal variables like ethnicity, which should be modelled categorically. Across the three outcome variables, 10–24% of values were zero. This was most pronounced in the IES model, where a strong residual pattern indicated violation of the normality assumption. A Tweedie distribution, which accounts for zero inflation and non-normality, provided a substantially better fit, reducing AIC by 69–159 across all outcomes.

Model 1

Model results for RPQ (full)

Term

B

SE

Lower

Upper

t

p-value

Intercept

Mechanism_of_TBI:

Assault – Non-assault

5.708

3.495

0.105

Age

Ethicity_short

Extracranial_trauma:

Yes – No

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for RPQ (full)

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for RPQ (full)

Term

SS

DF

MS

F

p-value

Mechanism_of_TBI

Age

Ethicity_short

Extracranial_trauma

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for RPQ (full)

Term

B

SE

Lower

Upper

t

p-value

Intercept

22.782

5.661

11.599

33.965

4.025

<0.001

Mechanism_of_TBI:

Assault – Non-assault

5.718

3.495

−1.186

12.623

1.636

0.1039

Age

−0.141

0.077

−0.294

0.012

−1.814

0.0716

Ethicity_short

2.842

2.836

−2.761

8.446

1.002

0.3179

Extracranial_trauma:

Yes – No

4.679

2.971

−1.191

10.548

1.575

0.1174

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for RPQ (full)

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.308

0.095

0.071

1,352.162

16.812

4.013

4

153

0.0040

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for RPQ (full)

Term

SS

DF

MS

F

p-value

Mechanism_of_TBI

781.299

1

781.299

2.677

0.1039

Age

960.764

1

960.764

3.292

0.0716

Ethicity_short

293.130

1

293.130

1.004

0.3179

Extracranial_trauma

723.878

1

723.878

2.480

0.1174

Residuals

44,656.668

153

291.874

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

Mechanism_of_TBI

Age

−2.245

0.0262

Linearity violation

Ethicity_short

−2.241

0.0265

Linearity violation

Extracranial_trauma

Tukey test

−0.164

0.8700

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results
  • Plots and tests show an unaccounted non-linear relationship may be present.
Testing for homoscedasticity

Statistic

p-value

Parameter

Method

1.147

0.8868

4

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

RPQ (full)

158

23.332

17.728

20.000

0.000

60.000

0.334

−1.092

Mechanism_of_TBI*

158

1.241

0.429

1.000

1.000

2.000

1.203

−0.557

Age

158

42.209

19.564

39.500

16.000

88.000

0.348

−1.123

Ethicity_short

158

1.203

0.502

1.000

1.000

3.000

2.439

5.049

Extracranial_trauma*

158

1.361

0.482

1.000

1.000

2.000

0.574

−1.681

.fitted

158

23.332

5.463

22.976

13.259

36.053

0.210

−0.745

.resid

158

−0.000

16.865

−3.426

−33.351

38.824

0.373

−0.770

.leverage

158

0.032

0.020

0.028

0.013

0.124

2.611

7.962

.sigma

158

17.084

0.063

17.109

16.837

17.140

−1.602

2.174

.cooksd

158

0.007

0.010

0.003

0.000

0.060

2.829

9.293

.std.resid

158

−0.001

1.004

−0.206

−2.003

2.319

0.362

−0.773

dfb.1_

158

0.000

0.083

0.001

−0.224

0.336

0.854

3.262

dfb.M__T

158

0.000

0.087

0.002

−0.329

0.357

−0.149

3.235

dfb.Age

158

0.000

0.074

0.000

−0.253

0.257

0.042

1.011

dfb.Eth_

158

−0.001

0.094

0.000

−0.501

0.380

−1.441

11.776

dfb.Ex_Y

158

0.000

0.081

0.008

−0.266

0.185

−0.388

0.623

dffit

158

−0.005

0.186

−0.032

−0.553

0.478

−0.066

0.157

cov.r

158

1.034

0.043

1.043

0.895

1.156

−0.808

1.605

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, often DFFIT \(\pm 1\) is used to identify highly influential observations.

DFBETA threshold

DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.M__T

dfb.Age

dfb.Eth_

dfb.Ex_Y

dffit

cov.r

21

0.443

0.112

0.005

−0.100

0.048

0.045

0.134

−0.026

0.157

1.156

103

2.225

0.016

0.016

0.005

−0.061

0.129

−0.011

−0.111

0.287

0.895

61

2.353

0.040

0.044

−0.224

−0.134

0.161

0.380

−0.170

0.478

0.900

27

−1.400

0.124

0.055

0.329

0.106

−0.175

−0.491

0.124

−0.528

1.107

172

−1.614

0.105

0.060

0.336

0.107

−0.105

−0.501

−0.118

−0.553

1.061

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points
  • Cook’s Distance and other influence diagnostics indicated no observations of concern.
Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.098

0.0945

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.969

0.0012

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • The Q-Q plot shows a minor violation of normality in the middle, so this should be checked.
Assessing collinearity with VIF

Term

VIF

Tolerance

Mechanism_of_TBI

1.208

0.828

Age

1.235

0.810

Ethicity_short

1.089

0.918

Extracranial_trauma

1.102

0.908

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

−0.008

2.009

0.9800

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

The model met the assumptions of homoscedasticity and independence. However, linearity issues were observed for age and ethnicity. Ethnicity was treated as a continuous variable, which may be inappropriate given its nominal nature. Normality tests and the Q-Q plot suggested the residuals may not be normally distributed. No collinearity issues or influential outliers were identified.

Forest plot showing original and reproduced coefficients and 95% confidence intervals for RPQ (full)

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

22.7822

Mechanism_of_TBI:

Assault – Non-assault

5.708

5.7181

0.0101

Not Reproduced

Age

−0.1405

Ethicity_short

2.8423

Extracranial_trauma:

Yes – No

4.6786

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in Standard error

term

O_SE

R_SE

Change.SE

Reproduce.SE

Intercept

5.6607

Mechanism_of_TBI:

Assault – Non-assault

3.495

3.4949

−0.0001

Reproduced

Age

0.0774

Ethicity_short

2.8362

Extracranial_trauma:

Yes – No

2.9709

O_SE = original standard error; R_SE = reproduced standard error; Change.SE = change in R_SE - O_SE; Reproduce.lower = standard error reproduced.

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

Mechanism_of_TBI:

Assault – Non-assault

0.105

0.1039

−0.0011

Not Reproduced

Remains non-sig, B same direction

Age

0.0716

Ethicity_short

0.3179

Extracranial_trauma:

Yes – No

0.1174

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Results for p-values

The p-value for this model was not reproduced.

Conclusion computational reproducibility

This model was not computationally reproducible, although some aspects were partially reproduced such as Standard error. Minor discrepancies in regression coefficients and p-values suggest there may be small differences in the underlying data.

Methods

While the model was partially reproduced, it is likely that minor differences in the underlying data account for the remaining discrepancies. However, these differences are unlikely to have affected the model assumptions, and inferential reproducibility was assessed. The original model exhibited violations of normality and linearity, with 9.6% of RPQ_full values equal to zero. To account for the excess zeros and the non-normal distribution, a Tweedie model was fitted and compared to the reproduced model using AIC. Ethnicity was included as a categorical variable, and age was assessed for linearity. Age was standardized for the analysis, but RPQ_full was not standardized to maintain compatibility with the Tweedie distribution.

Results

Treating ethnicity as a continuous variable was not appropriate, as the differences between groups were not equally spaced and the relationship with the outcome was not linear. The categorical variable improved model fit, with a decrease in AIC of 3.13.

Reproduced

Categorical Ethnicity

Characteristic

Beta

SE1

Statistic

95% CI1

p-value

Beta

SE1

Statistic

95% CI1

p-value

(Intercept)

16.66

3.74

4.45

9.27, 24.05

<0.001

19.29

1.99

9.67

15.35, 23.23

<0.001

Mechanism_of_TBI

Non-assault

Assault

5.72

3.49

1.64

−1.19, 12.62

0.104

4.49

3.49

1.29

−2.41, 11.39

0.201

Extracranial_trauma

No

Yes

4.68

2.97

1.57

−1.19, 10.55

0.117

4.63

2.93

1.58

−1.16, 10.42

0.116

z_Age

−2.86

1.58

−1.81

−5.98, 0.25

0.072

−2.54

1.56

−1.63

−5.63, 0.55

0.106

Ethicity_short

2.84

2.84

1.00

−2.76, 8.45

0.318

Ethicity

White

Asian

10.72

4.49

2.39

1.84, 19.59

0.018

Other

−2.20

6.61

−0.33

−15.26, 10.87

0.740

1SE = Standard Error, CI = Confidence Interval

Note: Reproduced model – Adjusted R² = 0.071; AIC = 1352.16; BIC = 1370.54

Categorical Ethnicity model – Adjusted R² = 0.095; AIC = 1349.03; BIC = 1370.47

Fitting a Tweedie model to account for zero’s, the estimated dispersion parameter (φ) was 4.70, indicating greater variability than would be expected under a Gaussian model, and the Tweedie power parameter (p) was 1.36, consistent with a compound Poisson–Gamma distribution. Compared to the linear model, the Tweedie model yielded a substantial AIC reduction of 68.96. Although inclusion of a squared age term marginally improved model fit (AIC and BIC) and residual diagnostics, these improvements were small, and the simpler model was retained.

In this model, the intercept represents the geometric mean of the RPQ_full score for patients who were white, had not been assaulted, had not experienced extracranial trauma, and were at the average age (since age was standardized). This corresponds to the expected RPQ_full score when all predictors are at their reference level or zero. The exponentiated coefficient for Mechanism_of_TBI indicates that patients who had been assaulted had an estimated 19% higher RPQ_full score compared to those who had not (p = 0.271). The coefficient for standardized age was 0.87, suggesting that the RPQ_full decreases by 13% as age increased by one standard deviation.

Tweedie with linear age

Tweedie with quadratic age

Characteristic

exp(Beta)

95% CI1

p-value

exp(Beta)

95% CI1

p-value

(Intercept)

19.5

16.1, 23.5

<0.001

24.4

19.1, 31.1

<0.001

Mechanism_of_TBI

Non-assault

Assault

1.19

0.87, 1.61

0.271

1.16

0.86, 1.57

0.339

Extracranial_trauma

No

Yes

1.19

0.91, 1.56

0.202

1.14

0.88, 1.49

0.316

Ethicity

White

Asian

1.41

0.98, 2.04

0.068

1.41

0.98, 2.03

0.062

Other

0.90

0.48, 1.67

0.731

0.91

0.50, 1.68

0.770

z_Age

0.87

0.75, 1.01

0.068

0.87

0.75, 1.01

0.074

z_Age2

0.78

0.66, 0.93

0.006

1CI = Confidence Interval

Tweedie with linear age — Dispersion (φ): 4.70; Power (p): 1.360; AIC: 1283.20; BIC: 1307.70

Tweedie with quadratic age — Dispersion (φ): 4.62; Power (p): 1.350; AIC: 1277.59; BIC: 1305.16

The residual diagnostics for the final Tweedie model appeared satisfactory, with the DHARMa package producing appropriate residual plots to assess overdispersion, zero inflation, and uniformity. DHARMa residuals are based on the rank of each observed value among many simulated values from the fitted model. This approach produces discrete residual values (e.g., 0.25, 0.5, 0.75) that reflect the percentile position of the observed value within the simulated distribution.

Inferential Reproducibility conclusion

This model was not found to be inferentially reproducible, due to differences in model distribution and specification. A Tweedie distribution was identified as more appropriate for the data, given the presence of zero inflation and non-normality. In addition, model parameterisation differed, including treating categorical variables as continuous.

Model 2

Model results for IES

Term

B

SE

Lower

Upper

t

p-value

Intercept

Mechanism_of_TBI:

Assault – Non-assault

6.550

4.736

0.170

Age

Ethicity_short

Extracranial_trauma:

Yes – No

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for IES

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for IES

Term

SS

DF

MS

F

p-value

Mechanism_of_TBI

Age

Ethicity_short

Extracranial_trauma

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results IES

Term

B

SE

Lower

Upper

t

p-value

Intercept

8.742

7.982

−7.081

24.564

1.095

0.2759

Mechanism_of_TBI:

Assault – Non-assault

6.512

4.737

−2.877

15.901

1.375

0.1720

Age

−0.049

0.097

−0.241

0.144

−0.502

0.6169

Ethicity_short

7.784

4.657

−1.447

17.015

1.672

0.0975

Extracranial_trauma:

Yes – No

3.146

3.883

−4.551

10.844

0.810

0.4196

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Model fit for IES

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.271

0.074

0.039

991.981

18.491

2.148

4

108

0.0798

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for IES

Term

SS

DF

MS

F

p-value

Mechanism_of_TBI

676.212

1

676.212

1.890

0.1720

Age

90.033

1

90.033

0.252

0.6169

Ethicity_short

999.480

1

999.480

2.794

0.0975

Extracranial_trauma

234.865

1

234.865

0.657

0.4196

Residuals

38,634.804

108

357.730

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

Mechanism_of_TBI

Age

−1.163

0.2476

No linearity violation

Ethicity_short

−0.680

0.4980

No linearity violation

Extracranial_trauma

Tukey test

0.475

0.6348

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

No linearity violation was observed in either plots or tests.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

0.177

0.9963

4

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

IES

113

17.796

19.298

10.000

0.000

62.000

0.791

−0.756

Mechanism_of_TBI*

113

1.204

0.404

1.000

1.000

2.000

1.453

0.113

Age

113

43.788

20.400

43.000

16.000

88.000

0.238

−1.330

Ethicity_short

113

1.124

0.404

1.000

1.000

3.000

3.359

10.943

Extracranial_trauma*

113

1.354

0.480

1.000

1.000

2.000

0.603

−1.651

.fitted

113

17.796

5.239

16.408

12.239

37.633

1.652

2.629

.resid

113

−0.000

18.573

−7.895

−26.461

48.397

0.847

−0.462

.leverage

113

0.044

0.035

0.035

0.017

0.229

3.601

15.486

.sigma

113

18.913

0.111

18.953

18.405

19.002

−2.349

5.845

.cooksd

113

0.010

0.019

0.004

0.000

0.142

4.911

28.171

.std.resid

113

−0.001

1.005

−0.421

−1.585

2.583

0.823

−0.496

dfb.1_

113

0.000

0.095

0.001

−0.415

0.421

0.350

5.360

dfb.M__T

113

0.001

0.097

0.004

−0.274

0.396

0.485

2.862

dfb.Age

113

0.000

0.085

−0.002

−0.215

0.206

0.038

−0.232

dfb.Eth_

113

−0.001

0.118

0.001

−0.766

0.678

−0.859

23.013

dfb.Ex_Y

113

−0.000

0.097

0.011

−0.209

0.324

0.388

0.753

dffit

113

−0.004

0.225

−0.083

−0.849

0.770

0.293

1.543

cov.r

113

1.049

0.071

1.057

0.776

1.319

−0.787

4.515

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, often DFFIT \(\pm 1\) is used to identify highly influential observations.

DFBETA threshold

DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.M__T

dfb.Age

dfb.Eth_

dfb.Ex_Y

dffit

cov.r

103

2.433

0.020

0.024

−0.001

−0.077

0.150

0.009

−0.140

0.352

0.817

125

−0.647

0.221

0.024

0.207

0.074

−0.026

−0.321

−0.042

−0.345

1.319

147

2.655

0.019

0.026

0.030

−0.097

0.122

0.002

−0.162

0.368

0.776

99

1.413

0.229

0.117

−0.415

0.157

0.031

0.678

−0.181

0.770

1.238

131

−1.596

0.221

0.142

0.421

0.220

0.056

−0.766

−0.078

−0.849

1.195

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points
  • While two observations tended to higher Cook’s values than the rest of the observations these were below 0.5 and dbetas and dffit were within reasonable range indicating the outliers are not substantially effecting results. Although the COVRATIO indicated there maybe some observations that may affect the precision of the parameter estimates.
Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.180

0.0014

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.890

<0.001

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • QQ-plot indicates the residuals are not normally distributed.
Assessing collinearity with VIF

Term

VIF

Tolerance

Mechanism_of_TBI

1.149

0.870

Age

1.229

0.814

Ethicity_short

1.107

0.903

Extracranial_trauma

1.089

0.918

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

0.125

1.744

0.1700

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

The model met the assumptions of homoscedasticity and independence. However, linearity issues were observed for age and ethnicity. Ethnicity was treated as a continuous variable, which may be inappropriate given its nominal nature. Normality tests and the Q-Q plot suggested that the residuals were not normally distributed. A strong pattern in the residuals was observed, largely driven by the substantial proportion of zero values in the IES outcome variable (24%), which contributed to the non-normality. No collinearity issues or influential outliers were identified.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for IES

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

8.7417

Mechanism_of_TBI:

Assault – Non-assault

6.55

6.5124

−0.0376

Not Reproduced

Age

−0.0487

Ethicity_short

7.7843

Extracranial_trauma:

Yes – No

3.1465

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in Standard error

term

O_SE

R_SE

Change.SE

Reproduce.SE

Intercept

7.9823

Mechanism_of_TBI:

Assault – Non-assault

4.736

4.7367

0.0007

Incorrect Rounding

Age

0.0971

Ethicity_short

4.6570

Extracranial_trauma:

Yes – No

3.8833

O_SE = original standard error; R_SE = reproduced standard error; Change.SE = change in R_SE - O_SE; Reproduce.lower = standard error reproduced.

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

0.2759

Mechanism_of_TBI:

Assault – Non-assault

0.170

0.1720

0.0020

Not Reproduced

Remains non-sig, B same direction

Age

0.6169

Ethicity_short

0.0975

Extracranial_trauma:

Yes – No

0.4196

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Results for p-values

The p-value was not reproduced for this model.

Conclusion computational reproducibility

This model was not computationally reproducible, although some aspects were partially reproduced such as the standard error. Minor discrepancies in regression coefficients and p-values suggest there may be small differences in the underlying data.

Methods

While the model was partially reproduced, it is likely that minor differences in the underlying data account for the remaining discrepancies. However, these differences are unlikely to have affected the model assumptions, and inferential reproducibility was assessed. The original model exhibited violations of normality, with 24.0% of IES values equal to zero. To account for the excess zeros and the non-normal distribution, a Tweedie model was fitted and compared to the reproduced model using AIC. Ethnicity was included as a categorical variable, and age was assessed for linearity. Age was standardized for the analysis, but IES was not transformed to maintain compatibility with the Tweedie distribution.

Results

Although the AIC for categorical ethnicity was was slightly higher (1.51), this difference is small enough to be considered negligible. However, treating ethnicity as a continuous variable is not appropriate, as the group differences are not equally spaced. This leads to inaccurate estimation of both effect sizes and confidence intervals, even though the direction of the estimates remained consistent.

Reproduced

Categorical Ethnicity

Characteristic

Beta

SE1

Statistic

95% CI1

p-value

Beta

SE1

Statistic

95% CI1

p-value

(Intercept)

6.62

5.59

1.18

−4.46, 17.70

0.239

14.34

2.56

5.59

9.25, 19.42

<0.001

Mechanism_of_TBI

Non-assault

Assault

6.51

4.74

1.37

−2.88, 15.90

0.172

6.14

4.78

1.28

−3.34, 15.61

0.202

Extracranial_trauma

No

Yes

3.15

3.88

0.81

−4.55, 10.84

0.420

3.12

3.89

0.80

−4.60, 10.84

0.425

z_Age

−0.99

1.98

−0.50

−4.91, 2.93

0.617

−0.96

1.98

−0.49

−4.90, 2.97

0.629

Ethicity_short

7.78

4.66

1.67

−1.45, 17.02

0.098

Ethicity

White

Asian

11.55

7.25

1.59

−2.81, 25.92

0.114

Other

11.17

11.36

0.98

−11.35, 33.69

0.328

1SE = Standard Error, CI = Confidence Interval

Note: Reproduced model – Adjusted R² = 0.039; AIC = 991.98; BIC = 1008.35

Categorical Ethnicity model – Adjusted R² = 0.035; AIC = 993.49; BIC = 1012.59

Fitting a Tweedie model to account for zero’s, the estimated dispersion parameter (φ) was 6.32, indicating greater variability than would be expected under a Gaussian model, and the Tweedie power parameter (p) was 1.46, consistent with a compound Poisson–Gamma distribution. Compared to the linear model, the Tweedie model yielded a substantial AIC reduction of 159.43. Including a squared age term did not improve fit with similar AIC, therefore age was found to be linear.

In this model, the intercept represents the geometric mean of the IES score for patients who were white, had not been assaulted, had not experienced extracranial trauma, and were at the average age (since age was standardized). This corresponds to the expected IES score when all predictors are at their reference level or zero. The exponentiated coefficient for Mechanism_of_TBI indicates that patients who had been assaulted had an estimated 34% higher IES score compared to those who had not (p = 0.291). The coefficient for standardized age was 0.93, suggesting that the expected IES score decreases by 7% as age increases by one standard deviation, although this was not statistically significant.

Tweedie with linear age

Tweedie with quadratic age

Characteristic

exp(Beta)

95% CI1

p-value

exp(Beta)

95% CI1

p-value

(Intercept)

14.6

10.7, 20.1

<0.001

17.5

11.5, 26.5

<0.001

Mechanism_of_TBI

Non-assault

Assault

1.34

0.78, 2.31

0.291

1.36

0.79, 2.33

0.268

Extracranial_trauma

No

Yes

1.17

0.73, 1.87

0.511

1.15

0.72, 1.83

0.568

Ethicity

White

Asian

1.59

0.74, 3.40

0.235

1.62

0.76, 3.47

0.210

Other

1.55

0.47, 5.07

0.471

1.69

0.51, 5.55

0.387

z_Age

0.93

0.72, 1.19

0.563

0.96

0.74, 1.24

0.749

z_Age2

0.83

0.62, 1.11

0.204

1CI = Confidence Interval

Tweedie with linear age — Dispersion (φ): 6.32; Power (p): 1.460; AIC: 832.55; BIC: 854.37

Tweedie with quadratic age — Dispersion (φ): 6.29; Power (p): 1.460; AIC: 832.94; BIC: 857.48

The residual diagnostics for the final Tweedie model showed mild deviations from the expected uniform distribution, but these were not substantial and did not indicate serious model misfit. The DHARMa package producing appropriate residual plots to assess overdispersion, zero inflation, and uniformity. DHARMa residuals are based on the rank of each observed value among many simulated values from the fitted model. This approach produces discrete residual values (e.g., 0.25, 0.5, 0.75) that reflect the percentile position of the observed value within the simulated distribution.

Inferential Reproducibility conclusion

This model was not found to be inferentially reproducible, due to differences in model distribution and specification. A Tweedie distribution was identified as more appropriate for the data, given the presence of zero inflation and non-normality. In addition, model parameterisation differed, as categorical variables as continuous.

Model 3

Model results for RPQ 13

Term

B

SE

Lower

Upper

t

p-value

Intercept

Mechanism_of_TBI:

Assault – Non-assault

4.462

2.4940

0.131

Age

Ethicity_short

Extracranial_trauma:

Yes – No

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit Statistics RPQ 13

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA Table for RPQ 13

Term

SS

DF

MS

F

p-value

Mechanism_of_TBI

Age

Ethicity_short

Extracranial_trauma

Residuals

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for RPQ 13

Term

B

SE

Lower

Upper

t

p-value

Intercept

19.748

4.668

10.528

28.969

4.230

<0.001

Mechanism_of_TBI:

Assault – Non-assault

4.466

2.940

−1.340

10.273

1.519

0.1307

Age

−0.116

0.064

−0.242

0.011

−1.800

0.0737

Ethicity_short

2.106

2.411

−2.656

6.868

0.874

0.3837

Extracranial_trauma:

Yes – No

3.973

2.497

−0.958

8.904

1.591

0.1135

SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for RPQ 13

R

R2

R2Adj

AIC

RMSE

F

DF1

DF2

p-value

0.291

0.085

0.061

1,334.175

14.323

3.625

4

157

0.0074

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA Table for RPQ 13

Term

SS

DF

MS

F

p-value

Mechanism_of_TBI

488.680

1

488.680

2.309

0.1307

Age

686.116

1

686.116

3.241

0.0737

Ethicity_short

161.535

1

161.535

0.763

0.3837

Extracranial_trauma

536.103

1

536.103

2.533

0.1135

Residuals

33,232.885

157

211.674

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Checking residuals plots for patterns

Blue line showing quadratic fit for residuals

Testing residuals for non linear relationships

Term

Statistic

p-value

Results

Mechanism_of_TBI

Age

−2.078

0.0393

Linearity violation

Ethicity_short

−2.184

0.0304

Linearity violation

Extracranial_trauma

Tukey test

−0.044

0.9649

No linearity violation

Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity.

Checking univariate relationships with the dependent variable using scatterplots

Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling

Linearity results

Plots and tests show an unaccounted non-linear relationship may be present.

Testing for homoscedasticity

Statistic

p-value

Parameter

Method

1.167

0.8835

4

studentized Breusch-Pagan test

Homoscedasticity results
  • The studentized Breusch-Pagan test supports homoscedasticity.
  • There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers

Term

N

Mean

SD

Median

Min

Max

Skewness

Kurtosis

RPQ 13

162

19.870

15.016

16.500

0.000

50.000

0.298

−1.145

Mechanism_of_TBI*

162

1.235

0.425

1.000

1.000

2.000

1.241

−0.462

Age

162

41.920

19.597

39.000

16.000

88.000

0.359

−1.126

Ethicity_short

162

1.198

0.496

1.000

1.000

3.000

2.484

5.289

Extracranial_trauma*

162

1.352

0.479

1.000

1.000

2.000

0.615

−1.632

.fitted

162

19.870

4.366

19.680

11.682

30.088

0.184

−0.732

.resid

162

−0.000

14.367

−3.021

−28.098

32.428

0.335

−0.845

.leverage

162

0.031

0.019

0.027

0.012

0.124

2.667

8.315

.sigma

162

14.549

0.051

14.567

14.359

14.596

−1.590

2.178

.cooksd

162

0.007

0.010

0.003

0.000

0.055

2.756

8.629

.std.resid

162

−0.001

1.003

−0.215

−1.981

2.247

0.325

−0.848

dfb.1_

162

0.000

0.081

0.000

−0.210

0.329

0.818

3.086

dfb.M__T

162

0.000

0.087

0.002

−0.331

0.372

−0.101

3.521

dfb.Age

162

0.000

0.073

0.000

−0.220

0.264

0.107

0.976

dfb.Eth_

162

−0.001

0.091

0.000

−0.482

0.355

−1.456

11.851

dfb.Ex_Y

162

0.000

0.079

0.005

−0.266

0.178

−0.370

0.573

dffit

162

−0.005

0.183

−0.032

−0.526

0.455

−0.075

0.050

cov.r

162

1.033

0.041

1.042

0.891

1.157

−0.725

1.862

* categorical variable

Cooks threshold

Cook’s distance measures the overall change in fit, if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a theshold of 0.5 is often used to identify influential observations.

DFFIT threshold

DFFIT measures how many standard deviations the fitted values will change when observation is removed. Potiential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, often DFFIT \(\pm 1\) is used to identify highly influential observations.

DFBETA threshold

DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.

Influence plot

Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.

COVRATIO plot

COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.

Observations of interest identified by the influence plot

ID

StudRes

Leverage

CookD

dfb.1_

dfb.M__T

dfb.Age

dfb.Eth_

dfb.Ex_Y

dffit

cov.r

21

0.331

0.111

0.003

−0.075

0.035

0.033

0.100

−0.021

0.117

1.157

57

2.227

0.014

0.014

0.172

−0.131

−0.092

−0.045

−0.154

0.266

0.896

103

2.277

0.016

0.016

−0.001

−0.059

0.139

−0.010

−0.110

0.291

0.891

27

−1.375

0.124

0.053

0.329

0.108

−0.171

−0.482

0.126

−0.518

1.110

172

−1.537

0.105

0.055

0.323

0.107

−0.096

−0.477

−0.109

−0.526

1.070

StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed.

Results for outliers and influential points
  • Cook’s Distance and other influence diagnostics indicated no observations of concern.. Although the COVRATIO indicated there maybe some observations that may affect the precision of the parameter estimates.
Checking for normality of the residuals using a Q–Q plot

Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests

Statistic

p-value

Method

0.095

0.1106

Asymptotic one-sample Kolmogorov-Smirnov test

Statistic

p-value

Method

0.968

<0.001

Shapiro-Wilk normality test

Normality results
  • The Kolmogorov-Smirnov supports residuals being normally distributed.
  • The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
  • The Q-Q plot shows a minor violation of normality in the middle, so this should be checked.
Assessing collinearity with VIF

Term

VIF

Tolerance

Mechanism_of_TBI

1.187

0.842

Age

1.204

0.831

Ethicity_short

1.090

0.918

Extracranial_trauma

1.088

0.919

VIF = Variance Inflation Factor.

Collinearity results
  • All VIF values are under three, indicating no collinearity issues.
  • Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation

AutoCorrelation

Statistic

p-value

−0.041

2.075

0.6280

Independence results
  • The Durbin–Watson test suggests there are no auto-correlation issues.
  • The study design seems to be independent.
Assumption conclusions

The model met the assumptions of homoscedasticity and independence. However, linearity issues were observed for age and ethnicity. Ethnicity was treated as a continuous variable, which may be inappropriate given its nominal nature. Normality tests and the Q-Q plot suggested the residuals may not be normally distributed. No collinearity issues or influential outliers were identified.

Forest plot showing original and reproduced coefficients and 95% confidence intervals for RPQ 13

Change in regression coefficients

term

O_B

R_B

Change.B

reproduce.B

Intercept

19.7483

Mechanism_of_TBI:

Assault – Non-assault

4.462

4.4665

0.0045

Not Reproduced

Age

−0.1156

Ethicity_short

2.1061

Extracranial_trauma:

Yes – No

3.9731

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in Standard error

term

O_SE

R_SE

Change.SE

Reproduce.SE

Intercept

4.6683

Mechanism_of_TBI:

Assault – Non-assault

2.494

2.9396

0.4456

Not Reproduced

Age

0.0642

Ethicity_short

2.4109

Extracranial_trauma:

Yes – No

2.4965

O_SE = original standard error; R_SE = reproduced standard error; Change.SE = change in R_SE - O_SE; Reproduce.lower = standard error reproduced.

Change in p-values

Term

O_p

R_p

Change.p

Reproduce.p

SigChangeDirection

Intercept

<0.001

Mechanism_of_TBI:

Assault – Non-assault

0.131

0.1307

−0.0003

Reproduced

Remains non-sig, B same direction

Age

0.0737

Ethicity_short

0.3837

Extracranial_trauma:

Yes – No

0.1135

O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Results for p-values

The P-value was reproduced for this model.

Conclusion computational reproducibility

This model was not computationally reproducible, although some aspects were partially reproduced, such as p-values. Minor discrepancies in regression coefficients and standard errors suggest small differences in the underlying data.

Methods

While the model was partially reproduced, it is likely that minor differences in the underlying data account for the remaining discrepancies. However, these differences are unlikely to have affected the model assumptions, and inferential reproducibility was assessed. The original model exhibited violations of normality and linearity, with 11.0% of RPQ13 values equal to zero. To account for the excess zeros and the non-normal distribution, a Tweedie model was fitted and compared to the reproduced model using AIC. Ethnicity was included as a categorical variable, and age was assessed for linearity. Age was standardized for the analysis, but RPQ13 was not transformed to maintain compatibility with the Tweedie distribution.

Results

Treating ethnicity as a continuous variable was not appropriate, as the differences between groups were not equally spaced and the relationship with the outcome was not linear. As categorical variable there was a small decrease in AIC of 2.88.

Reproduced

Categorical Ethnicity

Characteristic

Beta

SE1

Statistic

95% CI1

p-value

Beta

SE1

Statistic

95% CI1

p-value

(Intercept)

14.71

3.13

4.70

8.53, 20.89

<0.001

16.66

1.64

10.15

13.42, 19.91

<0.001

Mechanism_of_TBI

Non-assault

Assault

4.47

2.94

1.52

−1.34, 10.27

0.131

3.42

2.94

1.16

−2.39, 9.24

0.247

Extracranial_trauma

No

Yes

3.97

2.50

1.59

−0.96, 8.90

0.114

3.91

2.47

1.59

−0.96, 8.79

0.115

z_Age

−2.35

1.31

−1.80

−4.94, 0.23

0.074

−2.11

1.30

−1.62

−4.67, 0.46

0.107

Ethicity_short

2.11

2.41

0.87

−2.66, 6.87

0.384

Ethicity

White

Asian

8.64

3.82

2.26

1.09, 16.19

0.025

Other

−2.35

5.63

−0.42

−13.48, 8.78

0.677

1SE = Standard Error, CI = Confidence Interval

Note: Reproduced model – Adjusted R² = 0.061; AIC = 1334.18; BIC = 1352.70

Categorical Ethnicity model – Adjusted R² = 0.083; AIC = 1331.29; BIC = 1352.91

Fitting a Tweedie model to account for zero’s, the estimated dispersion parameter (φ) was 4.47, indicating greater variability than would be expected under a Gaussian model, and the Tweedie power parameter (p) was 1.35, consistent with a compound Poisson–Gamma distribution. Compared to the linear model, the Tweedie model yielded a substantial AIC reduction of 69.25. Inclusion of a squared age term marginally improved model fit (AIC and BIC) and residual diagnostics, these improvements were small, and the simpler model was retained.

In this model, the intercept represents the geometric mean of the RPQ13 score for patients who were white, had not been assaulted, had not experienced extracranial trauma, and were at the average age (since age was standardized). This corresponds to the expected RPQ13 score when all predictors are at their reference level or zero. The exponentiated coefficient for Mechanism_of_TBI indicates that patients who had been assaulted had an estimated 17% higher RPQ13 score compared to those who had not (p = 0.321). The coefficient for standardized age was 0.88, suggesting that the expected RPQ13 score decreases by 12% as age increases by one standard deviation, although this was not statistically significant.

Tweedie with linear age

Tweedie with quadratic age

Characteristic

exp(Beta)

95% CI1

p-value

exp(Beta)

95% CI1

p-value

(Intercept)

16.8

14.0, 20.1

<0.001

20.8

16.3, 26.5

<0.001

Mechanism_of_TBI

Non-assault

Assault

1.17

0.86, 1.58

0.321

1.14

0.84, 1.54

0.409

Extracranial_trauma

No

Yes

1.19

0.91, 1.55

0.195

1.14

0.88, 1.49

0.319

Ethicity

White

Asian

1.39

0.96, 2.02

0.084

1.39

0.96, 2.01

0.080

Other

0.88

0.47, 1.64

0.679

0.89

0.48, 1.65

0.707

z_Age

0.88

0.76, 1.01

0.077

0.87

0.75, 1.01

0.075

z_Age2

0.80

0.67, 0.95

0.012

1CI = Confidence Interval

Tweedie with linear age — Dispersion (φ): 4.47; Power (p): 1.350; AIC: 1264.93; BIC: 1289.63

Tweedie with quadratic age — Dispersion (φ): 4.40; Power (p): 1.340; AIC: 1260.53; BIC: 1288.31

The residual diagnostics for the final Tweedie model appeared satisfactory, with the DHARMa package producing appropriate residual plots to assess overdispersion, zero inflation, and uniformity. DHARMa residuals are based on the rank of each observed value among many simulated values from the fitted model. This approach produces discrete residual values (e.g., 0.25, 0.5, 0.75) that reflect the percentile position of the observed value within the simulated distribution.

Inferential Reproducibility conclusion

This model was not found to be inferentially reproducible, due to differences in model distribution and specification. A Tweedie distribution was identified as more appropriate for the data, given the presence of zero inflation and non-normality. In addition, model parameterisation differed, including treating categorical variables as continuous.