Paper 22: Post-traumatic stress disorder and self reported outcomes after traumatic brain injury in victims of assault
References
Bown D, Belli A, Qureshi K, Davies D, Toman E, Upthegrove R (2019) Post-traumatic stress disorder and self-reported outcomes after traumatic brain injury in victims of assault. PLoS ONE 14(2): e0211684. https://doi.org/10.1371/journal.pone.0211684
Bown, Dominic and Belli, Antonio and Davies, David and Toman, Emma (2019) Research data supporting “Post-traumatic stress disorder and self-reported outcomes after traumatic brain injury in victims of assault”. https://doi.org/10.25500/eData.bham.00000293
Disclosure
This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.
Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.
Methods used in our reproducibility analyses
There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.
The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.
To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
Results from the reproduction of the Bown et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.
Summary from statistical review
This study examines the association of assault with self-reported outcomes, including quality of life in patients attending traumatic brain injury clinics. The article is easy to read and uses a mix of non-parametric and linear regression but does not discuss normality, other assumptions, or collinearity. The authors state a cross-sectional design and describe a clinically meaningful difference between groups. Univariate modelling is done with different tests, and the authors talk in terms of significant or borderline variables for inclusion in multivariable models.
Data availability and software used
Data is available from University of Birmingham open access UBRIA eData repository in an Excel in a wide format with no data dictionary. SPSS was used for analysis.
Regression sample
There were eight multivariable model identified in the paper with the main focusing on physical, cognitive and emotional symptoms after traumatic brain injury (TBI). Three linear regressions were randomly sampled these outcomes included Rivermead Post-concussion Questionnaire (RPQ), both the full scale and the subscale RPQ13 and the Impact of Event Scale (IES). All analysis was adjusted for age, ethnicity and extracranial trauma.
Computational reproducibility results
The three models assessed in this paper were not found to be computationally reproducible and were only partially reproduced. A partial data dictionary was provided, with label values for categorical variables and missing data definitions included in the accompanying Excel file. Identification of “extracranial trauma” was unclear; it appeared to correspond to a variable named Polytrauma in the dataset, which was inferred by matching descriptive statistics. Multiple missing value codes were used inconsistently, for example, age was documented as missing if coded 100, but in the dataset, missing values appeared as zero. A filter variable was available to identify included cases, which is good practice.
Although the reproduced sample sizes matched the publication, minor discrepancies in medians and regression results suggested potential differences in the data. These may have arisen from updates to the dataset after results were generated. For instance, the median age for participants experiencing assault was reported as 29.3 (22.1, 36.4), whereas the reproduced median was 29.0 (22.0, 35.0). Similarly, RPQ (full) scores for the same group were reported as 31.0 (13.4, 42.0) but reproduced as 31.0 (14.75, 42.00). These small differences may reflect slight variations in the dataset used or differences in rounding or summary procedures.
Linear regression results were not fully reproduced; coefficients and p-values differed at the decimal level. The authors stated that pairwise deletion was used for missing data or incomplete questionnaires to maximize data usage. It was unclear whether this applied to the regression models. Testing the pairwise option in SPSS yielded results that differed substantially from the reported findings, suggesting that discrepancies between the reproduced and reported analyses were likely due to minor differences in the data rather than analysis strategy. Linear regression results were also difficult to reproduce due to incomplete reporting. Only the regression coefficients and p-values for the primary independent variable were provided, with limited detail on the model specification. While the covariates used for adjustment were listed, it was unclear how categorical variables were handled. For example, Ethnicity, which included levels such as White, Asian, and Other, appeared to have been treated as a continuous variable. In SPSS, the regression function does not automatically handle categorical variables and requires the user to manually create dummy variables.
Inferential reproducibility results
None of the three models assessed in this paper were found to be inferentially reproducible. The primary reasons were violations of model assumptions and issues with data distribution and parameterisation. Notably, ethnicity was treated as a continuous variable, which is inappropriate given that it is a nominal variable. While treating ordinal variables as continuous can be reasonable if the relationship is approximately linear, this does not apply to nominal variables like ethnicity, which should be modelled categorically. Across the three outcome variables, 10–24% of values were zero. This was most pronounced in the IES model, where a strong residual pattern indicated violation of the normality assumption. A Tweedie distribution, which accounts for zero inflation and non-normality, provided a substantially better fit, reducing AIC by 69–159 across all outcomes.
Recommended Changes
- A more comprehensive data dictionary should be provided, ensuring that all variable names and definitions exactly match those reported in the paper.
- Check that reported results match the data provided.
- Include tables with full reporting of statistical analysis.
- Include ethnicity in models as a categorical variable.
- Use an appropriate distribution to fit the data, such as a Tweedie model, to account for zeros.
- Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
Model 1
Model results for RPQ (full)
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | ||||||
Mechanism_of_TBI: | ||||||
Assault – Non-assault | 5.708 | 3.495 | 0.105 | |||
Age | ||||||
Ethicity_short | ||||||
Extracranial_trauma: | ||||||
Yes – No | ||||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for RPQ (full)
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for RPQ (full)
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Mechanism_of_TBI | |||||
Age | |||||
Ethicity_short | |||||
Extracranial_trauma | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for RPQ (full)
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 22.782 | 5.661 | 11.599 | 33.965 | 4.025 | <0.001 |
Mechanism_of_TBI: | ||||||
Assault – Non-assault | 5.718 | 3.495 | −1.186 | 12.623 | 1.636 | 0.1039 |
Age | −0.141 | 0.077 | −0.294 | 0.012 | −1.814 | 0.0716 |
Ethicity_short | 2.842 | 2.836 | −2.761 | 8.446 | 1.002 | 0.3179 |
Extracranial_trauma: | ||||||
Yes – No | 4.679 | 2.971 | −1.191 | 10.548 | 1.575 | 0.1174 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for RPQ (full)
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.308 | 0.095 | 0.071 | 1,352.162 | 16.812 | 4.013 | 4 | 153 | 0.0040 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for RPQ (full)
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Mechanism_of_TBI | 781.299 | 1 | 781.299 | 2.677 | 0.1039 |
Age | 960.764 | 1 | 960.764 | 3.292 | 0.0716 |
Ethicity_short | 293.130 | 1 | 293.130 | 1.004 | 0.3179 |
Extracranial_trauma | 723.878 | 1 | 723.878 | 2.480 | 0.1174 |
Residuals | 44,656.668 | 153 | 291.874 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
Mechanism_of_TBI | |||
Age | −2.245 | 0.0262 | Linearity violation |
Ethicity_short | −2.241 | 0.0265 | Linearity violation |
Extracranial_trauma | |||
Tukey test | −0.164 | 0.8700 | No linearity violation |
Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
- Plots and tests show an unaccounted non-linear relationship may be present.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
1.147 | 0.8868 | 4 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
RPQ (full) | 158 | 23.332 | 17.728 | 20.000 | 0.000 | 60.000 | 0.334 | −1.092 |
Mechanism_of_TBI* | 158 | 1.241 | 0.429 | 1.000 | 1.000 | 2.000 | 1.203 | −0.557 |
Age | 158 | 42.209 | 19.564 | 39.500 | 16.000 | 88.000 | 0.348 | −1.123 |
Ethicity_short | 158 | 1.203 | 0.502 | 1.000 | 1.000 | 3.000 | 2.439 | 5.049 |
Extracranial_trauma* | 158 | 1.361 | 0.482 | 1.000 | 1.000 | 2.000 | 0.574 | −1.681 |
.fitted | 158 | 23.332 | 5.463 | 22.976 | 13.259 | 36.053 | 0.210 | −0.745 |
.resid | 158 | −0.000 | 16.865 | −3.426 | −33.351 | 38.824 | 0.373 | −0.770 |
.leverage | 158 | 0.032 | 0.020 | 0.028 | 0.013 | 0.124 | 2.611 | 7.962 |
.sigma | 158 | 17.084 | 0.063 | 17.109 | 16.837 | 17.140 | −1.602 | 2.174 |
.cooksd | 158 | 0.007 | 0.010 | 0.003 | 0.000 | 0.060 | 2.829 | 9.293 |
.std.resid | 158 | −0.001 | 1.004 | −0.206 | −2.003 | 2.319 | 0.362 | −0.773 |
dfb.1_ | 158 | 0.000 | 0.083 | 0.001 | −0.224 | 0.336 | 0.854 | 3.262 |
dfb.M__T | 158 | 0.000 | 0.087 | 0.002 | −0.329 | 0.357 | −0.149 | 3.235 |
dfb.Age | 158 | 0.000 | 0.074 | 0.000 | −0.253 | 0.257 | 0.042 | 1.011 |
dfb.Eth_ | 158 | −0.001 | 0.094 | 0.000 | −0.501 | 0.380 | −1.441 | 11.776 |
dfb.Ex_Y | 158 | 0.000 | 0.081 | 0.008 | −0.266 | 0.185 | −0.388 | 0.623 |
dffit | 158 | −0.005 | 0.186 | −0.032 | −0.553 | 0.478 | −0.066 | 0.157 |
cov.r | 158 | 1.034 | 0.043 | 1.043 | 0.895 | 1.156 | −0.808 | 1.605 |
* categorical variable | ||||||||
Cooks threshold
Cook’s distance measures the overall change in fit, if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted values will change when observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, often DFFIT \(\pm 1\) is used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.M__T | dfb.Age | dfb.Eth_ | dfb.Ex_Y | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|---|---|
21 | 0.443 | 0.112 | 0.005 | −0.100 | 0.048 | 0.045 | 0.134 | −0.026 | 0.157 | 1.156 |
103 | 2.225 | 0.016 | 0.016 | 0.005 | −0.061 | 0.129 | −0.011 | −0.111 | 0.287 | 0.895 |
61 | 2.353 | 0.040 | 0.044 | −0.224 | −0.134 | 0.161 | 0.380 | −0.170 | 0.478 | 0.900 |
27 | −1.400 | 0.124 | 0.055 | 0.329 | 0.106 | −0.175 | −0.491 | 0.124 | −0.528 | 1.107 |
172 | −1.614 | 0.105 | 0.060 | 0.336 | 0.107 | −0.105 | −0.501 | −0.118 | −0.553 | 1.061 |
StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | ||||||||||
Results for outliers and influential points
- Cook’s Distance and other influence diagnostics indicated no observations of concern.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.098 | 0.0945 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.969 | 0.0012 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov supports residuals being normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- The Q-Q plot shows a minor violation of normality in the middle, so this should be checked.
Assessing collinearity with VIF
Term | VIF | Tolerance |
|---|---|---|
Mechanism_of_TBI | 1.208 | 0.828 |
Age | 1.235 | 0.810 |
Ethicity_short | 1.089 | 0.918 |
Extracranial_trauma | 1.102 | 0.908 |
VIF = Variance Inflation Factor. | ||
Collinearity results
- All VIF values are under three, indicating no collinearity issues.
- Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
−0.008 | 2.009 | 0.9800 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- The study design seems to be independent.
Assumption conclusions
The model met the assumptions of homoscedasticity and independence. However, linearity issues were observed for age and ethnicity. Ethnicity was treated as a continuous variable, which may be inappropriate given its nominal nature. Normality tests and the Q-Q plot suggested the residuals may not be normally distributed. No collinearity issues or influential outliers were identified.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for RPQ (full)
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 22.7822 | |||
Mechanism_of_TBI: | ||||
Assault – Non-assault | 5.708 | 5.7181 | 0.0101 | Not Reproduced |
Age | −0.1405 | |||
Ethicity_short | 2.8423 | |||
Extracranial_trauma: | ||||
Yes – No | 4.6786 | |||
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in Standard error
term | O_SE | R_SE | Change.SE | Reproduce.SE |
|---|---|---|---|---|
Intercept | 5.6607 | |||
Mechanism_of_TBI: | ||||
Assault – Non-assault | 3.495 | 3.4949 | −0.0001 | Reproduced |
Age | 0.0774 | |||
Ethicity_short | 2.8362 | |||
Extracranial_trauma: | ||||
Yes – No | 2.9709 | |||
O_SE = original standard error; R_SE = reproduced standard error; Change.SE = change in R_SE - O_SE; Reproduce.lower = standard error reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
Mechanism_of_TBI: | |||||
Assault – Non-assault | 0.105 | 0.1039 | −0.0011 | Not Reproduced | Remains non-sig, B same direction |
Age | 0.0716 | ||||
Ethicity_short | 0.3179 | ||||
Extracranial_trauma: | |||||
Yes – No | 0.1174 | ||||
O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
Results for p-values
The p-value for this model was not reproduced.
Conclusion computational reproducibility
This model was not computationally reproducible, although some aspects were partially reproduced such as Standard error. Minor discrepancies in regression coefficients and p-values suggest there may be small differences in the underlying data.
Methods
While the model was partially reproduced, it is likely that minor differences in the underlying data account for the remaining discrepancies. However, these differences are unlikely to have affected the model assumptions, and inferential reproducibility was assessed. The original model exhibited violations of normality and linearity, with 9.6% of RPQ_full values equal to zero. To account for the excess zeros and the non-normal distribution, a Tweedie model was fitted and compared to the reproduced model using AIC. Ethnicity was included as a categorical variable, and age was assessed for linearity. Age was standardized for the analysis, but RPQ_full was not standardized to maintain compatibility with the Tweedie distribution.
Results
Treating ethnicity as a continuous variable was not appropriate, as the differences between groups were not equally spaced and the relationship with the outcome was not linear. The categorical variable improved model fit, with a decrease in AIC of 3.13.
| Reproduced | Categorical Ethnicity | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
Characteristic | Beta | SE1 | Statistic | 95% CI1 | p-value | Beta | SE1 | Statistic | 95% CI1 | p-value |
(Intercept) | 16.66 | 3.74 | 4.45 | 9.27, 24.05 | <0.001 | 19.29 | 1.99 | 9.67 | 15.35, 23.23 | <0.001 |
Mechanism_of_TBI | ||||||||||
Non-assault | ||||||||||
Assault | 5.72 | 3.49 | 1.64 | −1.19, 12.62 | 0.104 | 4.49 | 3.49 | 1.29 | −2.41, 11.39 | 0.201 |
Extracranial_trauma | ||||||||||
No | ||||||||||
Yes | 4.68 | 2.97 | 1.57 | −1.19, 10.55 | 0.117 | 4.63 | 2.93 | 1.58 | −1.16, 10.42 | 0.116 |
z_Age | −2.86 | 1.58 | −1.81 | −5.98, 0.25 | 0.072 | −2.54 | 1.56 | −1.63 | −5.63, 0.55 | 0.106 |
Ethicity_short | 2.84 | 2.84 | 1.00 | −2.76, 8.45 | 0.318 | |||||
Ethicity | ||||||||||
White | ||||||||||
Asian | 10.72 | 4.49 | 2.39 | 1.84, 19.59 | 0.018 | |||||
Other | −2.20 | 6.61 | −0.33 | −15.26, 10.87 | 0.740 | |||||
1SE = Standard Error, CI = Confidence Interval | ||||||||||
Note: Reproduced model – Adjusted R² = 0.071; AIC = 1352.16; BIC = 1370.54 | ||||||||||
Categorical Ethnicity model – Adjusted R² = 0.095; AIC = 1349.03; BIC = 1370.47 | ||||||||||
Fitting a Tweedie model to account for zero’s, the estimated dispersion parameter (φ) was 4.70, indicating greater variability than would be expected under a Gaussian model, and the Tweedie power parameter (p) was 1.36, consistent with a compound Poisson–Gamma distribution. Compared to the linear model, the Tweedie model yielded a substantial AIC reduction of 68.96. Although inclusion of a squared age term marginally improved model fit (AIC and BIC) and residual diagnostics, these improvements were small, and the simpler model was retained.
In this model, the intercept represents the geometric mean of the RPQ_full score for patients who were white, had not been assaulted, had not experienced extracranial trauma, and were at the average age (since age was standardized). This corresponds to the expected RPQ_full score when all predictors are at their reference level or zero. The exponentiated coefficient for Mechanism_of_TBI indicates that patients who had been assaulted had an estimated 19% higher RPQ_full score compared to those who had not (p = 0.271). The coefficient for standardized age was 0.87, suggesting that the RPQ_full decreases by 13% as age increased by one standard deviation.
| Tweedie with linear age | Tweedie with quadratic age | ||||
|---|---|---|---|---|---|---|
Characteristic | exp(Beta) | 95% CI1 | p-value | exp(Beta) | 95% CI1 | p-value |
(Intercept) | 19.5 | 16.1, 23.5 | <0.001 | 24.4 | 19.1, 31.1 | <0.001 |
Mechanism_of_TBI | ||||||
Non-assault | — | — | — | — | ||
Assault | 1.19 | 0.87, 1.61 | 0.271 | 1.16 | 0.86, 1.57 | 0.339 |
Extracranial_trauma | ||||||
No | — | — | — | — | ||
Yes | 1.19 | 0.91, 1.56 | 0.202 | 1.14 | 0.88, 1.49 | 0.316 |
Ethicity | ||||||
White | — | — | — | — | ||
Asian | 1.41 | 0.98, 2.04 | 0.068 | 1.41 | 0.98, 2.03 | 0.062 |
Other | 0.90 | 0.48, 1.67 | 0.731 | 0.91 | 0.50, 1.68 | 0.770 |
z_Age | 0.87 | 0.75, 1.01 | 0.068 | 0.87 | 0.75, 1.01 | 0.074 |
z_Age2 | 0.78 | 0.66, 0.93 | 0.006 | |||
1CI = Confidence Interval | ||||||
Tweedie with linear age — Dispersion (φ): 4.70; Power (p): 1.360; AIC: 1283.20; BIC: 1307.70 | ||||||
Tweedie with quadratic age — Dispersion (φ): 4.62; Power (p): 1.350; AIC: 1277.59; BIC: 1305.16 | ||||||
The residual diagnostics for the final Tweedie model appeared satisfactory, with the DHARMa package producing appropriate residual plots to assess overdispersion, zero inflation, and uniformity. DHARMa residuals are based on the rank of each observed value among many simulated values from the fitted model. This approach produces discrete residual values (e.g., 0.25, 0.5, 0.75) that reflect the percentile position of the observed value within the simulated distribution.
Inferential Reproducibility conclusion
This model was not found to be inferentially reproducible, due to differences in model distribution and specification. A Tweedie distribution was identified as more appropriate for the data, given the presence of zero inflation and non-normality. In addition, model parameterisation differed, including treating categorical variables as continuous.
Model 2
Model results for IES
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | ||||||
Mechanism_of_TBI: | ||||||
Assault – Non-assault | 6.550 | 4.736 | 0.170 | |||
Age | ||||||
Ethicity_short | ||||||
Extracranial_trauma: | ||||||
Yes – No | ||||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for IES
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for IES
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Mechanism_of_TBI | |||||
Age | |||||
Ethicity_short | |||||
Extracranial_trauma | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results IES
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 8.742 | 7.982 | −7.081 | 24.564 | 1.095 | 0.2759 |
Mechanism_of_TBI: | ||||||
Assault – Non-assault | 6.512 | 4.737 | −2.877 | 15.901 | 1.375 | 0.1720 |
Age | −0.049 | 0.097 | −0.241 | 0.144 | −0.502 | 0.6169 |
Ethicity_short | 7.784 | 4.657 | −1.447 | 17.015 | 1.672 | 0.0975 |
Extracranial_trauma: | ||||||
Yes – No | 3.146 | 3.883 | −4.551 | 10.844 | 0.810 | 0.4196 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Model fit for IES
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.271 | 0.074 | 0.039 | 991.981 | 18.491 | 2.148 | 4 | 108 | 0.0798 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for IES
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Mechanism_of_TBI | 676.212 | 1 | 676.212 | 1.890 | 0.1720 |
Age | 90.033 | 1 | 90.033 | 0.252 | 0.6169 |
Ethicity_short | 999.480 | 1 | 999.480 | 2.794 | 0.0975 |
Extracranial_trauma | 234.865 | 1 | 234.865 | 0.657 | 0.4196 |
Residuals | 38,634.804 | 108 | 357.730 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
Mechanism_of_TBI | |||
Age | −1.163 | 0.2476 | No linearity violation |
Ethicity_short | −0.680 | 0.4980 | No linearity violation |
Extracranial_trauma | |||
Tukey test | 0.475 | 0.6348 | No linearity violation |
Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
No linearity violation was observed in either plots or tests.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
0.177 | 0.9963 | 4 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
IES | 113 | 17.796 | 19.298 | 10.000 | 0.000 | 62.000 | 0.791 | −0.756 |
Mechanism_of_TBI* | 113 | 1.204 | 0.404 | 1.000 | 1.000 | 2.000 | 1.453 | 0.113 |
Age | 113 | 43.788 | 20.400 | 43.000 | 16.000 | 88.000 | 0.238 | −1.330 |
Ethicity_short | 113 | 1.124 | 0.404 | 1.000 | 1.000 | 3.000 | 3.359 | 10.943 |
Extracranial_trauma* | 113 | 1.354 | 0.480 | 1.000 | 1.000 | 2.000 | 0.603 | −1.651 |
.fitted | 113 | 17.796 | 5.239 | 16.408 | 12.239 | 37.633 | 1.652 | 2.629 |
.resid | 113 | −0.000 | 18.573 | −7.895 | −26.461 | 48.397 | 0.847 | −0.462 |
.leverage | 113 | 0.044 | 0.035 | 0.035 | 0.017 | 0.229 | 3.601 | 15.486 |
.sigma | 113 | 18.913 | 0.111 | 18.953 | 18.405 | 19.002 | −2.349 | 5.845 |
.cooksd | 113 | 0.010 | 0.019 | 0.004 | 0.000 | 0.142 | 4.911 | 28.171 |
.std.resid | 113 | −0.001 | 1.005 | −0.421 | −1.585 | 2.583 | 0.823 | −0.496 |
dfb.1_ | 113 | 0.000 | 0.095 | 0.001 | −0.415 | 0.421 | 0.350 | 5.360 |
dfb.M__T | 113 | 0.001 | 0.097 | 0.004 | −0.274 | 0.396 | 0.485 | 2.862 |
dfb.Age | 113 | 0.000 | 0.085 | −0.002 | −0.215 | 0.206 | 0.038 | −0.232 |
dfb.Eth_ | 113 | −0.001 | 0.118 | 0.001 | −0.766 | 0.678 | −0.859 | 23.013 |
dfb.Ex_Y | 113 | −0.000 | 0.097 | 0.011 | −0.209 | 0.324 | 0.388 | 0.753 |
dffit | 113 | −0.004 | 0.225 | −0.083 | −0.849 | 0.770 | 0.293 | 1.543 |
cov.r | 113 | 1.049 | 0.071 | 1.057 | 0.776 | 1.319 | −0.787 | 4.515 |
* categorical variable | ||||||||
Cooks threshold
Cook’s distance measures the overall change in fit, if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted values will change when observation is removed. Potential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, often DFFIT \(\pm 1\) is used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.M__T | dfb.Age | dfb.Eth_ | dfb.Ex_Y | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|---|---|
103 | 2.433 | 0.020 | 0.024 | −0.001 | −0.077 | 0.150 | 0.009 | −0.140 | 0.352 | 0.817 |
125 | −0.647 | 0.221 | 0.024 | 0.207 | 0.074 | −0.026 | −0.321 | −0.042 | −0.345 | 1.319 |
147 | 2.655 | 0.019 | 0.026 | 0.030 | −0.097 | 0.122 | 0.002 | −0.162 | 0.368 | 0.776 |
99 | 1.413 | 0.229 | 0.117 | −0.415 | 0.157 | 0.031 | 0.678 | −0.181 | 0.770 | 1.238 |
131 | −1.596 | 0.221 | 0.142 | 0.421 | 0.220 | 0.056 | −0.766 | −0.078 | −0.849 | 1.195 |
StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | ||||||||||
Results for outliers and influential points
- While two observations tended to higher Cook’s values than the rest of the observations these were below 0.5 and dbetas and dffit were within reasonable range indicating the outliers are not substantially effecting results. Although the COVRATIO indicated there maybe some observations that may affect the precision of the parameter estimates.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.180 | 0.0014 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.890 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- QQ-plot indicates the residuals are not normally distributed.
Assessing collinearity with VIF
Term | VIF | Tolerance |
|---|---|---|
Mechanism_of_TBI | 1.149 | 0.870 |
Age | 1.229 | 0.814 |
Ethicity_short | 1.107 | 0.903 |
Extracranial_trauma | 1.089 | 0.918 |
VIF = Variance Inflation Factor. | ||
Collinearity results
- All VIF values are under three, indicating no collinearity issues.
- Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.125 | 1.744 | 0.1700 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- The study design seems to be independent.
Assumption conclusions
The model met the assumptions of homoscedasticity and independence. However, linearity issues were observed for age and ethnicity. Ethnicity was treated as a continuous variable, which may be inappropriate given its nominal nature. Normality tests and the Q-Q plot suggested that the residuals were not normally distributed. A strong pattern in the residuals was observed, largely driven by the substantial proportion of zero values in the IES outcome variable (24%), which contributed to the non-normality. No collinearity issues or influential outliers were identified.
Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for IES
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 8.7417 | |||
Mechanism_of_TBI: | ||||
Assault – Non-assault | 6.55 | 6.5124 | −0.0376 | Not Reproduced |
Age | −0.0487 | |||
Ethicity_short | 7.7843 | |||
Extracranial_trauma: | ||||
Yes – No | 3.1465 | |||
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in Standard error
term | O_SE | R_SE | Change.SE | Reproduce.SE |
|---|---|---|---|---|
Intercept | 7.9823 | |||
Mechanism_of_TBI: | ||||
Assault – Non-assault | 4.736 | 4.7367 | 0.0007 | Incorrect Rounding |
Age | 0.0971 | |||
Ethicity_short | 4.6570 | |||
Extracranial_trauma: | ||||
Yes – No | 3.8833 | |||
O_SE = original standard error; R_SE = reproduced standard error; Change.SE = change in R_SE - O_SE; Reproduce.lower = standard error reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | 0.2759 | ||||
Mechanism_of_TBI: | |||||
Assault – Non-assault | 0.170 | 0.1720 | 0.0020 | Not Reproduced | Remains non-sig, B same direction |
Age | 0.6169 | ||||
Ethicity_short | 0.0975 | ||||
Extracranial_trauma: | |||||
Yes – No | 0.4196 | ||||
O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
Results for p-values
The p-value was not reproduced for this model.
Conclusion computational reproducibility
This model was not computationally reproducible, although some aspects were partially reproduced such as the standard error. Minor discrepancies in regression coefficients and p-values suggest there may be small differences in the underlying data.
Methods
While the model was partially reproduced, it is likely that minor differences in the underlying data account for the remaining discrepancies. However, these differences are unlikely to have affected the model assumptions, and inferential reproducibility was assessed. The original model exhibited violations of normality, with 24.0% of IES values equal to zero. To account for the excess zeros and the non-normal distribution, a Tweedie model was fitted and compared to the reproduced model using AIC. Ethnicity was included as a categorical variable, and age was assessed for linearity. Age was standardized for the analysis, but IES was not transformed to maintain compatibility with the Tweedie distribution.
Results
Although the AIC for categorical ethnicity was was slightly higher (1.51), this difference is small enough to be considered negligible. However, treating ethnicity as a continuous variable is not appropriate, as the group differences are not equally spaced. This leads to inaccurate estimation of both effect sizes and confidence intervals, even though the direction of the estimates remained consistent.
| Reproduced | Categorical Ethnicity | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
Characteristic | Beta | SE1 | Statistic | 95% CI1 | p-value | Beta | SE1 | Statistic | 95% CI1 | p-value |
(Intercept) | 6.62 | 5.59 | 1.18 | −4.46, 17.70 | 0.239 | 14.34 | 2.56 | 5.59 | 9.25, 19.42 | <0.001 |
Mechanism_of_TBI | ||||||||||
Non-assault | ||||||||||
Assault | 6.51 | 4.74 | 1.37 | −2.88, 15.90 | 0.172 | 6.14 | 4.78 | 1.28 | −3.34, 15.61 | 0.202 |
Extracranial_trauma | ||||||||||
No | ||||||||||
Yes | 3.15 | 3.88 | 0.81 | −4.55, 10.84 | 0.420 | 3.12 | 3.89 | 0.80 | −4.60, 10.84 | 0.425 |
z_Age | −0.99 | 1.98 | −0.50 | −4.91, 2.93 | 0.617 | −0.96 | 1.98 | −0.49 | −4.90, 2.97 | 0.629 |
Ethicity_short | 7.78 | 4.66 | 1.67 | −1.45, 17.02 | 0.098 | |||||
Ethicity | ||||||||||
White | ||||||||||
Asian | 11.55 | 7.25 | 1.59 | −2.81, 25.92 | 0.114 | |||||
Other | 11.17 | 11.36 | 0.98 | −11.35, 33.69 | 0.328 | |||||
1SE = Standard Error, CI = Confidence Interval | ||||||||||
Note: Reproduced model – Adjusted R² = 0.039; AIC = 991.98; BIC = 1008.35 | ||||||||||
Categorical Ethnicity model – Adjusted R² = 0.035; AIC = 993.49; BIC = 1012.59 | ||||||||||
Fitting a Tweedie model to account for zero’s, the estimated dispersion parameter (φ) was 6.32, indicating greater variability than would be expected under a Gaussian model, and the Tweedie power parameter (p) was 1.46, consistent with a compound Poisson–Gamma distribution. Compared to the linear model, the Tweedie model yielded a substantial AIC reduction of 159.43. Including a squared age term did not improve fit with similar AIC, therefore age was found to be linear.
In this model, the intercept represents the geometric mean of the IES score for patients who were white, had not been assaulted, had not experienced extracranial trauma, and were at the average age (since age was standardized). This corresponds to the expected IES score when all predictors are at their reference level or zero. The exponentiated coefficient for Mechanism_of_TBI indicates that patients who had been assaulted had an estimated 34% higher IES score compared to those who had not (p = 0.291). The coefficient for standardized age was 0.93, suggesting that the expected IES score decreases by 7% as age increases by one standard deviation, although this was not statistically significant.
| Tweedie with linear age | Tweedie with quadratic age | ||||
|---|---|---|---|---|---|---|
Characteristic | exp(Beta) | 95% CI1 | p-value | exp(Beta) | 95% CI1 | p-value |
(Intercept) | 14.6 | 10.7, 20.1 | <0.001 | 17.5 | 11.5, 26.5 | <0.001 |
Mechanism_of_TBI | ||||||
Non-assault | — | — | — | — | ||
Assault | 1.34 | 0.78, 2.31 | 0.291 | 1.36 | 0.79, 2.33 | 0.268 |
Extracranial_trauma | ||||||
No | — | — | — | — | ||
Yes | 1.17 | 0.73, 1.87 | 0.511 | 1.15 | 0.72, 1.83 | 0.568 |
Ethicity | ||||||
White | — | — | — | — | ||
Asian | 1.59 | 0.74, 3.40 | 0.235 | 1.62 | 0.76, 3.47 | 0.210 |
Other | 1.55 | 0.47, 5.07 | 0.471 | 1.69 | 0.51, 5.55 | 0.387 |
z_Age | 0.93 | 0.72, 1.19 | 0.563 | 0.96 | 0.74, 1.24 | 0.749 |
z_Age2 | 0.83 | 0.62, 1.11 | 0.204 | |||
1CI = Confidence Interval | ||||||
Tweedie with linear age — Dispersion (φ): 6.32; Power (p): 1.460; AIC: 832.55; BIC: 854.37 | ||||||
Tweedie with quadratic age — Dispersion (φ): 6.29; Power (p): 1.460; AIC: 832.94; BIC: 857.48 | ||||||
The residual diagnostics for the final Tweedie model showed mild deviations from the expected uniform distribution, but these were not substantial and did not indicate serious model misfit. The DHARMa package producing appropriate residual plots to assess overdispersion, zero inflation, and uniformity. DHARMa residuals are based on the rank of each observed value among many simulated values from the fitted model. This approach produces discrete residual values (e.g., 0.25, 0.5, 0.75) that reflect the percentile position of the observed value within the simulated distribution.
Inferential Reproducibility conclusion
This model was not found to be inferentially reproducible, due to differences in model distribution and specification. A Tweedie distribution was identified as more appropriate for the data, given the presence of zero inflation and non-normality. In addition, model parameterisation differed, as categorical variables as continuous.
Model 3
Model results for RPQ 13
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | ||||||
Mechanism_of_TBI: | ||||||
Assault – Non-assault | 4.462 | 2.4940 | 0.131 | |||
Age | ||||||
Ethicity_short | ||||||
Extracranial_trauma: | ||||||
Yes – No | ||||||
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit Statistics RPQ 13
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for RPQ 13
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Mechanism_of_TBI | |||||
Age | |||||
Ethicity_short | |||||
Extracranial_trauma | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for RPQ 13
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 19.748 | 4.668 | 10.528 | 28.969 | 4.230 | <0.001 |
Mechanism_of_TBI: | ||||||
Assault – Non-assault | 4.466 | 2.940 | −1.340 | 10.273 | 1.519 | 0.1307 |
Age | −0.116 | 0.064 | −0.242 | 0.011 | −1.800 | 0.0737 |
Ethicity_short | 2.106 | 2.411 | −2.656 | 6.868 | 0.874 | 0.3837 |
Extracranial_trauma: | ||||||
Yes – No | 3.973 | 2.497 | −0.958 | 8.904 | 1.591 | 0.1135 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for RPQ 13
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.291 | 0.085 | 0.061 | 1,334.175 | 14.323 | 3.625 | 4 | 157 | 0.0074 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for RPQ 13
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Mechanism_of_TBI | 488.680 | 1 | 488.680 | 2.309 | 0.1307 |
Age | 686.116 | 1 | 686.116 | 3.241 | 0.0737 |
Ethicity_short | 161.535 | 1 | 161.535 | 0.763 | 0.3837 |
Extracranial_trauma | 536.103 | 1 | 536.103 | 2.533 | 0.1135 |
Residuals | 33,232.885 | 157 | 211.674 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Testing residuals for non linear relationships
Term | Statistic | p-value | Results |
|---|---|---|---|
Mechanism_of_TBI | |||
Age | −2.078 | 0.0393 | Linearity violation |
Ethicity_short | −2.184 | 0.0304 | Linearity violation |
Extracranial_trauma | |||
Tukey test | −0.044 | 0.9649 | No linearity violation |
Specification test for predictors using quadratic tests, for fitted values curvature is tested through Tukey's one-degree-of-freedom test for nonadditivity. | |||
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Linearity results
Plots and tests show an unaccounted non-linear relationship may be present.
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
1.167 | 0.8835 | 4 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- There is no distinct funnelling pattern observed, supporting homoscedasticity of residuals.
Model descriptives including cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
RPQ 13 | 162 | 19.870 | 15.016 | 16.500 | 0.000 | 50.000 | 0.298 | −1.145 |
Mechanism_of_TBI* | 162 | 1.235 | 0.425 | 1.000 | 1.000 | 2.000 | 1.241 | −0.462 |
Age | 162 | 41.920 | 19.597 | 39.000 | 16.000 | 88.000 | 0.359 | −1.126 |
Ethicity_short | 162 | 1.198 | 0.496 | 1.000 | 1.000 | 3.000 | 2.484 | 5.289 |
Extracranial_trauma* | 162 | 1.352 | 0.479 | 1.000 | 1.000 | 2.000 | 0.615 | −1.632 |
.fitted | 162 | 19.870 | 4.366 | 19.680 | 11.682 | 30.088 | 0.184 | −0.732 |
.resid | 162 | −0.000 | 14.367 | −3.021 | −28.098 | 32.428 | 0.335 | −0.845 |
.leverage | 162 | 0.031 | 0.019 | 0.027 | 0.012 | 0.124 | 2.667 | 8.315 |
.sigma | 162 | 14.549 | 0.051 | 14.567 | 14.359 | 14.596 | −1.590 | 2.178 |
.cooksd | 162 | 0.007 | 0.010 | 0.003 | 0.000 | 0.055 | 2.756 | 8.629 |
.std.resid | 162 | −0.001 | 1.003 | −0.215 | −1.981 | 2.247 | 0.325 | −0.848 |
dfb.1_ | 162 | 0.000 | 0.081 | 0.000 | −0.210 | 0.329 | 0.818 | 3.086 |
dfb.M__T | 162 | 0.000 | 0.087 | 0.002 | −0.331 | 0.372 | −0.101 | 3.521 |
dfb.Age | 162 | 0.000 | 0.073 | 0.000 | −0.220 | 0.264 | 0.107 | 0.976 |
dfb.Eth_ | 162 | −0.001 | 0.091 | 0.000 | −0.482 | 0.355 | −1.456 | 11.851 |
dfb.Ex_Y | 162 | 0.000 | 0.079 | 0.005 | −0.266 | 0.178 | −0.370 | 0.573 |
dffit | 162 | −0.005 | 0.183 | −0.032 | −0.526 | 0.455 | −0.075 | 0.050 |
cov.r | 162 | 1.033 | 0.041 | 1.042 | 0.891 | 1.157 | −0.725 | 1.862 |
* categorical variable | ||||||||
Cooks threshold
Cook’s distance measures the overall change in fit, if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a theshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted values will change when observation is removed. Potiential influential observations \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\) where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can result in a large number of points identified, often DFFIT \(\pm 1\) is used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.M__T | dfb.Age | dfb.Eth_ | dfb.Ex_Y | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|---|---|
21 | 0.331 | 0.111 | 0.003 | −0.075 | 0.035 | 0.033 | 0.100 | −0.021 | 0.117 | 1.157 |
57 | 2.227 | 0.014 | 0.014 | 0.172 | −0.131 | −0.092 | −0.045 | −0.154 | 0.266 | 0.896 |
103 | 2.277 | 0.016 | 0.016 | −0.001 | −0.059 | 0.139 | −0.010 | −0.110 | 0.291 | 0.891 |
27 | −1.375 | 0.124 | 0.053 | 0.329 | 0.108 | −0.171 | −0.482 | 0.126 | −0.518 | 1.110 |
172 | −1.537 | 0.105 | 0.055 | 0.323 | 0.107 | −0.096 | −0.477 | −0.109 | −0.526 | 1.070 |
StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | ||||||||||
Results for outliers and influential points
- Cook’s Distance and other influence diagnostics indicated no observations of concern.. Although the COVRATIO indicated there maybe some observations that may affect the precision of the parameter estimates.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.095 | 0.1106 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.968 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov supports residuals being normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- The Q-Q plot shows a minor violation of normality in the middle, so this should be checked.
Assessing collinearity with VIF
Term | VIF | Tolerance |
|---|---|---|
Mechanism_of_TBI | 1.187 | 0.842 |
Age | 1.204 | 0.831 |
Ethicity_short | 1.090 | 0.918 |
Extracranial_trauma | 1.088 | 0.919 |
VIF = Variance Inflation Factor. | ||
Collinearity results
- All VIF values are under three, indicating no collinearity issues.
- Overall, when taking into account VIF and SE, the model does not have collinearity issues.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
−0.041 | 2.075 | 0.6280 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- The study design seems to be independent.
Assumption conclusions
The model met the assumptions of homoscedasticity and independence. However, linearity issues were observed for age and ethnicity. Ethnicity was treated as a continuous variable, which may be inappropriate given its nominal nature. Normality tests and the Q-Q plot suggested the residuals may not be normally distributed. No collinearity issues or influential outliers were identified.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for RPQ 13
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 19.7483 | |||
Mechanism_of_TBI: | ||||
Assault – Non-assault | 4.462 | 4.4665 | 0.0045 | Not Reproduced |
Age | −0.1156 | |||
Ethicity_short | 2.1061 | |||
Extracranial_trauma: | ||||
Yes – No | 3.9731 | |||
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in Standard error
term | O_SE | R_SE | Change.SE | Reproduce.SE |
|---|---|---|---|---|
Intercept | 4.6683 | |||
Mechanism_of_TBI: | ||||
Assault – Non-assault | 2.494 | 2.9396 | 0.4456 | Not Reproduced |
Age | 0.0642 | |||
Ethicity_short | 2.4109 | |||
Extracranial_trauma: | ||||
Yes – No | 2.4965 | |||
O_SE = original standard error; R_SE = reproduced standard error; Change.SE = change in R_SE - O_SE; Reproduce.lower = standard error reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
Mechanism_of_TBI: | |||||
Assault – Non-assault | 0.131 | 0.1307 | −0.0003 | Reproduced | Remains non-sig, B same direction |
Age | 0.0737 | ||||
Ethicity_short | 0.3837 | ||||
Extracranial_trauma: | |||||
Yes – No | 0.1135 | ||||
O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
Results for p-values
The P-value was reproduced for this model.
Conclusion computational reproducibility
This model was not computationally reproducible, although some aspects were partially reproduced, such as p-values. Minor discrepancies in regression coefficients and standard errors suggest small differences in the underlying data.
Methods
While the model was partially reproduced, it is likely that minor differences in the underlying data account for the remaining discrepancies. However, these differences are unlikely to have affected the model assumptions, and inferential reproducibility was assessed. The original model exhibited violations of normality and linearity, with 11.0% of RPQ13 values equal to zero. To account for the excess zeros and the non-normal distribution, a Tweedie model was fitted and compared to the reproduced model using AIC. Ethnicity was included as a categorical variable, and age was assessed for linearity. Age was standardized for the analysis, but RPQ13 was not transformed to maintain compatibility with the Tweedie distribution.
Results
Treating ethnicity as a continuous variable was not appropriate, as the differences between groups were not equally spaced and the relationship with the outcome was not linear. As categorical variable there was a small decrease in AIC of 2.88.
| Reproduced | Categorical Ethnicity | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
Characteristic | Beta | SE1 | Statistic | 95% CI1 | p-value | Beta | SE1 | Statistic | 95% CI1 | p-value |
(Intercept) | 14.71 | 3.13 | 4.70 | 8.53, 20.89 | <0.001 | 16.66 | 1.64 | 10.15 | 13.42, 19.91 | <0.001 |
Mechanism_of_TBI | ||||||||||
Non-assault | ||||||||||
Assault | 4.47 | 2.94 | 1.52 | −1.34, 10.27 | 0.131 | 3.42 | 2.94 | 1.16 | −2.39, 9.24 | 0.247 |
Extracranial_trauma | ||||||||||
No | ||||||||||
Yes | 3.97 | 2.50 | 1.59 | −0.96, 8.90 | 0.114 | 3.91 | 2.47 | 1.59 | −0.96, 8.79 | 0.115 |
z_Age | −2.35 | 1.31 | −1.80 | −4.94, 0.23 | 0.074 | −2.11 | 1.30 | −1.62 | −4.67, 0.46 | 0.107 |
Ethicity_short | 2.11 | 2.41 | 0.87 | −2.66, 6.87 | 0.384 | |||||
Ethicity | ||||||||||
White | ||||||||||
Asian | 8.64 | 3.82 | 2.26 | 1.09, 16.19 | 0.025 | |||||
Other | −2.35 | 5.63 | −0.42 | −13.48, 8.78 | 0.677 | |||||
1SE = Standard Error, CI = Confidence Interval | ||||||||||
Note: Reproduced model – Adjusted R² = 0.061; AIC = 1334.18; BIC = 1352.70 | ||||||||||
Categorical Ethnicity model – Adjusted R² = 0.083; AIC = 1331.29; BIC = 1352.91 | ||||||||||
Fitting a Tweedie model to account for zero’s, the estimated dispersion parameter (φ) was 4.47, indicating greater variability than would be expected under a Gaussian model, and the Tweedie power parameter (p) was 1.35, consistent with a compound Poisson–Gamma distribution. Compared to the linear model, the Tweedie model yielded a substantial AIC reduction of 69.25. Inclusion of a squared age term marginally improved model fit (AIC and BIC) and residual diagnostics, these improvements were small, and the simpler model was retained.
In this model, the intercept represents the geometric mean of the RPQ13 score for patients who were white, had not been assaulted, had not experienced extracranial trauma, and were at the average age (since age was standardized). This corresponds to the expected RPQ13 score when all predictors are at their reference level or zero. The exponentiated coefficient for Mechanism_of_TBI indicates that patients who had been assaulted had an estimated 17% higher RPQ13 score compared to those who had not (p = 0.321). The coefficient for standardized age was 0.88, suggesting that the expected RPQ13 score decreases by 12% as age increases by one standard deviation, although this was not statistically significant.
| Tweedie with linear age | Tweedie with quadratic age | ||||
|---|---|---|---|---|---|---|
Characteristic | exp(Beta) | 95% CI1 | p-value | exp(Beta) | 95% CI1 | p-value |
(Intercept) | 16.8 | 14.0, 20.1 | <0.001 | 20.8 | 16.3, 26.5 | <0.001 |
Mechanism_of_TBI | ||||||
Non-assault | — | — | — | — | ||
Assault | 1.17 | 0.86, 1.58 | 0.321 | 1.14 | 0.84, 1.54 | 0.409 |
Extracranial_trauma | ||||||
No | — | — | — | — | ||
Yes | 1.19 | 0.91, 1.55 | 0.195 | 1.14 | 0.88, 1.49 | 0.319 |
Ethicity | ||||||
White | — | — | — | — | ||
Asian | 1.39 | 0.96, 2.02 | 0.084 | 1.39 | 0.96, 2.01 | 0.080 |
Other | 0.88 | 0.47, 1.64 | 0.679 | 0.89 | 0.48, 1.65 | 0.707 |
z_Age | 0.88 | 0.76, 1.01 | 0.077 | 0.87 | 0.75, 1.01 | 0.075 |
z_Age2 | 0.80 | 0.67, 0.95 | 0.012 | |||
1CI = Confidence Interval | ||||||
Tweedie with linear age — Dispersion (φ): 4.47; Power (p): 1.350; AIC: 1264.93; BIC: 1289.63 | ||||||
Tweedie with quadratic age — Dispersion (φ): 4.40; Power (p): 1.340; AIC: 1260.53; BIC: 1288.31 | ||||||
The residual diagnostics for the final Tweedie model appeared satisfactory, with the DHARMa package producing appropriate residual plots to assess overdispersion, zero inflation, and uniformity. DHARMa residuals are based on the rank of each observed value among many simulated values from the fitted model. This approach produces discrete residual values (e.g., 0.25, 0.5, 0.75) that reflect the percentile position of the observed value within the simulated distribution.
Inferential Reproducibility conclusion
This model was not found to be inferentially reproducible, due to differences in model distribution and specification. A Tweedie distribution was identified as more appropriate for the data, given the presence of zero inflation and non-normality. In addition, model parameterisation differed, including treating categorical variables as continuous.