Paper 19: Associations between industry involvement and study characteristics at the time of trial registration in biomedical research
References
Seidler AL, Hunter KE, Chartres N, Askie LM (2019) Associations between industry involvement and study characteristics at the time of trial registration in biomedical research. PLoS ONE 14(9): e0222117. https://doi.org/10.1371/journal.pone.0222117
Disclosure
This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.
Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.
Methods used in our reproducibility analyses
There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.
The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.
To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
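As a minimal sketch of the per-statistic categorisation (not the project's actual code; the tolerance rule of one unit in the last reported decimal place is our assumption):

```python
def classify_statistic(reported: float, reproduced: float, decimals: int) -> str:
    """Classify a reproduced statistic against the reported value.

    'Reproduced' if the values agree once the reproduced estimate is rounded
    to the number of decimal places used in the paper; 'Incorrect Rounding'
    if the absolute discrepancy is within one unit in the last reported
    decimal place (a plausible rounding slip); 'Not Reproduced' otherwise.
    """
    tolerance = 10 ** (-decimals)  # one unit in the last reported digit
    if round(reproduced, decimals) == round(reported, decimals):
        return "Reproduced"
    if abs(reproduced - reported) <= tolerance:
        return "Incorrect Rounding"
    return "Not Reproduced"
```

For example, the coefficient reported as -153 against a reproduced -152.9907 classifies as Reproduced, whereas the supplementary upper limit of -74.98 against a reproduced -73.9875 does not.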
For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
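The coefficient-difference thresholds described above can be sketched as follows (an illustrative helper, not the original analysis code):

```python
def flag_difference(original: float, reproduced: float) -> dict:
    """Flag meaningful differences between standardised coefficients.

    Thresholds follow the rules described above: a relative change of 10% or
    more, or absolute differences of 0.1 (moderate) and 0.2 (substantial).
    """
    diff = reproduced - original
    pct = abs(diff / original) * 100 if original != 0 else float("inf")
    return {
        "diff": diff,
        "pct": pct,
        "relative_10pct": pct >= 10,       # relative change >= 10%
        "absolute_0.1": abs(diff) >= 0.1,  # moderate absolute difference
        "absolute_0.2": abs(diff) >= 0.2,  # substantial absolute difference
    }
```

Applied to the lower confidence limits reported later (-0.3729 vs the bootstrapped -0.3216), this flags a relative change of about 13.8% without crossing the 0.1 absolute threshold.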
Results from the reproduction of the Seidler et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.
Summary from statistical review
This study examined associations between industry involvement and study characteristics at the time of trial registration in biomedical research. Linear regression was a minor part of the paper, focusing on the mean difference in target sample size by industry involvement. The interpretation of the regression coefficient was poor: rather than stating that the mean target sample size was smaller for trials with industry involvement, MD = -153 (95% CI = -231 to -74, p < 0.001), than for trials without, the authors stated, ‘The mean difference between trials with and without industry involvement was -153 (95% CI = -231 to -74, p < .001)’. Although this seems relatively minor, it impedes interpretation. Assumptions were not discussed.
Data availability and software used
The data were provided in the supporting information as an Excel file in both long and wide formats, with no data dictionary. All analyses were performed using R.
Regression sample
There was one linear regression reported with target sample size as the outcome variable, therefore no randomisation of models was required.
Computational reproducibility results
The results for this paper were mostly reproduced. A summary of results was presented in the text (coefficient and 95% confidence interval), with more decimal places presented in the Supporting information, which were used for reproducing results. These results were moderately difficult to reproduce because the authors provided raw data that required manipulating and merging four datasheets into both long and wide formats. Understanding how to reproduce the results required reading both the paper and the supporting information, as six outliers were excluded but not mentioned in the main text of the paper. The original results found that the mean target sample size was smaller with industry involvement, MD = -153 (95% CI = -231 to -74, p < 0.001). There was a typographical error in the supplementary results: the upper confidence limit was reported as -74.98, which, when reproduced, was -73.987, and was correctly reported in the text of the paper as -74.
Inferential reproducibility results
The regression in this paper was not inferentially reproducible, with gross violations of normality and heteroscedasticity. Removing outliers did not address the skewed distribution of target sample size, as indicated by large discrepancies between means and medians and SDs greatly exceeding IQRs (e.g., no industry: mean = 249, 95% CI 211–295, SD = 689; median = 70, IQR = 125; industry: mean = 96, 95% CI 26–166, SD = 198; median = 45, IQR = 76). Outlier removal reduced the variance but risked bias, as large sample sizes are often required when rare events occur or small effects are of interest.
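For illustration, the mean–median and SD–IQR comparison used above can be computed with standard-library tools (the toy right-skewed sample below is our invention, chosen only to mimic a skewed sample-size distribution):

```python
import statistics

def skew_summary(x):
    """Descriptives used above to gauge skew: mean vs median and SD vs IQR."""
    q = statistics.quantiles(x, n=4)  # quartiles (exclusive method)
    return {
        "mean": statistics.fmean(x),
        "median": statistics.median(x),
        "sd": statistics.stdev(x),
        "iqr": q[2] - q[0],
    }

# Toy 'target sample sizes' (invented, for illustration): one very large
# trial drags the mean and SD far above the median and IQR.
sample = [10, 20, 25, 30, 40, 45, 60, 70, 80, 5000]
s = skew_summary(sample)  # mean = 538.0, median = 42.5
```

Large gaps between mean and median, and an SD far exceeding the IQR, are exactly the signatures reported for the registry data above.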
Target sample size was standardised and a heteroscedastic bootstrap model was fitted, which showed a standardised change in confidence interval range of –0.1. While bootstrapping provided a comparison on the original additive scale, the clear mean–variance relationship (variance increasing with mean) indicated multiplicative rather than additive error. Alternative distributions were therefore considered. The negative binomial and gamma models improved fit, but residuals continued to show overdispersion. The inverse Gaussian model provided the best fit, with an AIC 4,784 lower than linear regression (an AIC difference of 10 is considered large), and produced narrower confidence intervals than the negative binomial and gamma models, suggesting greater precision. This model estimated that the mean sample size was 61% lower with industry involvement (ratio = 0.39, 95% CI 0.30–0.50), with geometric means of 249 (95% CI 211–295) for no industry involvement and 96 (95% CI 78–118) for industry involvement. Notably, the confidence interval for industry involvement under the inverse Gaussian model was much narrower than under linear regression, reflecting improved fit and greater precision in estimating the mean.
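The 61% figure follows directly from the ratio: for a log-link model with coefficient b, the ratio is exp(b), and a ratio of 0.39 corresponds to a (0.39 − 1) × 100 = −61% change. A minimal sketch of the conversion:

```python
import math

def ratio_to_pct_change(ratio: float) -> float:
    """Convert a multiplicative ratio (e.g. from a log-link model) to % change."""
    return (ratio - 1) * 100

# The inverse Gaussian model's ratio of 0.39 corresponds to a mean sample
# size about 61% lower with industry involvement.
pct = ratio_to_pct_change(0.39)

# Equivalently, exponentiating the log-link coefficient recovers the ratio:
b = math.log(0.39)
assert abs(math.exp(b) - 0.39) < 1e-12
```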
Recommended changes
- Outliers should not be removed. Instead, the underlying data distribution should be modelled more appropriately, for example using an inverse Gaussian model.
- Consider creating a reproducible analysis workflow and sharing the code.
- Include a data dictionary.
- As the dataset contained many additional variables in wide and long formats, the authors should provide a minimal dataset including only the variables used in the paper.
- Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
- Potential clusters were identified in the data, with some organisations contributing multiple sample sizes; this should be investigated using mixed models.
Model 1
Model results for Target Sample Size
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | ||||||
Sponsor_Type: | ||||||
Any industry involvement – No industry involvement | −152.99 | | −231.99 | −74.98 | | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Target Sample Size
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Target Sample Size
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Sponsor_Type | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for Target Sample Size
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 249.145 | 18.435 | 212.981 | 285.308 | 13.515 | <0.001 |
Sponsor_Type: | ||||||
Any industry involvement – No industry involvement | −152.991 | 40.274 | −231.994 | −73.987 | −3.799 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for Target Sample Size
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.100 | 0.010 | 0.009 | 22,400.217 | 618.730 | 14.430 | 1 | 1425 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for Target Sample Size
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Sponsor_Type | 5,532,050.506 | 1 | 5,532,050.506 | 14.430 | <0.001 |
Residuals | 546,293,544.369 | 1425 | 383,363.891 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
Visualisation of regression model
The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
5.368 | 0.0205 | 1 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test indicates heteroscedasticity.
- The variance was substantially larger in the no industry involvement group, which can be seen visually in the plots.
Model descriptives including cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
Target Sample Size | 1427 | 217.088 | 622.073 | 60.000 | 1.000 | 6,800.000 | 6.863 | 54.913 |
Sponsor_Type* | 1427 | 1.790 | 0.407 | 2.000 | 1.000 | 2.000 | −1.426 | 0.033 |
.fitted | 1427 | 217.088 | 62.285 | 249.145 | 96.154 | 249.145 | −1.426 | 0.033 |
.resid | 1427 | −0.000 | 618.947 | −149.145 | −247.145 | 6,550.855 | 6.829 | 54.701 |
.leverage | 1427 | 0.001 | 0.001 | 0.001 | 0.001 | 0.003 | 1.426 | 0.033 |
.sigma | 1427 | 619.162 | 1.662 | 619.363 | 594.534 | 619.381 | −10.785 | 125.432 |
.cooksd | 1427 | 0.000 | 0.003 | 0.000 | 0.000 | 0.050 | 10.414 | 117.972 |
.std.resid | 1427 | −0.000 | 1.000 | −0.241 | −0.399 | 10.585 | 6.829 | 54.696 |
dfb.1_ | 1427 | −0.000 | 0.009 | −0.000 | −0.009 | 0.188 | 14.536 | 263.958 |
dfb.S_ii | 1427 | 0.000 | 0.016 | −0.003 | −0.168 | 0.150 | 3.171 | 47.966 |
dffit | 1427 | 0.000 | 0.031 | −0.007 | −0.012 | 0.328 | 6.644 | 51.734 |
cov.r | 1427 | 1.001 | 0.010 | 1.002 | 0.851 | 1.005 | −10.397 | 118.087 |
* categorical variable | ||||||||
Cook's distance threshold
Cook’s distance measures the overall change in fit if an observation is removed. Potential influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted value changes when an observation is removed. Potential influential observations satisfy \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of predictors (including the intercept) and n is the number of observations. In practice this can flag a large number of points, so DFFIT \(\pm 1\) is often used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potential influential observations satisfy \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify outliers.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentised residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical cut-off between 0.9 to 1.1 was used to flag observations with meaningful impact on precision, although there is no agreed universal alternative cut-off.
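The rule-of-thumb cut-offs above can be collected into a single helper (the function itself is our illustration; the values shown use this model's n = 1427 observations and p = 2 parameters):

```python
import math

def influence_thresholds(n: int, p: int) -> dict:
    """Rule-of-thumb cut-offs for influence diagnostics described above.

    n = number of observations; p = number of model parameters
    (including the intercept).
    """
    return {
        "cooks_d": 4 / n,                          # Cook's distance > 4/n
        "dffits": 2 * math.sqrt(p) / math.sqrt(n), # |DFFITS| > 2*sqrt(p/n)
        "dfbeta": 2 / math.sqrt(n),                # |DFBETA| > 2/sqrt(n)
        "covratio": (1 - 3 * p / n, 1 + 3 * p / n),# |COVRATIO - 1| > 3p/n
    }

# For the model in this report:
thr = influence_thresholds(1427, 2)
```

With n = 1427 these cut-offs are small (e.g. Cook's distance ≈ 0.0028), which is why the practical fall-backs of 0.5 and ±1 mentioned above are often preferred in large samples.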
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.S_ii | dffit | cov.r |
|---|---|---|---|---|---|---|---|
4 | 0.039 | 0.003 | 0.000 | 0.002 | −0.002 | 0.002 | 1.005 |
16 | 0.310 | 0.003 | 0.000 | 0.018 | −0.016 | 0.018 | 1.005 |
1188 | 10.694 | 0.001 | 0.047 | 0.000 | 0.146 | 0.319 | 0.859 |
832 | 11.023 | 0.001 | 0.050 | 0.000 | 0.150 | 0.328 | 0.851 |
StudRes = studentized residuals; CookD = Cook's Distance a combined measure of leverage and influence. DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||
Results for outliers and influential points
- Although the largest studentised residual was 11, Cook’s distance, DFBETAS, and DFFITS were within conventional thresholds, indicating the outliers were not influential on the mean. However, the COVRATIO flagged several observations that may inflate the variance.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.345 | <0.001 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.328 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test indicates residuals may not be normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- QQ-plot indicates the residuals are not normally distributed.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.061 | 1.878 | 0.0540 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- Observations are unlikely to be independent by design, as some organisations contributed multiple trials; this should be assessed using linear mixed models or generalized estimating equations.
Assumption conclusions
Assumptions of linear regression were not satisfied: residuals showed marked non-normality and heteroscedasticity, and a clear mean–variance relationship suggested a multiplicative error structure, indicating that alternative distributions should be considered. Independence also appeared questionable, with potential clustering where some organisations contributed multiple observations; this should be examined using mixed-effects models.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for Target Sample Size
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 249.1445 | |||
Sponsor_Type: | ||||
Any industry involvement – No industry involvement | −152.99 | −152.9907 | −0.0007 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | 212.9812 | |||
Sponsor_Type: | ||||
Any industry involvement – No industry involvement | −231.99 | −231.9938 | −0.0038 | Reproduced |
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | 285.3078 | |||
Sponsor_Type: | ||||
Any industry involvement – No industry involvement | −74.98 | −73.9875 | 0.9925 | Not Reproduced |
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | <0.001 | ||||
Sponsor_Type: | |||||
Any industry involvement – No industry involvement | <0.001 | <0.001 | 0.0000 | Reproduced | Remains sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Changep = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison. | |||||
Results for p-values
The p-value for this model was reproduced.
Conclusion computational reproducibility
This model was mostly computationally reproducible. The regression coefficient and p-value were reproduced, with a typographical error identified in the upper confidence interval.
Methods
While the model was mostly computationally reproduced, the residuals showed substantial non-normality and heteroscedasticity. The original aim was to estimate the mean difference in target sample size by sponsor type; since linear regression is equivalent to a two-sample t-test in this setting, a rank-based alternative such as the Mann–Whitney U test could provide a simple comparison. However, potential clustering by organisation may also have been present. This was not assessed in the current sensitivity analysis, as the documentation was unclear, but ideally the authors should have considered a mixed-effects model to account for such clustering.
Target sample size was standardised and a heteroscedastic bootstrap model was fitted. While bootstrapping provides a comparison on the original additive scale, the mean–variance relationship warranted further modelling because the outcome was a count-like variable and initial descriptive statistics indicated overdispersion (with variance exceeding the mean). Therefore, comparisons of alternative distributions were conducted on the unstandardised outcome.
A negative binomial model was first fitted; however, residual diagnostics continued to show evidence of dispersion, and visual inspection of fitted values revealed a clear mean–variance relationship. To better capture this, other candidate distributions were considered: a gamma model, which assumes a quadratic mean–variance relationship, and an inverse Gaussian model, which allows for a cubic mean–variance relationship. Model performance was then evaluated using the Akaike Information Criterion (AIC) to determine the best-fitting distribution.
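AIC comparison itself is simple arithmetic (AIC = 2k − 2 log L, lower is better); the sketch below reproduces the model ranking from the AICs reported in the Results:

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_likelihood

# AIC values reported in this sensitivity analysis:
models = {
    "linear": 22400,
    "negative binomial": 17757,
    "gamma": 19239,
    "inverse Gaussian": 17616,
}
best = min(models, key=models.get)        # lowest AIC wins
delta = models["linear"] - models[best]   # improvement over linear regression
```

An AIC difference of 10 is usually considered large; here the best model improves on linear regression by 4,784.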
Results
The regression coefficients were consistent between the bootstrap and reproduced models, indicating no difference in point estimates. However, the upper and lower confidence limits each shifted by more than 10%, although neither shift exceeded the absolute threshold of ±0.1. The standardised confidence interval range differed by −0.1, with the bootstrap interval being 41% narrower than that of the reproduced model, indicating the model is not inferentially reproducible.
Bootstrapping results
Wild bootstrapping was performed with 10,000 iterations.
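A minimal wild bootstrap for a two-group mean difference might look like the following (a sketch under our assumptions: Rademacher weights applied to residuals around group means, with invented demonstration data; the report itself used 10,000 iterations in R):

```python
import random
import statistics

def wild_bootstrap_ci(y0, y1, n_boot=1000, seed=42):
    """Wild bootstrap percentile CI for a two-group mean difference (a sketch).

    Fitted values are the group means; residuals are multiplied by independent
    Rademacher weights (+1/-1). Because each observation keeps its own residual
    magnitude, the scheme is robust to heteroscedasticity.
    """
    rng = random.Random(seed)
    m0, m1 = statistics.fmean(y0), statistics.fmean(y1)
    r0 = [y - m0 for y in y0]
    r1 = [y - m1 for y in y1]
    diffs = []
    for _ in range(n_boot):
        b0 = statistics.fmean(m0 + r * rng.choice((-1, 1)) for r in r0)
        b1 = statistics.fmean(m1 + r * rng.choice((-1, 1)) for r in r1)
        diffs.append(b1 - b0)
    diffs.sort()
    return m1 - m0, (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])

# Illustrative heteroscedastic groups (invented data, not the trial registry):
data_rng = random.Random(0)
no_industry = [data_rng.gauss(250, 600) for _ in range(300)]  # larger variance
industry = [data_rng.gauss(95, 200) for _ in range(300)]      # smaller variance
est, (lo, hi) = wild_bootstrap_ci(no_industry, industry)
```

Fewer iterations are used here than in the report, purely to keep the sketch fast.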
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Sponsor_Type1 | −0.2459 | −0.2458 | −0.0001 | −0.0500 | No | No | No |
B = standardized regression coefficient reproduced B; boot.B = bootstrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. |||||||
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Sponsor_Type1 | −0.3729 | −0.3216 | −0.0513 | −13.7600 | Yes | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. |||||||
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Sponsor_Type1 | −0.1189 | −0.1718 | 0.0528 | 44.4300 | Yes | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Sponsor_Type1 | 0.2540 | 0.1498 | −0.1042 | −41.0200 | Yes | Yes | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. |||||||
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Sponsor_Type1 | 0.0002 | 0.0000 | 0.0002 | Remains sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. ||||
Check distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Comparison of distributions
The linear model had an AIC of 22400. A negative binomial model improved fit (AIC 17757), but DHARMa residual diagnostics indicated deviations from ideal assumptions, including overdispersion, outliers, and a lack-of-fit signal. A gamma model was then fitted; although overdispersion and outliers were no longer evident, its AIC was higher (19239) and the residual diagnostics worsened, indicating misspecification. Finally, an inverse Gaussian model achieved the lowest AIC (17616). Its diagnostics improved, with outliers and overdispersion no longer an issue; a slight residual departure from the target distribution remained, but the inverse Gaussian was retained as the best overall fit (inverse Gaussian residuals shown below). Because outliers and potential organisational clustering were not formally evaluated, the final choice of distribution should be based on models that explicitly account for these features.
Note on “normality” in DHARMa: for GLMs (e.g., negative binomial, gamma, inverse Gaussian), residuals are not expected to be Gaussian. DHARMa uses simulated, scaled residuals that should be uniform on (0,1) under a correctly specified model. Apparent “non-normality” in this context means non-uniformity of these residuals (e.g., a significant KS test or QQ plot deviations), which signals potential misspecification (wrong mean–variance form, link, or missing structure), not a violation of Gaussian residual assumptions.
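The uniform scaled residuals DHARMa describes can be sketched as a probability integral transform against simulations from the fitted model (a simplified illustration of the idea, not DHARMa's implementation):

```python
import random

def scaled_residuals(observed, simulate, n_sim=100, seed=7):
    """DHARMa-style scaled residuals (a simplified sketch).

    For each observation, draw n_sim values from the fitted model and record
    where the observed value falls among them. Under a correctly specified
    model these residuals are uniform on (0, 1). `simulate(i, rng)` draws one
    replicate for observation i.
    """
    rng = random.Random(seed)
    res = []
    for i, y in enumerate(observed):
        sims = [simulate(i, rng) for _ in range(n_sim)]
        below = sum(s < y for s in sims)
        ties = sum(s == y for s in sims)
        # randomise within ties so discrete outcomes still give uniform residuals
        res.append((below + rng.random() * (ties + 1)) / (n_sim + 1))
    return res

# Demo: a correctly specified model yields roughly uniform residuals
demo_rng = random.Random(1)
obs = [demo_rng.gauss(0, 1) for _ in range(200)]
res = scaled_residuals(obs, lambda i, r: r.gauss(0, 1))
```

Systematic departure of these residuals from uniformity (e.g. QQ plot deviations against the uniform) is what signals misspecification, regardless of the fitted family.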
Model (AIC) | Term | Estimate1 | 95% CI1 | p-value |
|---|---|---|---|---|
Linear (AIC = 22400) | (Intercept) | 249 | 213, 285 | <0.001 |
Linear (AIC = 22400) | Sponsor_Type1 | −153 | −232, −74 | <0.001 |
Negative Binomial (AIC = 17757) | (Intercept) | 249 | 231, 269 | <0.001 |
Negative Binomial (AIC = 17757) | Sponsor_Type1 | 0.39 | 0.33, 0.46 | <0.001 |
Gamma (AIC = 19239) | (Intercept) | 249 | 214, 291 | <0.001 |
Gamma (AIC = 19239) | Sponsor_Type1 | 0.39 | 0.28, 0.54 | <0.001 |
Inverse Gaussian (AIC = 17616) | (Intercept) | 249 | 210, 295 | <0.001 |
Inverse Gaussian (AIC = 17616) | Sponsor_Type1 | 0.39 | 0.30, 0.50 | <0.001 |
1 CI = Confidence Interval. Linear estimates are regression coefficients (Beta, additive scale); negative binomial, gamma, and inverse Gaussian estimates are ratios (multiplicative scale; IRR = Incidence Rate Ratio for the negative binomial model). |
Residuals for the inverse Gaussian model
Inferential reproducibility conclusion
This model was not inferentially reproducible: residuals were heteroscedastic, grossly non-normal, and demonstrated a clear mean–variance relationship indicative of multiplicative error. The bootstrap model showed a standardised change in the confidence interval range of −0.1, and an inverse Gaussian model substantially improved fit, reducing the AIC by 4,784.