Paper 23: Quantitative assessment of fecal contamination in multiple environmental sample types in urban communities in Dhaka, Bangladesh using SaniPath microbial approach
References
Amin N, Rahman M, Raj S, Ali S, Green J, Das S, et al. (2019) Quantitative assessment of fecal contamination in multiple environmental sample types in urban communities in Dhaka, Bangladesh using SaniPath microbial approach. PLoS ONE 14(12): e0221193. https://doi.org/10.1371/journal.pone.0221193
Disclosure
This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.
Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.
Methods used in our reproducibility analyses
There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.
The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.
To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
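The classification rule is described above only in outline. The following is a minimal sketch of one plausible reading, not the project's actual code. It assumes a statistic is Reproduced when the reproduced value rounds to the published value at the published number of decimals, Incorrect Rounding when it differs by less than one unit in the last reported decimal, and Not Reproduced otherwise. The function name `classify_statistic` and these exact thresholds are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify_statistic(original: str, reproduced: float) -> str:
    """Compare a published value (as printed) with a reproduced value.

    Assumed rule (illustrative only): round the reproduced value to the
    number of decimals printed in the paper and compare.
    """
    orig = Decimal(original.replace("\u2212", "-"))  # accept a typographic minus sign
    places = max(-orig.as_tuple().exponent, 0)       # decimals reported in the paper
    unit = Decimal(1).scaleb(-places)                # one unit in the last reported decimal
    rep = Decimal(str(reproduced))
    if rep.quantize(unit, rounding=ROUND_HALF_UP) == orig:
        return "Reproduced"
    if abs(rep - orig) < unit:
        return "Incorrect Rounding"
    return "Not Reproduced"


for published, value in [("6.95", 6.9546), ("0.16", 0.1542), ("-0.98", -0.9851)]:
    print(published, value, classify_statistic(published, value))
```

Under these assumptions, the three Model 1 values shown are classified as Reproduced, Incorrect Rounding, and Incorrect Rounding, which agrees with the Differences tables reported for that model.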
For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
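The thresholds for meaningful differences can be expressed as a small helper. This is a sketch under stated assumptions: the function name `flag_change` is hypothetical, the difference is taken as reference minus comparison, and the relative change is computed against the absolute value of the reference estimate and reported unsigned (the report's tables show signed percentages).

```python
def flag_change(reference: float, comparison: float) -> dict:
    """Flag differences between a reproduced estimate and a sensitivity estimate."""
    diff = reference - comparison
    if diff == 0:
        pct = 0.0
    elif reference == 0:
        pct = float("inf")  # relative change is undefined for a zero reference
    else:
        pct = abs(diff) / abs(reference) * 100
    return {
        "diff": diff,
        "pct_diff": pct,
        "diff_10pct": pct >= 10,       # relative change of 10% or more
        "diff_0.1": abs(diff) >= 0.1,  # moderate absolute difference
        "diff_0.2": abs(diff) >= 0.2,  # substantial absolute difference
    }


# A standardised coefficient and its bootstrapped counterpart
print(flag_change(-0.5306, -0.5353))
# A confidence limit close to zero: large relative change, small absolute change
print(flag_change(-0.0001, 0.0208))
```

The second call shows why the relative criterion is read alongside the absolute ones: a limit near zero can change by far more than 10% while moving by only about 0.02 standardised units.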
Results from the reproduction of the Amin et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.
Summary from statistical review
This paper explores faecal contamination in multiple environmental sample types in Bangladesh. The authors state that they used generalised linear models but do not report the family or link function. In Stata, the Gaussian (normal) family is the default. A Poisson model is possible but unlikely, because the authors log10-transformed the dependent variable, which would not be done before fitting a Poisson model. Given the complex study design, the authors should have discussed whether observations could be correlated because of geographic regions and the sampling strategy. The tables do not identify the type of analysis used, and the paper does not describe whether the models were univariate or multivariable. There were no continuous variables in the models, so the linearity assumption does not apply.
Data availability and software used
The authors reported that all relevant data are within the paper and its Supporting Information files. Analyses were performed using Stata with a dta file in long format provided in the supporting information.
Regression sample
There were at least 35 linear regression models, three of which were randomly chosen. The univariate models assessed E. coli concentrations in drain water, water from produce, and drinking water. Models one and two compared neighbourhood types, and model three compared different sources of drinking water.
Computational reproducibility results
The analyses were mostly computationally reproducible, with differences from rounding errors. The data were in long format by sample type, which required some formatting for analyses but was manageable to reproduce.
Inferential reproducibility results
The study was geographically clustered; accordingly, linear mixed models (LMMs) with a random intercept for neighbourhood were fitted to assess correlation. Residual diagnostics indicated non-normality for all three outcomes. For drain water, the ICC was low (0.06), and the LMM had a higher AIC than the linear model. For produce water, no random-intercept variance could be estimated, indicating no neighbourhood correlation. In contrast, for drinking water, the ICC was higher (0.24) and the random-intercept model reduced the AIC by 47 units (10 is a large difference) relative to the linear model. For comparison to the reproduced results, LMMs were bootstrapped for drain and drinking water, whereas the linear model was bootstrapped for produce water, given the absence of a cluster effect. Results for drain and produce were inferentially reproducible with bootstrap changes <0.1; for drinking water, the CI range differed by 0.16 and was not inferentially reproducible, leading to the paper being assessed as not inferentially reproducible.
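For reference, a random-intercept model and the corresponding ICC can be written as follows. The notation (\(y_{ij}\), \(u_j\), \(\sigma_u^2\), \(\sigma_e^2\)) is introduced here for illustration and does not appear in the original paper; \(i\) indexes samples and \(j\) indexes neighbourhoods.

\[
y_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2)
\]

\[
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\]

Read this way, an ICC of 0.24 means that roughly a quarter of the outcome variance lies between neighbourhoods, whereas an ICC of 0.06 indicates little between-neighbourhood variance. When the estimate of \(\sigma_u^2\) is zero, the model reduces to the ordinary linear model.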
Recommended Changes
- A neighbourhood-based sampling strategy should be accounted for by fitting linear mixed models with a random intercept for neighbourhood to accommodate within-neighbourhood correlation.
- Consider creating a reproducible analysis workflow and sharing the code.
- Ensure that the modelling process is clearly described and applied consistently throughout the paper. The type of model and its key specifications should be included in the footnotes of the tables to improve transparency.
- Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
Model 1
Model results for e_coli drain water
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 6.95 | |||||
neighborhood_type: | ||||||
Low – High | 0.16 | | −0.25 | 0.56 | | 0.05 |
Floating – High | −0.49 | | −0.98 | −0.00 | | 0.05 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for e_coli drain water
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for e_coli drain water
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
neighborhood_type | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for e_coli drain water
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 6.955 | 0.143 | 6.670 | 7.239 | 48.541 | <0.001 |
neighborhood_type: | ||||||
Low – High | 0.154 | 0.203 | −0.248 | 0.556 | 0.761 | 0.4485 |
Floating – High | −0.493 | 0.248 | −0.985 | −0.000 | −1.985 | 0.0500 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for e_coli drain water
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.258 | 0.066 | 0.047 | 269.031 | 0.892 | 3.452 | 2 | 97 | 0.0356 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for e_coli drain water
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
neighborhood_type | 5.668 | 2 | 2.834 | 3.452 | 0.0356 |
Residuals | 79.647 | 97 | 0.821 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
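The fit statistics and the ANOVA table are linked by simple identities, so they can be cross-checked from the sums of squares alone. The sketch below uses the values printed in the ANOVA table. It assumes that the reported RMSE divides the residual sum of squares by n rather than by the residual degrees of freedom, and it uses the closed-form upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2.

```python
import math

# Values copied from the reproduced ANOVA table for drain water
ss_model, df_model = 5.668, 2
ss_resid, df_resid = 79.647, 97
n = df_model + df_resid + 1  # 100 observations

r2 = ss_model / (ss_model + ss_resid)
r2_adj = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
rmse = math.sqrt(ss_resid / n)  # assumption: mean squared residual uses n, not df_resid

# Upper tail of F(2, df_resid); this closed form is valid only for 2 numerator df
p_value = (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)

print(f"R = {math.sqrt(r2):.3f}, R2 = {r2:.3f}, adjusted R2 = {r2_adj:.3f}")
print(f"F = {f_stat:.3f}, RMSE = {rmse:.3f}, p = {p_value:.4f}")
```

Small discrepancies in the third decimal place (for example in F) are expected, because the inputs are themselves rounded to three decimals.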
Visualisation of regression model
The blue line shows the line of best fit, with shading representing the 95% confidence interval, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
2.571 | 0.2765 | 2 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- No substantial evidence of heteroscedasticity was detected across groups.
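The reported p-value can be checked by hand. This sketch assumes the Breusch-Pagan statistic is referred to a chi-square distribution whose degrees of freedom equal the Parameter column (2); in that special case the upper-tail probability has the closed form exp(−x/2). The function name `chi2_sf_2df` is illustrative.

```python
import math


def chi2_sf_2df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Statistic copied from the homoscedasticity table for drain water
print(round(chi2_sf_2df(2.571), 4))
```

The result agrees with the tabulated p-value to four decimal places, which supports reading the Parameter column as the degrees of freedom.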
Model descriptives, including Cook’s distance and leverage, to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
e_coli drain water | 100 | 6.918 | 0.928 | 7.046 | 4.699 | 8.714 | −0.723 | −0.174 |
neighborhood_type* | 100 | 1.800 | 0.752 | 2.000 | 1.000 | 3.000 | 0.338 | −1.190 |
.fitted | 100 | 6.918 | 0.239 | 6.955 | 6.462 | 7.109 | −1.177 | −0.205 |
.resid | 100 | −0.000 | 0.897 | 0.132 | −2.410 | 1.759 | −0.610 | 0.022 |
.leverage | 100 | 0.030 | 0.010 | 0.025 | 0.025 | 0.050 | 1.478 | 0.185 |
.sigma | 100 | 0.906 | 0.007 | 0.909 | 0.876 | 0.911 | −2.143 | 4.706 |
.cooksd | 100 | 0.011 | 0.016 | 0.003 | 0.000 | 0.062 | 1.558 | 1.136 |
.std.resid | 100 | −0.000 | 1.006 | 0.148 | −2.693 | 1.966 | −0.603 | 0.009 |
dfb.1_ | 100 | −0.000 | 0.096 | 0.000 | −0.415 | 0.320 | −1.153 | 6.800 |
dfb.ng_L | 100 | −0.000 | 0.097 | 0.000 | −0.315 | 0.294 | −0.268 | 1.921 |
dfb.ng_F | 100 | 0.000 | 0.117 | 0.000 | −0.313 | 0.334 | 0.084 | 2.053 |
dffit | 100 | −0.000 | 0.187 | 0.024 | −0.446 | 0.409 | −0.427 | 0.027 |
cov.r | 100 | 1.032 | 0.045 | 1.045 | 0.838 | 1.086 | −2.214 | 5.289 |
* categorical variable | ||||||||
Cooks threshold
Cook’s distance measures the overall change in the fitted values when an observation is removed. Potentially influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted value changes when an observation is removed. Potentially influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of parameters (including the intercept) and n is the number of observations. Because this rule can flag a large number of points, DFFIT beyond \(\pm 1\) is often used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potentially influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA beyond \(\pm 1\) is often used to identify highly influential observations.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. Here, observations with COVRATIO values outside a practical range of 0.9 to 1.1 were flagged as having a meaningful impact on precision, although there is no universally agreed alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.ng_L | dfb.ng_F | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|
6 | −0.049 | 0.050 | 0.000 | 0.000 | 0.000 | −0.009 | −0.011 | 1.086 |
5 | −1.671 | 0.050 | 0.048 | 0.000 | 0.000 | −0.313 | −0.383 | 0.996 |
70 | −2.594 | 0.025 | 0.054 | −0.415 | 0.294 | 0.240 | −0.415 | 0.863 |
21 | 1.783 | 0.050 | 0.055 | −0.000 | −0.000 | 0.334 | 0.409 | 0.985 |
41 | −2.786 | 0.025 | 0.062 | 0.000 | −0.315 | 0.000 | −0.446 | 0.838 |
StudRes = studentized residuals; CookD = Cook's distance, a combined measure of leverage and influence; DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | ||||||||
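To make the size-adjusted cut-offs concrete, the sketch below applies them to the five observations listed in the table above. It assumes n = 100 and p = 3 (intercept plus two neighbourhood contrasts); the variable names are illustrative.

```python
import math

n, p = 100, 3  # observations; parameters (intercept + two neighbourhood contrasts)

# (ID, Cook's D, [DFBETAS for each coefficient], DFFITS, COVRATIO) from the table above
rows = [
    (6, 0.000, [0.000, 0.000, -0.009], -0.011, 1.086),
    (5, 0.048, [0.000, 0.000, -0.313], -0.383, 0.996),
    (70, 0.054, [-0.415, 0.294, 0.240], -0.415, 0.863),
    (21, 0.055, [-0.000, -0.000, 0.334], 0.409, 0.985),
    (41, 0.062, [0.000, -0.315, 0.000], -0.446, 0.838),
]

cutoffs = {
    "cook": 4 / n,                    # 0.04
    "dffits": 2 * math.sqrt(p / n),   # about 0.346
    "dfbetas": 2 / math.sqrt(n),      # 0.2
    "covratio": 3 * p / n,            # |COVRATIO - 1| > 0.09
}

flags = {}
for obs_id, cook, dfbetas, dffits, covratio in rows:
    flags[obs_id] = {
        "cook": cook > cutoffs["cook"],
        "dffits": abs(dffits) > cutoffs["dffits"],
        "dfbetas": max(abs(b) for b in dfbetas) > cutoffs["dfbetas"],
        "covratio": abs(covratio - 1) > cutoffs["covratio"],
    }

for obs_id, result in flags.items():
    print(obs_id, [name for name, hit in result.items() if hit])
```

With these cut-offs, observations 70 and 41 exceed all four thresholds, while none of the listed observations approaches the practical thresholds (Cook’s distance of 0.5, or DFFITS and DFBETAS of ±1).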
Results for outliers and influential points
Cook’s distances were below 0.5, and the DFBETAS and DFFITS values were within a reasonable range, indicating that the outliers are not substantially affecting the results. The COVRATIO values suggested that certain observations may influence the precision of the estimates.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.100 | 0.2673 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.964 | 0.0081 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test supports residuals being normally distributed.
- The Shapiro-Wilk normality test indicates residuals may not be normally distributed.
- The Q–Q plot suggests there may be minor departures from normality.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
−0.079 | 2.121 | 0.5440 |
Independence results
- The Durbin–Watson test suggests there are no auto-correlation issues.
- Observations are not independent under the study design; within-cluster correlation should be assessed using linear mixed models or generalized estimating equations.
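The Durbin–Watson statistic depends only on the residuals in their row order, which is why it cannot detect correlation within neighbourhoods unless the rows happen to be sorted by cluster. A minimal sketch of the statistic follows; the function name `durbin_watson` and the toy residual sequences are illustrative.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy residuals: alternating signs (negative autocorrelation) versus a constant run
print(durbin_watson([1, -1, 1, -1]))
print(durbin_watson([1, 1, 1, 1]))
```

Values near 2 indicate no first-order autocorrelation between adjacent rows, values towards 0 indicate positive autocorrelation, and values towards 4 indicate negative autocorrelation.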
Assumption conclusions
The model was found to meet the assumption of homoscedasticity. As there were no continuous predictors, linearity was not relevant. Normality tests and the Q–Q plot suggested that the residuals may not be normally distributed. Although no evidence of autocorrelation was observed, the study design involved geographic clustering, warranting a check for residual correlation within clusters.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for e_coli drain water
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 6.95 | 6.9546 | 0.0046 | Reproduced |
neighborhood_type: | ||||
Low – High | 0.16 | 0.1542 | −0.0058 | Incorrect Rounding |
Floating – High | −0.49 | −0.4926 | −0.0026 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | | 6.6703 | | |
neighborhood_type: | ||||
Low – High | −0.25 | −0.2480 | 0.0020 | Reproduced |
Floating – High | −0.98 | −0.9851 | −0.0051 | Incorrect Rounding |
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | | 7.2390 | | |
neighborhood_type: | ||||
Low – High | 0.56 | 0.5563 | −0.0037 | Reproduced |
Floating – High | 0.00 | −0.0001 | −0.0001 | Reproduced |
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | | 0.049 | | | |
neighborhood_type: | |||||
Low – High | 0.05 | 0.050 | 0.0000 | Reproduced | Remains non-sig, B same direction |
Floating – High | 0.05 | 0.049 | −0.0010 | Reproduced | Non-sig to sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value R_p - O_p; Reproduce.p = p-values reproduced; SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.05 were set to 0.049 and those ≥0.05 were set to 0.05 for the purposes of comparison. | |||||
Bland Altman plot showing differences between original and reproduced p-values for e_coli drain water
Results for p-values
One p-value changed from non-significant to significant; however, the reproduced p-value was only just below 0.05 (0.0500 to four decimal places), so the change is regarded as a rounding issue rather than an error.
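The recoding described in the table note explains how a borderline value can appear to change significance. The sketch below is illustrative: the function names are hypothetical, and 0.04998 is an invented value standing in for a p-value just below 0.05, not the actual reproduced value.

```python
def recode_p(p: float, alpha: float = 0.05) -> float:
    """Recode a p-value as 0.049 (below alpha) or 0.05 (at or above alpha)."""
    return 0.049 if p < alpha else 0.05


def significance_change(original_p: float, reproduced_p: float, alpha: float = 0.05) -> str:
    """Describe whether significance changed between two p-values."""
    was_sig = original_p < alpha
    now_sig = reproduced_p < alpha
    if was_sig == now_sig:
        return "Remains sig" if now_sig else "Remains non-sig"
    return "Non-sig to sig" if now_sig else "Sig to non-sig"


published = 0.05                     # published value, recoded as non-significant
borderline = recode_p(0.04998)       # hypothetical value just under the threshold
print(borderline, significance_change(published, borderline))
print(recode_p(0.4485), significance_change(published, recode_p(0.4485)))
```

A difference of 0.001 after recoding therefore signals only that the two values fall on opposite sides of 0.05, not that they differ materially.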
Conclusion computational reproducibility
This model was mostly computationally reproducible, with only minor rounding differences. P-values were reproduced; however, one p-value crossed the statistical significance threshold due to rounding, while the direction of the regression coefficients remained unchanged.
Methods
The linear model was computationally reproduced. Residual diagnostics indicated possible non-normality; given the geographically clustered design, a linear mixed model (LMM) with a random intercept for neighbourhood was fitted to accommodate within-neighbourhood correlation. Model fit was compared between the linear model and the LMM using the Akaike Information Criterion (AIC). To examine the sensitivity of inferences, parametric bootstrap sampling (10,000 draws) was used to obtain standardised fixed-effect estimates and percentile 95% confidence intervals from the LMM, and confidence-interval widths were inspected. Percentage and absolute changes in estimates and interval bounds relative to the linear model were summarised using <0.10 and <0.20 thresholds. Coefficient direction and statistical significance were assessed for consistency.
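The idea of a percentile bootstrap interval can be illustrated with a much simpler setting than the one used here. The sketch below is not the parametric bootstrap of the mixed model described above: it swaps in a nonparametric, case-resampling bootstrap of a difference in two group means, uses synthetic data invented purely for illustration, and draws 2,000 rather than 10,000 resamples to keep it quick. All names are hypothetical.

```python
import random
import statistics


def bootstrap_mean_difference(group_a, group_b, n_resamples=2000, seed=1):
    """Percentile bootstrap for mean(group_a) - mean(group_b), resampling within groups."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]      # 2.5th percentile
    upper = diffs[int(0.975 * n_resamples) - 1]  # 97.5th percentile
    return statistics.fmean(diffs), lower, upper


# Synthetic log10 concentrations, invented for illustration only
data_rng = random.Random(42)
high = [data_rng.gauss(7.0, 0.9) for _ in range(40)]
floating = [data_rng.gauss(6.5, 0.9) for _ in range(20)]

estimate, lower, upper = bootstrap_mean_difference(floating, high)
print(f"difference = {estimate:.3f}, 95% percentile CI = ({lower:.3f}, {upper:.3f})")
```

For clustered data, resampling individual samples as done here would ignore the neighbourhood structure; a cluster-level or model-based (parametric) bootstrap is needed to respect it.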
Results
The unadjusted ICC (null model) for drain water was low (0.061), and the mixed model’s AIC (286) was slightly higher than the linear model’s (284). Nevertheless, a bootstrapped mixed model was used for comparison with the reproduced results, as it better represents the study’s clustered sampling design and underlying data structure.
Bootstrapping results
Bias-Corrected and Accelerated (BCa) bootstrap confidence intervals were calculated using 10,000 resamples.
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.0397 | 0.0405 | −0.0008 | −2.1100 | No | No | No |
neighborhood_typeLow | 0.1661 | 0.1659 | 0.0001 | 0.0900 | No | No | No |
neighborhood_typeFloating | −0.5306 | −0.5353 | 0.0046 | 0.8700 | No | No | No |
B = standardized reproduced regression coefficient; boot.B = bootstrapped standardized reproduced B; B_diff = change in B - boot.B; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.2666 | −0.2848 | 0.0182 | 6.8200 | No | No | No |
neighborhood_typeLow | −0.2671 | −0.2772 | 0.0101 | 3.7900 | No | No | No |
neighborhood_typeFloating | −1.0612 | −1.0871 | 0.0259 | 2.4500 | No | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = change in Lower - boot.Lower; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.3460 | 0.3628 | −0.0168 | −4.8600 | No | No | No |
neighborhood_typeLow | 0.5993 | 0.6226 | −0.0233 | −3.8800 | No | No | No |
neighborhood_typeFloating | −0.0001 | 0.0208 | −0.0209 | −29,659.7100 | Yes | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = change in Upper - boot.Upper; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.6126 | 0.6476 | 0.0350 | 5.7100 | No | No | No |
neighborhood_typeLow | 0.8664 | 0.8998 | 0.0334 | 3.8500 | No | No | No |
neighborhood_typeFloating | 1.0611 | 1.1079 | 0.0468 | 4.4100 | No | No | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Intercept | 0.7976 | 0.8045 | −0.0069 | Remains non-sig, B same direction |
neighborhood_typeLow | 0.4485 | 0.4830 | −0.0344 | Remains non-sig, B same direction |
neighborhood_typeFloating | 0.0500 | 0.0580 | −0.0080 | Sig to non-sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = change in p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. | ||||
Check distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Conclusions based on bootstrapped model
This model was inferentially reproducible. Although some statistics changed by ≥10%, these differences were not meaningful: absolute changes in the standardised regression coefficients and confidence limits were <0.10. Coefficient directions remained consistent between the reproduced and bootstrapped models; the p-value for Floating – High moved from 0.0500 to 0.0580, a borderline shift around the 0.05 threshold.
Model 2
Model results for e_coli produce water
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 3.15 | |||||
Neighborhood_type: | ||||||
Low – High | −0.04 | | −0.65 | 0.56 | | 0.05 |
Floating – High | 0.30 | | −0.44 | 1.04 | | 0.05 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for e_coli produce water
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for e_coli produce water
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Neighborhood_type | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for e_coli produce water
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 3.152 | 0.217 | 2.721 | 3.582 | 14.517 | <0.001 |
Neighborhood_type: | ||||||
Low – High | −0.045 | 0.307 | −0.654 | 0.564 | −0.146 | 0.8840 |
Floating – High | 0.302 | 0.376 | −0.444 | 1.048 | 0.803 | 0.4238 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Model fit for e_coli produce water
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.097 | 0.009 | −0.011 | 352.150 | 1.352 | 0.457 | 2 | 97 | 0.6342 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA table for e_coli produce water
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Neighborhood_type | 1.725 | 2 | 0.862 | 0.457 | 0.6342 |
Residuals | 182.873 | 97 | 1.885 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
Visualisation of regression model
The blue line shows the line of best fit, with shading representing the 95% confidence interval, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
0.677 | 0.7129 | 2 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- No substantial evidence of heteroscedasticity was detected across groups.
Model descriptives, including Cook’s distance and leverage, to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
e_coli produce water | 100 | 3.194 | 1.366 | 3.149 | 1.398 | 6.039 | 0.202 | −1.054 |
Neighborhood_type* | 100 | 1.800 | 0.752 | 2.000 | 1.000 | 3.000 | 0.338 | −1.190 |
.fitted | 100 | 3.194 | 0.132 | 3.152 | 3.107 | 3.454 | 1.392 | 0.072 |
.resid | 100 | 0.000 | 1.359 | −0.003 | −2.056 | 2.890 | 0.180 | −1.015 |
.leverage | 100 | 0.030 | 0.010 | 0.025 | 0.025 | 0.050 | 1.478 | 0.185 |
.sigma | 100 | 1.373 | 0.007 | 1.375 | 1.347 | 1.380 | −1.358 | 1.924 |
.cooksd | 100 | 0.011 | 0.013 | 0.007 | 0.000 | 0.065 | 1.877 | 3.576 |
.std.resid | 100 | −0.000 | 1.005 | −0.002 | −1.536 | 2.132 | 0.178 | −1.013 |
dfb.1_ | 100 | 0.000 | 0.102 | 0.000 | −0.208 | 0.333 | 0.281 | 1.439 |
dfb.Ng_L | 100 | 0.000 | 0.100 | 0.000 | −0.235 | 0.246 | 0.110 | −0.447 |
dfb.Ng_F | 100 | −0.000 | 0.109 | 0.000 | −0.290 | 0.367 | −0.181 | 2.129 |
dffit | 100 | 0.000 | 0.181 | −0.000 | −0.355 | 0.450 | 0.099 | −0.622 |
cov.r | 100 | 1.031 | 0.033 | 1.038 | 0.916 | 1.084 | −1.028 | 1.694 |
* categorical variable | ||||||||
Cooks threshold
Cook’s distance measures the overall change in the fitted values when an observation is removed. Potentially influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice, a threshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations the fitted value changes when an observation is removed. Potentially influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of parameters (including the intercept) and n is the number of observations. Because this rule can flag a large number of points, DFFIT beyond \(\pm 1\) is often used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potentially influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA beyond \(\pm 1\) is often used to identify highly influential observations.
Influence plot
Observations with high leverage (horizontal) and large residuals (vertical, typically at ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles with high Cook’s distance indicated by darker shadings of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest deflated variances and narrower confidence intervals. A commonly cited guideline is \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. Here, observations with COVRATIO values outside a practical range of 0.9 to 1.1 were flagged as having a meaningful impact on precision, although there is no universally agreed alternative cut-off.
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.Ng_L | dfb.Ng_F | dffit | cov.r |
|---|---|---|---|---|---|---|---|---|
5 | 0.392 | 0.050 | 0.003 | −0.000 | −0.000 | 0.073 | 0.090 | 1.081 |
84 | 2.140 | 0.025 | 0.038 | −0.000 | 0.242 | −0.000 | 0.343 | 0.920 |
74 | 2.172 | 0.025 | 0.039 | −0.000 | 0.246 | −0.000 | 0.348 | 0.916 |
7 | −1.547 | 0.050 | 0.041 | 0.000 | 0.000 | −0.290 | −0.355 | 1.009 |
62 | 1.617 | 0.050 | 0.045 | −0.000 | −0.000 | 0.303 | 0.371 | 1.002 |
96 | 1.960 | 0.050 | 0.065 | −0.000 | −0.000 | 0.367 | 0.450 | 0.965 |
StudRes = studentized residuals; CookD = Cook's Distance, a combined measure of leverage and influence; DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | ||||||||
Results for outliers and influential points
Cook’s distances were below 0.5, and DFBETAS, DFFITS and COVRATIO values were within a reasonable range, indicating that the outliers are not substantially affecting the results.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.106 | 0.2091 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.954 | 0.0016 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test did not reject normality of the residuals (p = 0.2091).
- The Shapiro-Wilk test indicates that the residuals may not be normally distributed (p = 0.0016).
- The Q–Q plot indicates that the residuals are not normally distributed.
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.076 | 1.820 | 0.3260 |
Independence results
- The Durbin–Watson test suggests there are no autocorrelation issues.
- The study design is geographically clustered, so the observations are not independent by design; this should be assessed using linear mixed models or generalized estimating equations.
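For readers unfamiliar with the statistic, the sketch below shows how the Durbin–Watson value relates to lag-1 autocorrelation. It is an illustration only: the function names are hypothetical, and the residuals are assumed to be supplied in dataset row order, on which the statistic depends.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of residuals, assuming they have mean zero."""
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    return num / den
```

The statistic is approximately 2(1 - r), where r is the lag-1 autocorrelation, so values near 2 correspond to r near 0. This is consistent with the statistic of 1.820 and autocorrelation of 0.076 reported above. The test only detects correlation between adjacent rows, which is why it cannot rule out the within-cluster correlation raised in the second bullet.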
Assumption conclusions
The model was found to meet the assumption of homoscedasticity. As there were no continuous predictors, linearity was not relevant. Normality tests and the Q–Q plot suggested that the residuals may not be normally distributed. While no evidence of autocorrelation was observed, the study design involved geographic clustering, which warrants checking whether residuals are correlated within clusters.
Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for e_coli produce water
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 3.15 | 3.1516 | 0.0016 | Reproduced |
Neighborhood_type: | ||||
Low – High | −0.04 | −0.0449 | −0.0049 | Reproduced |
Floating – High | 0.30 | 0.3020 | 0.0020 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = R_B - O_B; reproduce.B = whether B was reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | 2.7207 | |||
Neighborhood_type: | ||||
Low – High | −0.65 | −0.6543 | −0.0043 | Reproduced |
Floating – High | −0.44 | −0.4443 | −0.0043 | Reproduced |
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | 3.5825 | |||
Neighborhood_type: | ||||
Low – High | 0.56 | 0.5645 | 0.0045 | Reproduced |
Floating – High | 1.04 | 1.0483 | 0.0083 | Incorrect Rounding |
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | 0.049 | ||||
Neighborhood_type: | |||||
Low – High | 0.05 | 0.050 | 0.0000 | Reproduced | Remains non-sig, B same direction |
Floating – High | 0.05 | 0.050 | 0.0000 | Reproduced | Remains non-sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Change.p = R_p - O_p; Reproduce.p = whether the p-value was reproduced; SigChangeDirection = statistical significance and B change between original and reproduced models. Note: for the purposes of comparison, p-values <0.05 were set to 0.049 and p-values ≥0.05 were set to 0.05. | |||||
Results for p-values
The p-values were reproduced.
Conclusion computational reproducibility
This model was mostly computationally reproducible, with minor rounding errors. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction.
Methods
The linear model was computationally reproduced. Residual diagnostics suggested possible non-normality. Given the geographically clustered design, a linear mixed model with a neighbourhood random intercept was also fitted to assess clustering. The random-intercept variance was essentially zero and the fit was singular, indicating no discernible between-neighbourhood correlation; consequently, inference proceeded with the ordinary linear model. Non-parametric BCa bootstrapping (10,000 resamples) was used to obtain standardised coefficients and 95% confidence intervals (CIs), and CI widths were examined. Percentage and absolute changes in estimates and CI bounds, relative to the original linear-model estimates, were summarised using <0.10 and <0.20 thresholds. Coefficient direction and statistical significance were assessed for consistency.
Results
Bootstrapping results
Bias-Corrected and Accelerated (BCa) bootstrap confidence intervals were calculated using 10,000 resamples.
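To make the BCa procedure concrete, the sketch below computes a BCa interval for a difference in means between two groups, which equals the ordinary least squares coefficient of a 0/1 indicator. It is a simplified illustration, not the code used for this report, which bootstrapped standardised regression coefficients. The function names, the `(group, y)` row layout and the simple quantile rule are assumptions.

```python
import random
from statistics import NormalDist, mean


def diff_in_means(rows):
    """rows: list of (group, y) with group 0 or 1; returns mean(y | 1) - mean(y | 0)."""
    g1 = [y for g, y in rows if g == 1]
    g0 = [y for g, y in rows if g == 0]
    return mean(g1) - mean(g0)


def bca_interval(rows, stat, n_boot=2000, alpha=0.05, seed=1):
    """Return (estimate, lower, upper) using a BCa bootstrap with case resampling."""
    rng = random.Random(seed)
    n = len(rows)
    theta_hat = stat(rows)

    # Case resampling: draw n rows with replacement and recompute the statistic.
    # (Assumes every resample contains both groups, which is near-certain for
    # balanced groups of moderate size.)
    boots = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )

    nd = NormalDist()
    # Bias correction: share of bootstrap estimates below the original estimate.
    z0 = nd.inv_cdf(sum(b < theta_hat for b in boots) / n_boot)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(rows[:i] + rows[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = sum((jbar - j) ** 3 for j in jack) / den if den else 0.0

    def adjusted(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(q):
        # Simple quantile rule: index into the sorted bootstrap estimates.
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return theta_hat, pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))
```

The bias-correction term shifts the percentile limits when the bootstrap distribution is not centred on the estimate, and the acceleration term adjusts for skewness. When both are zero the interval reduces to the ordinary percentile interval.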
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.0311 | −0.0297 | −0.0014 | −4.4300 | No | No | No |
Neighborhood_typeLow | −0.0329 | −0.0338 | 0.0009 | 2.8000 | No | No | No |
Neighborhood_typeFloating | 0.2212 | 0.2210 | 0.0002 | 0.0800 | No | No | No |
B = standardized reproduced regression coefficient; boot.B = bootstrapped standardized reproduced B; B_diff = B - boot.B; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.3466 | −0.3426 | −0.0041 | −1.1700 | No | No | No |
Neighborhood_typeLow | −0.4791 | −0.4608 | −0.0183 | −3.8200 | No | No | No |
Neighborhood_typeFloating | −0.3254 | −0.3487 | 0.0233 | 7.1600 | No | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = Lower - boot.Lower; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.2845 | 0.2838 | 0.0007 | 0.2300 | No | No | No |
Neighborhood_typeLow | 0.4134 | 0.3973 | 0.0161 | 3.8900 | No | No | No |
Neighborhood_typeFloating | 0.7677 | 0.7742 | −0.0065 | −0.8400 | No | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = Upper - boot.Upper; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.6311 | 0.6264 | −0.0047 | −0.7500 | No | No | No |
Neighborhood_typeLow | 0.8925 | 0.8581 | −0.0344 | −3.8600 | No | No | No |
Neighborhood_typeFloating | 1.0931 | 1.1229 | 0.0298 | 2.7200 | No | No | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range (boot.Range - Range); %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Intercept | 0.8454 | 0.8512 | −0.0058 | Remains non-sig, B same direction |
Neighborhood_typeLow | 0.8840 | 0.8770 | 0.0071 | Remains non-sig, B same direction |
Neighborhood_typeFloating | 0.4238 | 0.4422 | −0.0183 | Remains non-sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. | ||||
Check distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Conclusions based on bootstrapped model
The model was inferentially reproducible: standardised regression coefficients showed no meaningful change in either percentage or absolute terms, and effect directions and statistical significance were consistent between the two models.
Model 3
Model results for e_coli municipal water
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 1.89 | | | | | |
Water_type: | ||||||
Surface water – Municipal water | 3.28 | | 2.66 | 3.89 | | 0.049 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit Statistics e_coli municipal water
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for e_coli municipal water
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Water_type | |||||
Residuals | |||||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square. | |||||
Model results for e_coli municipal water
Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
Intercept | 1.894 | 0.222 | 1.452 | 2.335 | 8.516 | <0.001 |
Water_type: | ||||||
Surface water – Municipal water | 3.277 | 0.314 | 2.653 | 3.901 | 10.422 | <0.001 |
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval. | ||||||
Fit statistics for e_coli municipal water
R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
0.725 | 0.526 | 0.521 | 378.266 | 1.556 | 108.609 | 1 | 98 | <0.001 |
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals. | ||||||||
ANOVA Table for e_coli municipal water
Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
Water_type | 268.469 | 1 | 268.469 | 108.609 | <0.001 |
Residuals | 242.245 | 98 | 2.472 | ||
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS. | |||||
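As a consistency check, the fit statistics can be recovered from the ANOVA table by hand. The arithmetic below uses only values reported above, with n = DF1 + DF2 + 1 = 100:

\[
F = \frac{MS_{\text{Water\_type}}}{MS_{\text{Residuals}}} = \frac{268.469}{2.472} \approx 108.6,
\qquad
R^2 = \frac{268.469}{268.469 + 242.245} \approx 0.526,
\]
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{DF2} = 1 - 0.474 \times \frac{99}{98} \approx 0.521,
\qquad
\text{RMSE} = \sqrt{\frac{242.245}{100}} \approx 1.556 .
\]

With a single one-degree-of-freedom predictor, the squared t statistic for Water_type equals F: \(10.422^2 \approx 108.6\). The reported RMSE matches division by n rather than by the residual degrees of freedom.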
Visualisation of regression model
The blue line shows the line of best fit, with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.
Checking residuals plots for patterns
Blue line showing quadratic fit for residuals
Checking univariate relationships with the dependent variable using scatterplots
Blue line shows linear relationship, red line indicates relationship inferred by GAM modelling
Testing for homoscedasticity
Statistic | p-value | Parameter | Method |
|---|---|---|---|
0.078 | 0.7797 | 1 | studentized Breusch-Pagan test |
Homoscedasticity results
- The studentized Breusch-Pagan test supports homoscedasticity.
- No substantial evidence of heteroscedasticity was detected across groups.
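The sketch below illustrates how a studentized (Koenker) Breusch-Pagan statistic and its p-value can be obtained for a model with a single predictor. It is not the code used for this report. The function names are hypothetical, and the single-predictor form is a simplification chosen so that no matrix algebra is needed.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2))


def studentized_bp(residuals, x):
    """Koenker form: LM = n * R^2 from regressing squared residuals on one predictor x."""
    n = len(residuals)
    u = [e * e for e in residuals]            # squared residuals
    mu, mx = sum(u) / n, sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    suu = sum((ui - mu) ** 2 for ui in u)
    if sxx == 0 or suu == 0:
        raise ValueError("predictor and squared residuals must both vary")
    sxu = sum((xi - mx) * (ui - mu) for xi, ui in zip(x, u))
    lm = n * sxu ** 2 / (sxx * suu)           # n times the squared correlation
    return lm, chi2_sf_df1(lm)
```

Plugging the reported statistic of 0.078 into `chi2_sf_df1` gives a p-value of about 0.78, in line with the 0.7797 shown in the table.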
Model descriptives including Cook’s distance and leverage to understand outliers
Term | N | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
e_coli municipal water | 100 | 3.532 | 2.271 | 3.561 | −0.301 | 7.383 | 0.041 | −0.826 |
Water_type* | 100 | 1.500 | 0.503 | 1.500 | 1.000 | 2.000 | 0.000 | −2.020 |
.fitted | 100 | 3.532 | 1.647 | 3.532 | 1.894 | 5.171 | 0.000 | −2.020 |
.resid | 100 | −0.000 | 1.564 | 0.011 | −2.679 | 2.490 | 0.052 | −1.326 |
.leverage | 100 | 0.020 | 0.000 | 0.020 | 0.020 | 0.020 | 5.730 | 44.651 |
.sigma | 100 | 1.572 | 0.007 | 1.573 | 1.556 | 1.580 | −0.300 | −1.318 |
.cooksd | 100 | 0.010 | 0.009 | 0.009 | 0.000 | 0.030 | 0.296 | −1.324 |
.std.resid | 100 | −0.000 | 1.005 | 0.007 | −1.721 | 1.600 | 0.052 | −1.326 |
dfb.1_ | 100 | −0.000 | 0.101 | 0.000 | −0.202 | 0.230 | −0.013 | 0.510 |
dfb.W_Sw | 100 | −0.000 | 0.102 | −0.013 | −0.176 | 0.144 | 0.059 | −1.318 |
dffit | 100 | 0.000 | 0.144 | 0.001 | −0.248 | 0.230 | 0.053 | −1.320 |
cov.r | 100 | 1.020 | 0.018 | 1.022 | 0.980 | 1.042 | −0.282 | −1.341 |
* categorical variable | ||||||||
Cook’s distance threshold
Cook’s distance measures the overall change in fit if an observation is removed. Potentially influential observations are identified by \(\text{Cook's Distance}_i > \frac{4}{n}\), where n is the number of observations. In practice a threshold of 0.5 is often used to identify influential observations.
DFFIT threshold
DFFIT measures how many standard deviations a fitted value changes when the corresponding observation is removed. Potentially influential observations are identified by \(\left| \text{DFFITS}_i \right| > \frac{2\sqrt{p}}{\sqrt{n}}\), where p is the number of parameters (including the intercept) and n is the number of observations. In practice this can flag a large number of points, so DFFIT \(\pm 1\) is often used to identify highly influential observations.
DFBETA threshold
DFBETA measures the change in a regression coefficient, in units of its standard error, when a particular observation is removed from the model. There is a DFBETA for each parameter in the model. Potentially influential observations are identified by \(|\text{DFBETA}_{ij}| > \frac{2}{\sqrt{n}}\), where n is the number of observations. In larger datasets this threshold can flag a high number of observations with only minor influence on the model. In practice, DFBETA \(\pm 1\) is often used to identify influential observations.
Influence plot
Observations with high leverage (horizontal axis) and large residuals (vertical axis, typically beyond ±2 or ±3 studentized residuals) are concerning, as they may disproportionately influence the model. This combination is reflected by large bubbles, with high Cook’s distance indicated by darker shades of blue.
COVRATIO plot
COVRATIO measures the overall change in the precision (covariance matrix) of the estimated regression coefficients when the ith observation is removed. Values close to 1 indicate little influence on the model’s precision. Values below 1 suggest that an observation inflates the variances and reduces precision, resulting in wider confidence intervals, whereas values above 1 suggest that it deflates the variances, resulting in narrower confidence intervals. A commonly cited guideline is to flag observations with \(\left|\mathrm{COVRATIO}_i - 1\right| > \frac{3p}{n}\), where p is the number of parameters and n is the number of observations. A practical range of 0.9 to 1.1 was used, with values outside it flagged as having a meaningful impact on precision, although there is no universally agreed alternative cut-off.
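As a worked illustration for this model, take n = 100 (the N in the descriptives table) and p = 2 (intercept plus one Water_type coefficient, which is our reading of the model table). The size-adjusted guidelines then evaluate to:

\[
\frac{4}{n} = 0.04, \qquad
\frac{2\sqrt{p}}{\sqrt{n}} = \frac{2\sqrt{2}}{10} \approx 0.283, \qquad
\frac{2}{\sqrt{n}} = 0.2, \qquad
\frac{3p}{n} = 0.06 .
\]

Against the extremes in the descriptives table, the maximum Cook’s distance (0.030), the largest absolute dffit (0.248) and the cov.r range (0.980 to 1.042, inside 0.94 to 1.06) all stay within these limits. The intercept DFBETA (dfb.1_) reaches 0.230, marginally above 0.2 but far below the practical threshold of ±1.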
Observations of interest identified by the influence plot
ID | StudRes | Leverage | CookD | dfb.1_ | dfb.W_Sw | dffit | cov.r |
|---|---|---|---|---|---|---|---|
1 | −1.417 | 0.020 | 0.020 | −0.202 | 0.143 | −0.202 | 1.000 |
51 | 1.429 | 0.020 | 0.021 | 0.000 | 0.144 | 0.204 | 0.999 |
14 | 1.613 | 0.020 | 0.026 | 0.230 | −0.163 | 0.230 | 0.988 |
98 | −1.739 | 0.020 | 0.030 | −0.000 | −0.176 | −0.248 | 0.980 |
StudRes = studentized residuals; CookD = Cook's Distance, a combined measure of leverage and influence; DFBETAS (dfb.*) measures how much a specific regression coefficient changes (in standard errors) when an observation is removed; DFFITS measures how much the fitted (predicted) value for an observation changes (in standard deviations) when that observation is removed; cov.r = coefficient covariance ratio, which measures how much the overall variance (precision) of the coefficients changes when that observation is removed. | |||||||
Results for outliers and influential points
Cook’s distances were below 0.5, and DFBETAS, DFFITS and COVRATIO values were within a reasonable range, indicating that the outliers are not substantially affecting the results.
Checking for normality of the residuals using a Q–Q plot
Normality of residuals using Shapiro-Wilk and Kolmogorov-Smirnov tests
Statistic | p-value | Method |
|---|---|---|
0.106 | 0.2095 | Asymptotic one-sample Kolmogorov-Smirnov test |
Statistic | p-value | Method |
|---|---|---|
0.930 | <0.001 | Shapiro-Wilk normality test |
Normality results
- The Kolmogorov-Smirnov test did not reject normality of the residuals (p = 0.2095).
- The Shapiro-Wilk test indicates that the residuals may not be normally distributed (p < 0.001).
- The Q–Q plot indicates that the residuals are not normally distributed.
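The asymptotic Kolmogorov-Smirnov p-value can be reproduced from the statistic and the sample size alone. The sketch below is illustrative rather than the code used for this report. The function names are hypothetical, and comparing residuals with a normal distribution whose mean and SD are estimated from the same residuals is an assumption about how the reference distribution was chosen.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(residuals):
    """Largest gap between the empirical CDF and a normal CDF fitted to the residuals."""
    xs = sorted(residuals)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_asymptotic_p(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution evaluated at sqrt(n) * d."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the tail probability is essentially 1 for very small statistics
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))
```

With the rounded statistic of 0.106 and n = 100, `ks_asymptotic_p` returns about 0.21, matching the reported 0.2095 up to rounding of the statistic. If the mean and SD were estimated from the same residuals, this classical p-value tends to be conservative, which may partly explain why the Kolmogorov-Smirnov and Shapiro-Wilk results differ.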
Assessing independence with the Durbin–Watson test for autocorrelation
AutoCorrelation | Statistic | p-value |
|---|---|---|
0.019 | 1.921 | 0.7120 |
Independence results
- The Durbin–Watson test suggests there are no autocorrelation issues.
- The study design is geographically clustered, so the observations are not independent by design; this should be assessed using linear mixed models or generalized estimating equations.
Assumption conclusions
The model was found to meet the assumption of homoscedasticity. As there were no continuous predictors, linearity was not relevant. Normality tests and the Q–Q plot suggested that the residuals may not be normally distributed. While no evidence of autocorrelation was observed, the study design involved geographic clustering, which warrants checking whether residuals are correlated within clusters.
Forest plot showing original and reproduced coefficients and 95% confidence intervals for e_coli municipal water
Change in regression coefficients
term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
Intercept | 1.89 | 1.8936 | 0.0036 | Reproduced |
Water_type: | ||||
Surface water – Municipal water | 3.28 | 3.2770 | −0.0030 | Reproduced |
O_B = original B; R_B = reproduced B; Change.B = R_B - O_B; reproduce.B = whether B was reproduced. | ||||
Change in lower 95% confidence intervals for coefficients
term | O_lower | R_lower | Change.lci | Reproduce.lower |
|---|---|---|---|---|
Intercept | 1.4523 | |||
Water_type: | ||||
Surface water – Municipal water | 2.66 | 2.6530 | −0.0070 | Incorrect Rounding |
O_lower = original lower confidence interval; R_lower = reproduced lower confidence interval; change.lci = change in R_lower - O_lower; Reproduce.lower = lower confidence interval reproduced. | ||||
Change in upper 95% confidence intervals for coefficients
term | O_upper | R_upper | Change.uci | Reproduce.upper |
|---|---|---|---|---|
Intercept | 2.3348 | |||
Water_type: | ||||
Surface water – Municipal water | 3.89 | 3.9010 | 0.0110 | Not Reproduced |
O_upper = original upper confidence interval; R_upper = reproduced upper confidence interval; change.uci = change in R_upper - O_upper; Reproduce.upper = upper confidence interval reproduced. | ||||
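The labels in the comparison tables can be read as a three-way classification. The report does not state the exact rule, so the sketch below is only one rule that happens to agree with the rows shown. It assumes "Reproduced" means the reproduced value rounds (half up) to the published value at the published precision, and "Incorrect Rounding" means it does not but differs by less than 0.01. The function name `classify` and the tolerance are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify(original, reproduced, tol="0.01"):
    """Classify a reproduced value against a published one; both are passed as strings."""
    o = Decimal(original)
    r = Decimal(reproduced)
    # Round the reproduced value to the number of decimals used in the publication.
    quantum = Decimal(1).scaleb(o.as_tuple().exponent)
    if r.quantize(quantum, rounding=ROUND_HALF_UP) == o:
        return "Reproduced"
    # Assumed tolerance: small discrepancies are attributed to rounding.
    if abs(r - o) < Decimal(tol):
        return "Incorrect Rounding"
    return "Not Reproduced"
```

Values are passed as strings so that trailing zeros in the published figure (for example "0.30") keep their precision. Under these assumptions, `classify("2.66", "2.6530")` gives "Incorrect Rounding" and `classify("3.89", "3.9010")` gives "Not Reproduced", as in the tables above.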
Change in p-values
Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
Intercept | 0.049 | ||||
Water_type: | |||||
Surface water – Municipal water | 0.049 | 0.049 | 0.0000 | Reproduced | Remains sig, B same direction |
O_p = original p-value; R_p = reproduced p-value; Change.p = R_p - O_p; Reproduce.p = whether the p-value was reproduced; SigChangeDirection = statistical significance and B change between original and reproduced models. Note: for the purposes of comparison, p-values <0.05 were set to 0.049 and p-values ≥0.05 were set to 0.05. | |||||
Results for p-values
The p-value in this model was reproduced.
Conclusion computational reproducibility
This model was mostly computationally reproducible, with minor rounding errors. P-values were reproduced and had the same interpretation, and regression coefficients did not change direction.
Methods
The linear model was computationally reproduced. Residual diagnostics indicated possible non-normality; given the geographically clustered design, a linear mixed model (LMM) with a random intercept for neighbourhood was fitted to accommodate within-neighbourhood correlation. Model fit was compared between the linear model and the LMM using the Akaike Information Criterion (AIC). To examine the sensitivity of inferences, parametric bootstrap sampling (10,000 draws) was used to obtain standardised fixed-effect estimates and percentile 95% confidence intervals from the LMM, and confidence-interval widths were inspected. Percentage and absolute changes in estimates and interval bounds relative to the linear model were summarised using <0.10 and <0.20 thresholds. Coefficient direction and statistical significance were assessed for consistency.
Results
Bootstrap results
Parametric bootstrap resampling with 10,000 iterations was used to obtain percentile confidence intervals for the fixed effects of the linear mixed model.
Change in regression coefficients
Term | B | boot.B | B_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.7214 | −0.7202 | −0.0012 | −0.1700 | No | No | No |
Water_typeSurface water | 1.4428 | 1.4424 | 0.0004 | 0.0300 | No | No | No |
B = standardized reproduced regression coefficient; boot.B = bootstrapped standardized reproduced B; B_diff = B - boot.B; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in lower 95% confidence interval
Term | Lower | boot.Lower | Lower_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.9157 | −1.1535 | 0.2378 | 25.9700 | Yes | Yes | Yes |
Water_typeSurface water | 1.1681 | 1.2472 | −0.0791 | −6.7700 | No | No | No |
Lower = standardized reproduced lower CI; boot.Lower = bootstrapped standardized reproduced lower CI; Lower_diff = Lower - boot.Lower; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in upper 95% confidence interval
Term | Upper | boot.Upper | Upper_diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | −0.5271 | −0.2786 | −0.2485 | −47.1500 | Yes | Yes | Yes |
Water_typeSurface water | 1.7175 | 1.6373 | 0.0802 | 4.6700 | No | No | No |
Upper = standardized reproduced upper CI; boot.Upper = bootstrapped standardized reproduced upper CI; Upper_diff = Upper - boot.Upper; %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
Change in Range of 95% confidence interval
Term | Range | boot.Range | Range_Diff | %_Diff | Diff_10% | Diff_0.1 | Diff_0.2 |
|---|---|---|---|---|---|---|---|
Intercept | 0.3885 | 0.8749 | 0.4863 | 125.1700 | Yes | Yes | Yes |
Water_typeSurface water | 0.5495 | 0.3901 | −0.1594 | −29.0000 | Yes | Yes | No |
Range = standardized reproduced CI range; boot.Range = bootstrapped standardized reproduced CI range; Range_Diff = change in CI range (boot.Range - Range); %_Diff = percentage difference; Diff_10% = difference ≥10%; Diff_0.1 and Diff_0.2 = absolute difference ≥0.1 and ≥0.2, respectively. | |||||||
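The difference columns and Yes/No flags in the tables above can be sketched as follows. This is an illustration, not the code used for this report. The function name is hypothetical, and taking the percentage relative to the absolute value of the linear-model figure is an assumption inferred from the tabulated numbers.

```python
def summarise_change(reference, bootstrap, boot_minus_ref=False):
    """Absolute and percentage change between a linear-model value and its bootstrap counterpart.

    The coefficient and CI-bound tables report reference - bootstrap; the range
    table reports bootstrap - reference, selected here with boot_minus_ref=True.
    """
    if reference == 0:
        raise ValueError("percentage change is undefined for a zero reference value")
    diff = bootstrap - reference if boot_minus_ref else reference - bootstrap
    pct = 100 * diff / abs(reference)  # assumed denominator: |reference|
    return {
        "diff": diff,
        "pct": pct,
        "Diff_10%": abs(pct) >= 10,    # percentage difference of at least 10%
        "Diff_0.1": abs(diff) >= 0.1,  # absolute difference of at least 0.1
        "Diff_0.2": abs(diff) >= 0.2,  # absolute difference of at least 0.2
    }
```

For the intercept's lower bound, `summarise_change(-0.9157, -1.1535)` gives a difference of 0.2378 (about 26%) and sets all three flags. For the Water_type CI range, `summarise_change(0.5495, 0.3901, boot_minus_ref=True)` gives -0.1594, which sets the 10% and 0.1 flags but not the 0.2 flag.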
Change in p-value significance and regression coefficient direction
Term | p-value | boot.p-value | changep | SigChangeDirection |
|---|---|---|---|---|
Intercept | <0.001 | 0.0016 | −0.0016 | Remains sig, B same direction |
Water_typeSurface water | <0.001 | <0.001 | −0.0002 | Remains sig, B same direction |
p-value = standardized reproduced p-value; boot.p-value = bootstrapped standardized reproduced p-value; changep = p-value - boot.p-value; SigChangeDirection = statistical significance and B change between reproduced and bootstrapped model. | ||||
Check distribution of bootstrap estimates
The bootstrap distribution of each coefficient appeared approximately normal and centered near the original estimate (red dashed line), suggesting that the estimates are relatively stable. No strong skewness or multimodality was observed.
Conclusions based on bootstrapped model
Substantial within-cluster correlation of the residuals was observed, consistent with the geographically clustered study design: the unadjusted ICC was 0.264, and the AIC of the linear mixed model was 47 units lower than that of the linear model. The linear mixed model was bootstrapped; the CI range for the Water_type coefficient differed by a standardized difference of 0.16, so the model was not inferentially reproducible.