Paper 12: The bacterial communities of the small intestine and stool in children with short bowel syndrome

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

March 14, 2026

References

Zeichner SL, Mongodin EF, Hittle L, Huang S-H, Torres C (2019) The bacterial communities of the small intestine and stool in children with short bowel syndrome. PLoS ONE 14(5): e0215351. https://doi.org/10.1371/journal.pone.0215351

Zeichner SL, Mongodin EF, Hittle L, Huang S-H, Torres C (2019) S1 Table. PLOS ONE. Dataset. https://doi.org/10.1371/journal.pone.0215351.s001

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
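The statistic-level classification described above can be illustrated with a minimal sketch. The exact thresholds used in the project are not stated here, so the rule below is an assumption: a value counts as Reproduced if it rounds to the published figure at the reported number of decimal places, as Incorrect Rounding if it is within one unit of the last reported decimal place, and as Not Reproduced otherwise. The function name `classify_statistic` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify_statistic(original: str, reproduced: float) -> str:
    """Classify one reproduced statistic against the value printed in a paper.

    `original` is passed as a string so that the number of reported
    decimal places is preserved (e.g. "0.250" has three decimals).
    """
    orig = Decimal(original)
    # Number of decimal places reported in the paper, e.g. "0.25" -> 2
    decimals = max(0, -orig.as_tuple().exponent)
    # One unit in the last reported decimal place, e.g. 0.01
    unit = Decimal(1).scaleb(-decimals)
    repro = Decimal(str(reproduced))

    # Exact match after rounding to the reported precision
    if repro.quantize(unit, rounding=ROUND_HALF_UP) == orig:
        return "Reproduced"
    # ASSUMED tolerance: within one unit of the last reported decimal place
    if abs(repro - orig) <= unit:
        return "Incorrect Rounding"
    return "Not Reproduced"


# Illustrative values only (not taken from any reviewed paper)
examples = [
    classify_statistic("0.25", 0.2549),
    classify_statistic("0.25", 0.2561),
    classify_statistic("0.25", 0.31),
]
```

Passing the published value as a string is a deliberate choice: converting it to a float first would lose trailing zeros and therefore the reported precision.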

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
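As a rough illustration of the thresholds for standardized coefficients, the sketch below compares an original coefficient with one obtained from a sensitivity analysis. How the relative and absolute criteria are combined into a single label is an assumption made for this example, and the function name `classify_coefficient_change` is hypothetical.

```python
def classify_coefficient_change(beta_original: float, beta_sensitivity: float) -> str:
    """Label the change in a standardized regression coefficient.

    Thresholds follow the text: relative change of 10% or more,
    absolute difference of 0.1 (moderate) or 0.2 (substantial).
    The ordering of the checks is an ASSUMPTION for illustration.
    """
    abs_diff = abs(beta_sensitivity - beta_original)

    # Relative change is undefined when the original coefficient is zero
    if beta_original == 0:
        rel_change = 0.0 if abs_diff == 0 else float("inf")
    else:
        rel_change = abs_diff / abs(beta_original)

    if abs_diff >= 0.2:
        return "substantial"
    if abs_diff >= 0.1:
        return "moderate"
    if rel_change >= 0.10:
        return "meaningful (relative change only)"
    return "no meaningful change"


# Illustrative values only (not taken from any reviewed paper)
labels = [
    classify_coefficient_change(0.30, 0.55),
    classify_coefficient_change(0.30, 0.45),
    classify_coefficient_change(0.50, 0.56),
    classify_coefficient_change(0.50, 0.52),
]
```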

Summary from statistical review

This paper examined gut dysbiosis in children with short bowel syndrome, comparing those with adverse clinical features (parenteral nutrition dependence, shorter bowel length, and absence of an ileocecal valve) with those without parenteral nutrition dependence. Regression analyses assessed associations between Shannon diversity and bowel length, parenteral nutrition status, and their interaction. The statistical approach was described primarily in mathematical terms rather than in plain language. Visual inspection of scatterplots suggested potentially non-linear relationships. Model assumptions were not reported as having been assessed, outliers were removed without clear justification, and results were discussed largely in terms of the direction of association rather than the magnitude or clinical relevance of the effects.

Data availability and software used

The authors provided links to sequence data on an external website. While demographic data were also available from this site, it was not possible to create the outcome variable (species diversity) without additional instructions. Although the unprocessed dataset is technically accessible, the analyses cannot be reliably reproduced without the syntax used to generate the species diversity variable. Analyses of the linear regression models were conducted using SAS.

Regression sample

Two linear regression models were presented in Figure 1. The outcome variable was species diversity, with bowel length and parenteral nutrition as the main effects, and an interaction term included. Separate models were fitted for jejunal and stool samples.
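The model structure described above (two main effects plus their interaction) can be sketched as an ordinary least squares fit. This is not the authors' SAS code. The data below are synthetic and noise-free, chosen only so that the known coefficients are recovered, and all variable and function names are hypothetical.

```python
import numpy as np


def fit_interaction_model(bowel_length, pn_dependent, diversity):
    """Fit diversity ~ bowel_length + pn + bowel_length:pn by least squares."""
    x = np.asarray(bowel_length, dtype=float)
    g = np.asarray(pn_dependent, dtype=float)  # 1 = PN dependent, 0 = not
    y = np.asarray(diversity, dtype=float)

    # Design matrix: intercept, two main effects, and their product
    design = np.column_stack([np.ones_like(x), x, g, x * g])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["intercept", "bowel_length", "pn", "bowel_length_x_pn"]
    return dict(zip(names, coef))


# SYNTHETIC illustration data (not from the paper)
length = np.array([20, 40, 60, 80, 30, 50, 70, 90], dtype=float)
pn = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
shannon = 1.0 + 0.02 * length - 0.5 * pn + 0.01 * length * pn

coefs = fit_interaction_model(length, pn, shannon)
```

With an interaction term in the model, the coefficient for bowel length is the slope in the reference group only (here, the group coded 0), and the interaction coefficient is the difference in slopes between the two groups. In the reproducibility workflow described earlier, the same function would be applied separately to the jejunal and stool samples.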

Computational reproducibility results

This paper could not be computationally reproduced because insufficient detail was provided to derive the species diversity variable from the genetic data. A bioinformatician attempted to reproduce the results; however, the variable could not be derived from the available genetic data due to unclear methodological steps, older software versions, and the absence of the analysis code. To enable reproducibility, the code used to generate the outcome variable should be provided. Additionally, the derived data used in the linear regression analyses should be supplied in wide format to facilitate access.

Inferential reproducibility results

As this paper was not computationally reproducible, inferential reproducibility was not assessed.

Recommended changes

Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
Consider creating a reproducible analysis workflow and sharing the code.
Provide the secondary data created from the genetic data in an accessible format to ensure reusability.
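The assumption checks in the first recommendation could be sketched as follows, using only NumPy. The sketch computes residuals, leverage, Cook's distance (to flag influential observations), and variance inflation factors (to assess multicollinearity). The data are synthetic, the function names are hypothetical, and any cut-off for what counts as influential or collinear is left to the analyst.

```python
import numpy as np


def ols_diagnostics(predictors, outcome):
    """Return residuals, leverage, and Cook's distance for an OLS fit.

    `predictors` is an (n, p) array WITHOUT an intercept column.
    """
    X = np.asarray(predictors, dtype=float)
    y = np.asarray(outcome, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]  # number of parameters, including the intercept

    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta

    # Diagonal of the hat matrix gives each observation's leverage
    hat = design @ np.linalg.pinv(design.T @ design) @ design.T
    leverage = np.diag(hat)

    # Cook's distance: D_i = e_i^2 / (k * s^2) * h_ii / (1 - h_ii)^2
    mse = residuals @ residuals / (n - k)
    cooks = residuals**2 / (k * mse) * leverage / (1 - leverage) ** 2
    return residuals, leverage, cooks


def variance_inflation_factors(predictors):
    """VIF_j = 1 / (1 - R^2_j), regressing predictor j on the others."""
    X = np.asarray(predictors, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        ss_total = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r_squared = 1 - (resid @ resid) / ss_total
        vifs.append(1 / (1 - r_squared))
    return np.array(vifs)


# SYNTHETIC illustration data with one deliberately shifted observation
x1 = np.arange(10, dtype=float)
x2 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
y = 2 + 0.5 * x1 - x2
y[9] += 5  # artificial outlier at a high-leverage point

residuals, leverage, cooks = ols_diagnostics(np.column_stack([x1, x2]), y)
vifs = variance_inflation_factors(np.column_stack([x1, x2]))
most_influential = int(np.argmax(cooks))
```

Plotting `residuals` against fitted values and against each predictor would complement these numeric checks, in particular for spotting the non-linear patterns noted in the statistical review. When an interaction term is included, high variance inflation factors for the product term are expected and do not necessarily indicate a problem.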