Paper 7: Root and alveolar bone changes in first premolars adjacent to the traction of buccal versus palatal maxillary impacted canines

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

March 3, 2026

Reference

Rodríguez-Cárdenas YA, Ruíz-Mora GA, Aliaga-Del Castillo A, Dias-Da Silveira HL, Arriola-Guillén LE (2019) Root and alveolar bone changes in first premolars adjacent to the traction of buccal versus palatal maxillary impacted canines. PLoS ONE 14(12): e0226267. https://doi.org/10.1371/journal.pone.0226267

Rodríguez-Cárdenas, Yalil Augusto; Ruíz-Mora, Gustavo Armando; Castillo, Aron Aliaga-Del; Silveira, Heraldo Luis Dias-Da; Arriola-Guillén, Luis Ernesto (2019). Premolars. PLOS ONE. Dataset. https://doi.org/10.1371/journal.pone.0226267.s001

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
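The classification rule described above can be sketched in Python. The thresholds used here are assumptions chosen for illustration, since the exact cut-offs are not stated in this report: a value counts as Reproduced when the reproduced estimate rounds to the published value at the published number of decimal places, as Incorrect Rounding when it differs by no more than one unit in the last reported decimal place, and as Not Reproduced otherwise. The function name `classify_statistic` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify_statistic(original: str, reproduced: float) -> str:
    """Classify one reported statistic against its reproduced value.

    `original` is passed as a string so that the number of reported
    decimal places is preserved (e.g. "0.010" has three).
    ASSUMPTION: the tolerance rules below are illustrative only and are
    not the thresholds used in the review.
    """
    orig = Decimal(original)
    # The exponent of the published value gives its precision, e.g. -3.
    quantum = Decimal(1).scaleb(orig.as_tuple().exponent)
    repro = Decimal(str(reproduced))
    rounded = repro.quantize(quantum, rounding=ROUND_HALF_UP)

    if rounded == orig:
        return "Reproduced"
    # Within one unit of the last reported digit: treat as a rounding slip.
    if abs(rounded - orig) <= quantum:
        return "Incorrect Rounding"
    return "Not Reproduced"
```

Under these assumed rules, `classify_statistic("0.479", 0.3328)` returns "Not Reproduced", which agrees with the R² comparison reported for Model 1.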

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
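The thresholds for meaningful change in standardized coefficients can be expressed as a short check. This is a minimal sketch: how the relative and absolute criteria are combined is an assumption (both are simply reported side by side), and the function name `flag_coefficient_change` is hypothetical.

```python
def flag_coefficient_change(beta_model: float, beta_sensitivity: float) -> dict:
    """Compare a standardized coefficient with its sensitivity-analysis value.

    Thresholds follow the text: relative change of 10% or more, and
    absolute differences of 0.1 (moderate) and 0.2 (substantial).
    """
    abs_diff = abs(beta_sensitivity - beta_model)

    # Relative change is undefined when the reference coefficient is zero.
    if beta_model != 0:
        rel_change = abs_diff / abs(beta_model)
    else:
        rel_change = float("inf") if abs_diff > 0 else 0.0

    if abs_diff >= 0.2:
        magnitude = "substantial"
    elif abs_diff >= 0.1:
        magnitude = "moderate"
    else:
        magnitude = "below threshold"

    return {
        "absolute_difference": abs_diff,
        "relative_change": rel_change,
        "relative_flag": rel_change >= 0.10,
        "absolute_flag": magnitude,
    }
```

For instance, with made-up illustrative values, a coefficient moving from 0.30 to 0.45 would be flagged on the relative criterion (a 50% change) and classed as a moderate absolute difference.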

Results from the reproduction of the Rodríguez-Cárdenas et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This research compared root and alveolar bone changes in first premolars adjacent to orthodontic traction on buccal versus palatal maxillary canines. The analysis in this paper is presented in tables, with eight linear regressions across two tables (Tables 6 and 7). However, the authors did not report checking the linearity assumptions, provide standard errors or confidence intervals, or interpret the coefficients. Repeated measures may not have been accounted for, as there appear to be 25 people with 36 teeth; the number of observations was not reported in the regression tables, so this is unclear. The authors called their modelling approach the overfit method; with up to seven variables per model, the models are likely to be overfitted and may not be generalisable.

Data availability and software used

The authors provided the data in an Excel file in a long format, accompanied by a separate spreadsheet containing a data dictionary. The data were available both in the Supporting Information and on Figshare. SPSS was used for analyses of linear regression models.

Regression sample

Three of the eight identified linear regressions were randomly selected for reproduction. These outcomes were changes in total length in the sagittal section, changes in root area at the upper limit of the middle third of the axial section, and changes in buccal bone height. The authors reported regression coefficients, p-values, and R².

Computational reproducibility results

None of the statistics in this paper could be reproduced because of discrepancies in the number of observations between the tables and the data provided. In the tables, there were 36 teeth from 25 people. However, the data file only contains information on 34 teeth, with no person ID reported. Table 1 shows 15 teeth in the buccal group; however, there are only 13 in the data. The group means and SDs could also not be reproduced, even when group size was consistent. For example, in the palatal group with 21 teeth, the change in buccal bone height was reported as mean = 0.99, SD = 2.04, whereas in the data it was mean = 0.957, SD = 1.886.

While this paper included a data dictionary, it wasn’t easy to directly match variables to the results; for example, the main grouping variable was called “condition” in the paper, but in the data dictionary it was called “POSITION”. Matching was made more difficult because the data did not contain the reported number of observations and the results could not be reproduced. Therefore, the reproduced models are the best guess based on the paper’s description of the data.

Inferential reproducibility results

As this paper was not computationally reproducible, inferential reproducibility was not assessed: because the original analyses could not be reproduced, statistical assumptions could not be meaningfully compared or interpreted.

Recommended Changes

Update the data, as it currently does not match the results reported in the paper. There are discrepancies between the number of observations reported and the number in the dataset.
Include a unique person identifier (ID) in the dataset to ensure each individual can be distinctly identified.
Linear mixed models should be considered to account for the likely correlation between teeth from the same participant.
Update the data dictionary so variables are referred to consistently.
Consider reducing the number of variables included in multivariable models. In some cases, descriptive (univariate) analyses may be sufficient, or models could focus on a smaller set of variables of clear clinical relevance. Overly complex models with many predictors relative to the sample size increase the risk of overfitting, unstable estimates, and limited generalisability. More parsimonious models are typically more interpretable, more stable, and more likely to perform consistently in external populations.
Report the number of observations, standard errors, confidence intervals and test statistics in the regression tables.
Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.
Consider using scatterplots to visualise the relationships between variables.
Adjusted R² should be reported for multivariable models.
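For reference, adjusted R² follows from R², the number of observations n and the number of predictors p through the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below applies it to the reproduced Model 1 values reported later in this document (R² = 0.3328, DF1 = 5, DF2 = 28, so n = 34); the function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R-squared, which penalises R-squared for model size."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Reproduced Model 1: R2 = 0.3328 with 5 model DF and 28 residual DF,
# so the number of observations is 5 + 28 + 1 = 34.
n_obs = 5 + 28 + 1
print(round(adjusted_r2(0.3328, n_obs, 5), 3))  # prints 0.214
```

The result matches the R2Adj value in the reproduced fit statistics for Model 1 and shows how far the adjustment can pull the estimate below the unadjusted R² when the sample is small relative to the number of predictors.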

Model 1

Model results for change in total length sagittal section

Term	B	p-value
Intercept	2.418	0.657
ANB	−0.461	0.034
APDI	−0.033	0.612
Duration_traction	0.204	0.010
Impaction_condition:
Palatine – Buccal	−0.437	0.380
Maxillary_length	0.391	0.063
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for change in total length sagittal section

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
	0.479
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for change in total length sagittal section

Term	SS	DF	MS	F	p-value
ANB
APDI
Duration_traction
Impaction_condition
Maxillary_length
Residuals
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for change in total length sagittal section

Term	B	SE	Lower	Upper	t	p-value
Intercept	−8.271	5.135	−18.789	2.247	−1.611	0.1184
ANB	0.422	0.122	0.171	0.672	3.445	0.0018
APDI	0.102	0.062	−0.025	0.230	1.643	0.1116
Duration_traction	−0.027	0.083	−0.197	0.142	−0.331	0.7430
Impaction_condition:
Palatine – Buccal	−0.136	0.536	−1.234	0.962	−0.253	0.8018
Maxillary_length	−0.006	0.059	−0.126	0.115	−0.095	0.9249
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for change in total length sagittal section

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
0.577	0.333	0.214	118.918	1.132	2.794	5	28	0.0361
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for change in total length sagittal section

Term	SS	DF	MS	F	p-value
ANB	18.470	1	18.470	11.870	0.0018
APDI	4.198	1	4.198	2.698	0.1116
Duration_traction	0.171	1	0.171	0.110	0.7430
Impaction_condition	0.100	1	0.100	0.064	0.8018
Maxillary_length	0.014	1	0.014	0.009	0.9249
Residuals	43.567	28	1.556
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.
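Because each term in this table has one degree of freedom, its F statistic is the term's mean square divided by the residual mean square, and it equals the square of the corresponding t statistic up to rounding. A quick arithmetic check using the ANB row (variable names are illustrative only):

```python
# Values copied from the reproduced Model 1 tables above.
ss_residual, df_residual = 43.567, 28
ms_residual = ss_residual / df_residual   # residual mean square

ms_anb = 18.470 / 1                       # SS / DF for ANB
f_anb = ms_anb / ms_residual              # F statistic for ANB
t_anb = 3.445                             # t statistic for ANB

print(round(ms_residual, 3))              # prints 1.556
print(round(f_anb, 2))                    # prints 11.87
print(round(t_anb ** 2, 2))               # prints 11.87
```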

Visualisation of regression model

The blue line shows the line of best fit, with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for change in total length sagittal section

Change in regression coefficients

term	O_B	R_B	Change.B	reproduce.B
Intercept	2.418	−8.2713	−10.6893	Not Reproduced
ANB	−0.461	0.4217	0.8827	Not Reproduced
APDI	−0.033	0.1025	0.1355	Not Reproduced
Duration_traction	0.204	−0.0274	−0.2314	Not Reproduced
Impaction_condition:
Palatine – Buccal	−0.437	−0.1358	0.3012	Not Reproduced
Maxillary_length	0.391	−0.0056	−0.3966	Not Reproduced
O_B = original B; R_B = reproduced B; Change.B = R_B - O_B; reproduce.B = whether B was reproduced.

Change in R²

O_R2	R_R2	Change.R2	Reproduce.R2
0.479	0.3328	−0.1462	Not Reproduced
O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in p-values

Term	O_p	R_p	Change.p	Reproduce.p	SigChangeDirection
Intercept	0.657	0.1184	−0.5386	Not Reproduced	Remains non-sig, B changes direction
ANB	0.034	0.0018	−0.0322	Not Reproduced	Remains sig, B changes direction
APDI	0.612	0.1116	−0.5004	Not Reproduced	Remains non-sig, B changes direction
Duration_traction	0.010	0.7430	0.7330	Not Reproduced	Sig to non-sig, B changes direction
Impaction_condition:
Palatine – Buccal	0.380	0.8018	0.4218	Not Reproduced	Remains non-sig, B same direction
Maxillary_length	0.063	0.9249	0.8619	Not Reproduced	Remains non-sig, B changes direction
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value R_p - O_p; Reproduce.p = p-values reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note, p-values that were <0.001 were set to 0.00099 for the purposes of comparison.
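The SigChangeDirection labels combine two comparisons: whether each p-value falls below the significance level in the original and reproduced models, and whether the sign of B is the same in both. A minimal sketch of that logic follows. It assumes a 0.05 significance level, which is not stated in this table, and the function name `sig_change_direction` and the label "Non-sig to sig" are hypothetical.

```python
def sig_change_direction(o_b, r_b, o_p, r_p, alpha=0.05):
    """Label the change in significance and coefficient sign.

    ASSUMPTION: alpha = 0.05. P-values published as "<0.001" are
    expected to be passed in as 0.00099, as described in the note above.
    """
    o_sig, r_sig = o_p < alpha, r_p < alpha
    if o_sig and r_sig:
        sig = "Remains sig"
    elif not o_sig and not r_sig:
        sig = "Remains non-sig"
    elif o_sig:
        sig = "Sig to non-sig"
    else:
        sig = "Non-sig to sig"  # hypothetical label, not shown in the table

    # Coefficients are compared by sign only; magnitude is ignored here.
    same_sign = (o_b > 0) == (r_b > 0)
    direction = "B same direction" if same_sign else "B changes direction"
    return f"{sig}, {direction}"
```

Applied to the Duration_traction row, `sig_change_direction(0.204, -0.0274, 0.010, 0.7430)` gives "Sig to non-sig, B changes direction", as in the table.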

Bland Altman Plot showing differences between Original and Reproduced p-values for change in total length sagittal section

Results for p-values

The Bland Altman plot shows wide scatter with mean bias not close to zero, reflecting that p-values could not be reproduced.
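The mean bias and 95% limits of agreement that a Bland Altman plot displays can be computed directly from the Change.p column above. This sketch uses the conventional mean ± 1.96 SD limits; whether the plotted limits were calculated in exactly this way, or with the intercept included, is an assumption.

```python
from statistics import mean, stdev

# Change.p values (reproduced minus original) for the six Model 1 terms.
change_p = [-0.5386, -0.0322, -0.5004, 0.7330, 0.4218, 0.8619]

bias = mean(change_p)    # mean difference between reproduced and original
sd = stdev(change_p)     # sample standard deviation of the differences
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd

print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

With only six differences the limits are very imprecise, so they are best read as a description of the scatter rather than as a formal agreement interval.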

Conclusion computational reproducibility

This model was not computationally reproducible.

As this model was not computationally reproducible, inferential reproducibility was not assessed: because the original analyses could not be reproduced, statistical assumptions could not be meaningfully compared or interpreted.

Model 2

Model results for root area changes in middle axial section

Term	B	p-value
Intercept	−59.805	0.164
Age	−0.566	0.047
APDI	0.610	0.087
Maxillary_length	0.420	0.297
Height_impacted_canine	−0.141	0.722
Alfa_angle	−0.030	0.795
Complexity_traction:
Complex – Not Complex	−0.193	0.942
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for root area changes in middle axial section

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
	0.367
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for root area changes in middle axial section

Term	SS	DF	MS	F	p-value
Age
APDI
Maxillary_length
Height_impacted_canine
Alfa_angle
Complexity_traction
Residuals
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for root area changes in middle axial section

Term	B	SE	Lower	Upper	t	p-value
Intercept	−34.572	14.179	−63.666	−5.478	−2.438	0.0216
Age	−0.258	0.105	−0.473	−0.043	−2.464	0.0204
APDI	0.409	0.141	0.121	0.697	2.910	0.0072
Maxillary_length	0.113	0.189	−0.275	0.501	0.596	0.5562
Height_impacted_canine	−0.116	0.268	−0.667	0.434	−0.434	0.6680
Alfa_angle	0.003	0.077	−0.155	0.162	0.042	0.9665
Complexity_traction:
Complex – Not Complex	−1.257	2.032	−5.426	2.913	−0.619	0.5414
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for root area changes in middle axial section

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
0.576	0.332	0.184	192.250	3.232	2.240	6	27	0.0698
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for root area changes in middle axial section

Term	SS	DF	MS	F	p-value
Age	79.843	1	79.843	6.071	0.0204
APDI	111.362	1	111.362	8.468	0.0072
Maxillary_length	4.670	1	4.670	0.355	0.5562
Height_impacted_canine	2.472	1	2.472	0.188	0.6680
Alfa_angle	0.024	1	0.024	0.002	0.9665
Complexity_traction	5.031	1	5.031	0.383	0.5414
Residuals	355.070	27	13.151
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.