Paper 7: Root and alveolar bone changes in first premolars adjacent to the traction of buccal versus palatal maxillary impacted canines

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

March 3, 2026

Reference

Rodríguez-Cárdenas YA, Ruíz-Mora GA, Aliaga-Del Castillo A, Dias-Da Silveira HL, Arriola-Guillén LE (2019) Root and alveolar bone changes in first premolars adjacent to the traction of buccal versus palatal maxillary impacted canines. PLoS ONE 14(12): e0226267. https://doi.org/10.1371/journal.pone.0226267

Rodríguez-Cárdenas, Yalil Augusto; Ruíz-Mora, Gustavo Armando; Castillo, Aron Aliaga-Del; Silveira, Heraldo Luis Dias-Da; Arriola-Guillén, Luis Ernesto (2019). Premolars. PLOS ONE. Dataset. https://doi.org/10.1371/journal.pone.0226267.s001

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and for the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
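The per-statistic classification described above can be sketched as follows. The exact thresholds used in the study are not stated in this report, so this minimal Python sketch assumes that a value is Reproduced when it rounds to the published value at the reported number of decimal places, Incorrect Rounding when it differs by at most one unit in the last reported decimal place, and Not Reproduced otherwise. The function name `classify_statistic` and both tolerances are illustrative assumptions, not the study's own rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify_statistic(original: str, reproduced: float) -> str:
    """Compare a published value (as printed) with a reproduced value.

    Assumed thresholds (hypothetical, not taken from the study):
      * Reproduced: the reproduced value rounds to the published value.
      * Incorrect Rounding: differs by at most one unit in the last
        reported decimal place.
      * Not Reproduced: any larger discrepancy.
    """
    published = Decimal(original)  # use an ASCII minus sign, e.g. "-0.461"
    # Number of decimal places the paper reported, e.g. "0.479" -> 3.
    places = max(-published.as_tuple().exponent, 0)
    unit = Decimal(1).scaleb(-places)  # one unit in the last reported place
    rounded = Decimal(repr(reproduced)).quantize(unit, rounding=ROUND_HALF_UP)
    if rounded == published:
        return "Reproduced"
    if abs(rounded - published) <= unit:
        return "Incorrect Rounding"
    return "Not Reproduced"


# Published R2 of 0.479 versus a reproduced R2 of 0.3328 (Model 1 below).
print(classify_statistic("0.479", 0.3328))
```

Passing the published value as a string keeps the number of reported decimal places, which a float would lose (for example, trailing zeros in 0.010).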

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
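The thresholds for a meaningful change in a standardized coefficient can be sketched as below. The report does not say how the relative and absolute criteria are combined, so this sketch simply returns both; the function name `classify_change` and the returned field names are hypothetical.

```python
import math


def classify_change(beta_original: float, beta_sensitivity: float) -> dict:
    """Flag changes in a standardized coefficient between two model fits.

    Thresholds from the text: relative change >= 10%; absolute change
    >= 0.1 (moderate) or >= 0.2 (substantial). How the two criteria are
    combined is not specified, so both are reported separately.
    """
    # Round to suppress floating-point noise near the thresholds.
    abs_diff = round(abs(beta_sensitivity - beta_original), 10)
    if beta_original == 0:
        rel_change = 0.0 if abs_diff == 0 else math.inf
    else:
        rel_change = abs_diff / abs(beta_original)
    if abs_diff >= 0.2:
        size = "substantial"
    elif abs_diff >= 0.1:
        size = "moderate"
    else:
        size = "none"
    return {
        "abs_diff": abs_diff,
        "rel_change": rel_change,
        "relative_flag": rel_change >= 0.10,
        "absolute_size": size,
    }


# Illustrative values only (not taken from the paper).
print(classify_change(0.30, 0.45))
```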

Results from the reproduction of the Rodríguez-Cárdenas et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This research compared root and alveolar bone changes in first premolars adjacent to orthodontic traction on buccal versus palatal maxillary canines. The analysis in this paper is presented in tables, with eight linear regressions across two tables (Tables 6 and 7). However, the authors did not report checking the linearity assumptions, provide standard errors or confidence intervals, or interpret the coefficients. Repeated measures may not have been accounted for, as there appear to be 36 teeth from 25 people; the number of observations was not reported in the regression tables, so this is unclear. The authors called their modelling approach the overfit method; with up to seven variables per model, the models are likely to be overfitted and may not be generalisable.

Data availability and software used

The authors provided the data in an Excel file in long format, accompanied by a separate spreadsheet containing a data dictionary. The data were available both in the Supporting Information and on Figshare. SPSS was used for the linear regression analyses.

Regression sample

Three of the eight identified linear regressions were randomly selected for reproduction. The outcomes were changes in total length in the sagittal section, changes in root area at the upper limit of the middle third of the axial section, and changes in buccal bone height. The authors reported regression coefficients, p-values, and R².

Computational reproducibility results

None of the statistics in this paper could be reproduced because the number of observations in the published tables does not match the data provided. The tables report 36 teeth from 25 people, but the data file contains only 34 teeth and no person ID. Table 1 shows 15 teeth in the buccal group; however, there are only 13 in the data. Basic means and SDs for the groups could not be reproduced either, even when the group size was consistent. For example, in the palatal group with 21 teeth, the mean change in buccal bone height was reported as 0.99 (SD = 2.04), whereas in the data it was 0.957 (SD = 1.886).

While this paper included a data dictionary, it was not easy to match variables directly to the results; for example, the main grouping variable was called “condition” in the paper but “POSITION” in the data dictionary. Matching was made more difficult because the data file did not contain the reported number of observations and the descriptive statistics could not be reproduced. The reproduced models are therefore a best guess based on the paper’s description of the data.

Inferential reproducibility results

As this paper was not computationally reproducible, inferential reproducibility was not considered, as the original analyses could not be reproduced and, therefore, statistical assumptions could not be meaningfully compared or interpreted.

Model 1

Model results for change in total length sagittal section

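| Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
| Intercept | 2.418 | | | | | 0.657 |
| ANB | −0.461 | | | | | 0.034 |
| APDI | −0.033 | | | | | 0.612 |
| Duration_traction | 0.204 | | | | | 0.010 |
| Impaction_condition: Palatine – Buccal | −0.437 | | | | | 0.380 |
| Maxillary_length | 0.391 | | | | | 0.063 |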

SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for change in total length sagittal section

| R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| | 0.479 | | | | | | | |

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for change in total length sagittal section

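| Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| ANB | | | | | |
| APDI | | | | | |
| Duration_traction | | | | | |
| Impaction_condition | | | | | |
| Maxillary_length | | | | | |
| Residuals | | | | | |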

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for change in total length sagittal section

| Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
| Intercept | −8.271 | 5.135 | −18.789 | 2.247 | −1.611 | 0.1184 |
| ANB | 0.422 | 0.122 | 0.171 | 0.672 | 3.445 | 0.0018 |
| APDI | 0.102 | 0.062 | −0.025 | 0.230 | 1.643 | 0.1116 |
| Duration_traction | −0.027 | 0.083 | −0.197 | 0.142 | −0.331 | 0.7430 |
| Impaction_condition: Palatine – Buccal | −0.136 | 0.536 | −1.234 | 0.962 | −0.253 | 0.8018 |
| Maxillary_length | −0.006 | 0.059 | −0.126 | 0.115 | −0.095 | 0.9249 |

SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for change in total length sagittal section

| R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| 0.577 | 0.333 | 0.214 | 118.918 | 1.132 | 2.794 | 5 | 28 | 0.0361 |

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.
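As a consistency check on the reproduced fit statistics above, adjusted R2 and the overall F statistic can be recomputed from R2 and the degrees of freedom. This is a minimal sketch using the standard formulas for ordinary least squares with an intercept; the function names are illustrative, and the inputs (R2 = 0.3328, DF1 = 5, DF2 = 28, hence 34 observations) are the reproduced values reported for this model.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R2 for a linear model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def overall_f(r2: float, df_model: int, df_resid: int) -> float:
    """Overall F statistic expressed through R2."""
    return (r2 / df_model) / ((1 - r2) / df_resid)


df_model, df_resid = 5, 28
n_obs = df_model + df_resid + 1  # 34, matching the teeth in the data file

print(round(adjusted_r2(0.3328, n_obs, df_model), 3))
print(round(overall_f(0.3328, df_model, df_resid), 3))
```

Small differences from the tabulated values in the last decimal place are expected, because R2 is itself entered here in rounded form.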

ANOVA table for change in total length sagittal section

| Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| ANB | 18.470 | 1 | 18.470 | 11.870 | 0.0018 |
| APDI | 4.198 | 1 | 4.198 | 2.698 | 0.1116 |
| Duration_traction | 0.171 | 1 | 0.171 | 0.110 | 0.7430 |
| Impaction_condition | 0.100 | 1 | 0.100 | 0.064 | 0.8018 |
| Maxillary_length | 0.014 | 1 | 0.014 | 0.009 | 0.9249 |
| Residuals | 43.567 | 28 | 1.556 | | |

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.
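For readers checking the ANOVA table against the coefficient table, each F value is the term's mean square divided by the residual mean square, and for a term with one degree of freedom under type III SS it equals the square of that term's t statistic. The sketch below illustrates this with the reproduced ANB row; the helper name `f_from_ss` is hypothetical.

```python
def f_from_ss(ss_term: float, df_term: int, ss_resid: float, df_resid: int) -> float:
    """F statistic for one term: MS_term / MS_residual."""
    ms_term = ss_term / df_term
    ms_resid = ss_resid / df_resid
    return ms_term / ms_resid


# Reproduced values for ANB: SS = 18.470, DF = 1; residual SS = 43.567, DF = 28.
f_anb = f_from_ss(18.470, 1, 43.567, 28)
t_anb = 3.445  # t statistic for ANB in the reproduced coefficient table

print(round(f_anb, 2), round(t_anb ** 2, 2))
```

The two printed values should agree up to rounding of the tabulated inputs.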

Visualisation of regression model

The blue line shows the line of best fit, with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for change in total length sagittal section

Change in regression coefficients

| term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
| Intercept | 2.418 | −8.2713 | −10.6893 | Not Reproduced |
| ANB | −0.461 | 0.4217 | 0.8827 | Not Reproduced |
| APDI | −0.033 | 0.1025 | 0.1355 | Not Reproduced |
| Duration_traction | 0.204 | −0.0274 | −0.2314 | Not Reproduced |
| Impaction_condition: Palatine – Buccal | −0.437 | −0.1358 | 0.3012 | Not Reproduced |
| Maxillary_length | 0.391 | −0.0056 | −0.3966 | Not Reproduced |

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in R2

| O_R2 | R_R2 | Change.R2 | Reproduce.R2 |
|---|---|---|---|
| 0.479 | 0.3328 | −0.1462 | Not Reproduced |

O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in p-values

| Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
| Intercept | 0.657 | 0.1184 | −0.5386 | Not Reproduced | Remains non-sig, B changes direction |
| ANB | 0.034 | 0.0018 | −0.0322 | Not Reproduced | Remains sig, B changes direction |
| APDI | 0.612 | 0.1116 | −0.5004 | Not Reproduced | Remains non-sig, B changes direction |
| Duration_traction | 0.010 | 0.7430 | 0.7330 | Not Reproduced | Sig to non-sig, B changes direction |
| Impaction_condition: Palatine – Buccal | 0.380 | 0.8018 | 0.4218 | Not Reproduced | Remains non-sig, B same direction |
| Maxillary_length | 0.063 | 0.9249 | 0.8619 | Not Reproduced | Remains non-sig, B changes direction |

O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value, R_p - O_p; Reproduce.p = p-values reproduced; SigChangeDirection = change in statistical significance and in the direction of B between the original and reproduced models. Note: p-values that were <0.001 were set to 0.00099 for the purposes of comparison.
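The SigChangeDirection label can be derived from the original and reproduced coefficients and p-values, as in the sketch below. It assumes a significance level of 0.05 and takes the direction of B to be its sign; neither rule is stated in this report, so labels in the tables may follow a different definition. The function names `parse_p` and `sig_change_direction` are hypothetical.

```python
def parse_p(text: str) -> float:
    """Convert a reported p-value to a number; '<0.001' becomes 0.00099."""
    text = text.strip()
    if text.startswith("<"):
        return 0.00099
    return float(text)


def sig_change_direction(o_b: float, r_b: float, o_p: float, r_p: float,
                         alpha: float = 0.05) -> str:
    """Label the change in significance and in the sign of B (assumed rules)."""
    o_sig, r_sig = o_p < alpha, r_p < alpha
    if o_sig and r_sig:
        sig = "Remains sig"
    elif not o_sig and not r_sig:
        sig = "Remains non-sig"
    elif o_sig:
        sig = "Sig to non-sig"
    else:
        sig = "Non-sig to sig"
    same_sign = (o_b >= 0) == (r_b >= 0)
    direction = "B same direction" if same_sign else "B changes direction"
    return f"{sig}, {direction}"


# Duration_traction in this model: B 0.204 -> -0.0274, p 0.010 -> 0.7430.
print(sig_change_direction(0.204, -0.0274, parse_p("0.010"), parse_p("0.7430")))
```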

Bland Altman Plot showing differences between Original and Reproduced p-values for change in total length sagittal section

Results for p-values
  • The Bland Altman plot shows wide scatter with mean bias not close to zero, reflecting that p-values could not be reproduced.
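
The mean bias and limits of agreement summarised in a Bland Altman plot can be computed as sketched below. The differences are taken as reproduced minus original, matching Change.p in the table; the 1.96 multiplier for the limits of agreement is the conventional choice and is an assumption here, as is the function name `bland_altman`.

```python
from statistics import mean, stdev


def bland_altman(original: list[float], reproduced: list[float]) -> tuple[float, float, float]:
    """Return (mean bias, lower limit, upper limit) of agreement."""
    diffs = [r - o for o, r in zip(original, reproduced)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Original and reproduced p-values for the six terms in this model.
o_p = [0.657, 0.034, 0.612, 0.010, 0.380, 0.063]
r_p = [0.1184, 0.0018, 0.1116, 0.7430, 0.8018, 0.9249]

bias, lower, upper = bland_altman(o_p, r_p)
print(round(bias, 4), round(lower, 4), round(upper, 4))
```

With only six terms the limits of agreement are imprecise, so the sketch is best read as a description of the scatter rather than a formal test.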
Conclusion computational reproducibility

This model was not computationally reproducible.

As this model was not computationally reproducible, inferential reproducibility was not considered, since the original analyses could not be reproduced and therefore, statistical assumptions could not be meaningfully compared or interpreted.

Model 2

Model results for root area changes in middle axial section

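| Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
| Intercept | −59.805 | | | | | 0.164 |
| Age | −0.566 | | | | | 0.047 |
| APDI | 0.610 | | | | | 0.087 |
| Maxillary_length | 0.420 | | | | | 0.297 |
| Height_impacted_canine | −0.141 | | | | | 0.722 |
| Alfa_angle | −0.030 | | | | | 0.795 |
| Complexity_traction: Complex – Not Complex | −0.193 | | | | | 0.942 |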

SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for root area changes in middle axial section

| R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| | 0.367 | | | | | | | |

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for root area changes in middle axial section

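| Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Age | | | | | |
| APDI | | | | | |
| Maxillary_length | | | | | |
| Height_impacted_canine | | | | | |
| Alfa_angle | | | | | |
| Complexity_traction | | | | | |
| Residuals | | | | | |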

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for root area changes in middle axial section

| Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
| Intercept | −34.572 | 14.179 | −63.666 | −5.478 | −2.438 | 0.0216 |
| Age | −0.258 | 0.105 | −0.473 | −0.043 | −2.464 | 0.0204 |
| APDI | 0.409 | 0.141 | 0.121 | 0.697 | 2.910 | 0.0072 |
| Maxillary_length | 0.113 | 0.189 | −0.275 | 0.501 | 0.596 | 0.5562 |
| Height_impacted_canine | −0.116 | 0.268 | −0.667 | 0.434 | −0.434 | 0.6680 |
| Alfa_angle | 0.003 | 0.077 | −0.155 | 0.162 | 0.042 | 0.9665 |
| Complexity_traction: Complex – Not Complex | −1.257 | 2.032 | −5.426 | 2.913 | −0.619 | 0.5414 |

SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for root area changes in middle axial section

| R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| 0.576 | 0.332 | 0.184 | 192.250 | 3.232 | 2.240 | 6 | 27 | 0.0698 |

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for root area changes in middle axial section

| Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Age | 79.843 | 1 | 79.843 | 6.071 | 0.0204 |
| APDI | 111.362 | 1 | 111.362 | 8.468 | 0.0072 |
| Maxillary_length | 4.670 | 1 | 4.670 | 0.355 | 0.5562 |
| Height_impacted_canine | 2.472 | 1 | 2.472 | 0.188 | 0.6680 |
| Alfa_angle | 0.024 | 1 | 0.024 | 0.002 | 0.9665 |
| Complexity_traction | 5.031 | 1 | 5.031 | 0.383 | 0.5414 |
| Residuals | 355.070 | 27 | 13.151 | | |

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the line of best fit, with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for root area changes in middle axial section

Change in regression coefficients

| term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
| Intercept | −59.805 | −34.5720 | 25.2330 | Not Reproduced |
| Age | −0.566 | −0.2579 | 0.3081 | Not Reproduced |
| APDI | 0.610 | 0.4090 | −0.2010 | Not Reproduced |
| Maxillary_length | 0.420 | 0.1128 | −0.3072 | Not Reproduced |
| Height_impacted_canine | −0.141 | −0.1163 | 0.0247 | Not Reproduced |
| Alfa_angle | −0.030 | 0.0033 | 0.0333 | Not Reproduced |
| Complexity_traction: Complex – Not Complex | −0.193 | −1.2569 | −1.0639 | Not Reproduced |

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in R2

| O_R2 | R_R2 | Change.R2 | Reproduce.R2 |
|---|---|---|---|
| 0.367 | 0.3323 | −0.0347 | Not Reproduced |

O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in p-values

| Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
| Intercept | 0.164 | 0.0216 | −0.1424 | Not Reproduced | Non-sig to sig, B same direction |
| Age | 0.047 | 0.0204 | −0.0266 | Not Reproduced | Remains sig, B same direction |
| APDI | 0.087 | 0.0072 | −0.0798 | Not Reproduced | Non-sig to sig, B same direction |
| Maxillary_length | 0.297 | 0.5562 | 0.2592 | Not Reproduced | Remains non-sig, B same direction |
| Height_impacted_canine | 0.722 | 0.6680 | −0.0540 | Not Reproduced | Remains non-sig, B same direction |
| Alfa_angle | 0.795 | 0.9665 | 0.1715 | Not Reproduced | Remains non-sig, B changes direction |
| Complexity_traction: Complex – Not Complex | 0.942 | 0.5414 | −0.4006 | Not Reproduced | Remains non-sig, B same direction |

O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value, R_p - O_p; Reproduce.p = p-values reproduced; SigChangeDirection = change in statistical significance and in the direction of B between the original and reproduced models. Note: p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman Plot showing differences between Original and Reproduced p-values for root area changes in middle axial section

Results for p-values
  • The Bland Altman plot shows wide scatter with mean bias not close to zero, reflecting that p-values could not be reproduced.
Conclusion computational reproducibility

This model was not computationally reproducible.

As this model was not computationally reproducible, inferential reproducibility was not considered, since the original analyses could not be reproduced and therefore, statistical assumptions could not be meaningfully compared or interpreted.

Model 3

Model results for change in Buccal bone height

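| Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
| Intercept | −5.457 | | | | | 0.548 |
| Sex: Female – Male | −1.247 | | | | | 0.288 |
| SNA | 0.072 | | | | | 0.470 |
| ANB | −1.074 | | | | | 0.007 |
| Duration_traction | 0.314 | | | | | 0.112 |
| Height_impacted_canine | −0.469 | | | | | 0.094 |
| Beta_angle | 0.110 | | | | | 0.033 |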

SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for change in Buccal bone height

| R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| | 0.477 | | | | | | | |

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

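ANOVA table for change in Buccal bone height

| Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Sex | | | | | |
| SNA | | | | | |
| ANB | | | | | |
| Duration_traction | | | | | |
| Height_impacted_canine | | | | | |
| Beta_angle | | | | | |
| Residuals | | | | | |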

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Model results for change in Buccal bone height

| Term | B | SE | Lower | Upper | t | p-value |
|---|---|---|---|---|---|---|
| Intercept | −0.257 | 6.596 | −13.791 | 13.278 | −0.039 | 0.9692 |
| Sex: Female – Male | −0.096 | 0.668 | −1.465 | 1.274 | −0.143 | 0.8871 |
| SNA | 0.023 | 0.074 | −0.129 | 0.174 | 0.306 | 0.7620 |
| ANB | −0.253 | 0.152 | −0.565 | 0.059 | −1.663 | 0.1079 |
| Duration_traction | 0.060 | 0.134 | −0.214 | 0.335 | 0.452 | 0.6548 |
| Height_impacted_canine | −0.072 | 0.123 | −0.325 | 0.181 | −0.583 | 0.5644 |
| Beta_angle | 0.007 | 0.019 | −0.033 | 0.047 | 0.343 | 0.7344 |

SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for change in Buccal bone height

| R | R2 | R2Adj | AIC | RMSE | F | DF1 | DF2 | p-value |
|---|---|---|---|---|---|---|---|---|
| 0.341 | 0.116 | −0.080 | 144.581 | 1.603 | 0.590 | 6 | 27 | 0.7350 |

R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for change in Buccal bone height

| Term | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Sex | 0.067 | 1 | 0.067 | 0.021 | 0.8871 |
| SNA | 0.303 | 1 | 0.303 | 0.094 | 0.7620 |
| ANB | 8.946 | 1 | 8.946 | 2.764 | 0.1079 |
| Duration_traction | 0.662 | 1 | 0.662 | 0.204 | 0.6548 |
| Height_impacted_canine | 1.102 | 1 | 1.102 | 0.340 | 0.5644 |
| Beta_angle | 0.380 | 1 | 0.380 | 0.118 | 0.7344 |
| Residuals | 87.380 | 27 | 3.236 | | |

SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

The blue line shows the line of best fit, with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Forest plot showing Original and Reproduced coefficients and 95% confidence intervals for change in Buccal bone height

Change in regression coefficients

| term | O_B | R_B | Change.B | reproduce.B |
|---|---|---|---|---|
| Intercept | −5.457 | −0.2567 | 5.2003 | Not Reproduced |
| Sex: Female – Male | −1.247 | −0.0957 | 1.1513 | Not Reproduced |
| SNA | 0.072 | 0.0225 | −0.0495 | Not Reproduced |
| ANB | −1.074 | −0.2527 | 0.8213 | Not Reproduced |
| Duration_traction | 0.314 | 0.0604 | −0.2536 | Not Reproduced |
| Height_impacted_canine | −0.469 | −0.0720 | 0.3970 | Not Reproduced |
| Beta_angle | 0.110 | 0.0067 | −0.1033 | Not Reproduced |

O_B = original B; R_B = reproduced B; Change.B = change in R_B - O_B; Reproduce.B = B reproduced.

Change in R2

| O_R2 | R_R2 | Change.R2 | Reproduce.R2 |
|---|---|---|---|
| 0.477 | 0.1160 | −0.3610 | Not Reproduced |

O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R_R2 - O_R2

Change in p-values

| Term | O_p | R_p | Change.p | Reproduce.p | SigChangeDirection |
|---|---|---|---|---|---|
| Intercept | 0.548 | 0.9692 | 0.4212 | Not Reproduced | Remains non-sig, B same direction |
| Sex: Female – Male | 0.288 | 0.8871 | 0.5991 | Not Reproduced | Remains non-sig, B same direction |
| SNA | 0.470 | 0.7620 | 0.2920 | Not Reproduced | Remains non-sig, B same direction |
| ANB | 0.007 | 0.1079 | 0.1009 | Not Reproduced | Sig to non-sig, B changes direction |
| Duration_traction | 0.112 | 0.6548 | 0.5428 | Not Reproduced | Remains non-sig, B same direction |
| Height_impacted_canine | 0.094 | 0.5644 | 0.4704 | Not Reproduced | Remains non-sig, B same direction |
| Beta_angle | 0.033 | 0.7344 | 0.7014 | Not Reproduced | Sig to non-sig, B changes direction |

O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value, R_p - O_p; Reproduce.p = p-values reproduced; SigChangeDirection = change in statistical significance and in the direction of B between the original and reproduced models. Note: p-values that were <0.001 were set to 0.00099 for the purposes of comparison.

Bland Altman Plot showing differences between Original and Reproduced p-values for change in Buccal bone height

Results for p-values
  • The Bland Altman plot shows wide scatter with mean bias not close to zero, reflecting that p-values could not be reproduced.
Conclusion computational reproducibility

This model was not computationally reproducible.

As this model was not computationally reproducible, inferential reproducibility was not considered, since the original analyses could not be reproduced and therefore, statistical assumptions could not be meaningfully compared or interpreted.