Paper 17: To know or not to know? Mentalization as protection from somatic complaints

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

April 5, 2026

Reference

Ballespi S, Vives J, Alonso N, Sharp C, Ramirez MS, Fonagy P, et al. (2019) To know or not to know? Mentalization as protection from somatic complaints. PLoS ONE 14(5): e0215308. https://doi.org/10.1371/journal.pone.0215308

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.
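The selection rule described above can be sketched as follows. This is a minimal illustration, not the project's own code: the function name `select_models`, the `(model_id, is_priority)` input structure, and the fixed seed are all assumptions made for the example.

```python
import random

def select_models(models, max_models=3, seed=0):
    """Pick up to `max_models` models: priority models first, then a random draw.

    `models` is a list of (model_id, is_priority) tuples. The structure and
    the seed are hypothetical, chosen only for illustration.
    """
    priority = [m for m, is_priority in models if is_priority]
    others = [m for m, is_priority in models if not is_priority]
    chosen = priority[:max_models]
    remaining_slots = max_models - len(chosen)
    if remaining_slots > 0 and others:
        rng = random.Random(seed)  # fixed seed so the draw can be repeated
        chosen += rng.sample(others, min(remaining_slots, len(others)))
    return chosen

# A paper with a single model needs no random selection.
single = select_models([(1, True)])
# A paper with five models: the priority model is kept, two more are drawn at random.
several = select_models([(1, False), (2, False), (3, False), (4, False), (5, True)])
```

Fixing the seed is one way to make the random part of the selection repeatable; any documented seed would serve the same purpose.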

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
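As an illustration of how a statistic might be categorised, the sketch below compares a published value with a reproduced one at the precision reported in the paper. The exact thresholds used in the project are not stated here, so the rule in `classify` (match after rounding, otherwise within one unit of the last reported decimal place) is an assumption, as is the function name.

```python
from decimal import Decimal, ROUND_HALF_UP

def classify(original: str, reproduced: str) -> str:
    """Compare a published value with a reproduced one.

    Assumed rule (not taken from the report):
    - "Reproduced" if the reproduced value rounds to the published value;
    - "Incorrect Rounding" if it lies within one unit of the last reported
      decimal place;
    - "Not Reproduced" otherwise.
    Values are passed as strings (with an ASCII minus sign) so the number of
    reported decimal places is preserved.
    """
    orig = Decimal(original)
    repro = Decimal(reproduced)
    unit = Decimal(1).scaleb(orig.as_tuple().exponent)  # e.g. 0.1 for "1.8"
    if repro.quantize(unit, rounding=ROUND_HALF_UP) == orig:
        return "Reproduced"
    if abs(repro - orig) <= unit:
        return "Incorrect Rounding"
    return "Not Reproduced"

# Coefficients from this paper: published 1.8 and -2.2, reproduced 0.3670 and -3.2849.
only_att = classify("1.8", "0.3670")
att_comp = classify("-2.2", "-3.2849")
```

Under this assumed rule both coefficients fall well outside the rounding tolerance, which agrees with the Not Reproduced labels in the comparison tables.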

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
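The thresholds for a meaningful change in a standardized coefficient can be expressed as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and reporting the relative and absolute criteria side by side is one possible reading of the text rather than the project's documented procedure.

```python
def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Standardized coefficient: change in the outcome (in SD units) per one SD of the predictor."""
    return b * sd_x / sd_y

def describe_change(beta_original: float, beta_sensitivity: float) -> dict:
    """Flag changes in a standardized coefficient using the thresholds in the text.

    Reporting both criteria separately is an assumption made for illustration.
    """
    abs_change = abs(beta_sensitivity - beta_original)
    if beta_original != 0:
        rel_change = abs_change / abs(beta_original)
    else:
        # Relative change is undefined for a zero coefficient; treat any change as large.
        rel_change = float("inf") if abs_change > 0 else 0.0
    if abs_change >= 0.2:
        size = "substantial"
    elif abs_change >= 0.1:
        size = "moderate"
    else:
        size = "below absolute thresholds"
    return {"relative_change_10pct": rel_change >= 0.10, "absolute_change": size}

# Made-up coefficients, used only to show the three outcomes.
small = describe_change(0.30, 0.32)
moderate = describe_change(0.30, 0.45)
large = describe_change(0.30, 0.55)
```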

Results from the reproduction of the Ballespi et al. (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This paper examined emotional insight and somatic complaints in adolescents. The primary analysis was linear regression, with coefficients, 95% confidence intervals, and p-values reported. No assessment of model assumptions was reported. The authors used a recognised modelling strategy, but it was poorly described. One group was left out of the regression reporting and group names changed within the paper, which made it confusing to read. No global p-value was reported, only the differences between groups.

Data availability and software used

The authors provide data in a wide-format SPSS file with a data dictionary. SPSS was used to fit the linear regression models.

Regression sample

The primary outcome variable was somatic complaints, with the main independent variable being insight position. The model reportedly adjusted for demographic and clinical variables. Random selection of models was not required for the reproducibility assessment, as only one model was presented.

Computational reproducibility results

This model was not computationally reproducible. Several issues contributed to the failure to reproduce the results, including inconsistent reporting of the modelling process, lack of information on missing data, and incomplete reporting of the final model results. While the methods section clearly described the variables considered for linear regression, it is unclear whether a backward selection process was used or whether all variables were included in the adjusted model, as suggested by the figure. The univariate means for somatic complaints could be reproduced, but the multivariable model could not. An attempt to reproduce the results in SPSS using the backward selection described was also unsuccessful. The figure implies that the adjusted model had no missing data; however, more than 20% of values were missing for the SES variable. Other variables also had some missing data, and the approach used to handle this missingness, whether complete case analysis or imputation, was not reported. Due to these reporting gaps, it is difficult to determine which variables were included in the final model.
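The kind of missing data summary that would have helped here can be sketched in a few lines. The records below are invented toy values, not the authors' data, and the function name and the use of `None` for missing values are assumptions made for the example.

```python
def missing_summary(rows, variables):
    """Return the share of missing values per variable and the complete-case count.

    Missing values are represented as None. The toy rows below are made-up
    illustrations, not the authors' data.
    """
    n = len(rows)
    share_missing = {
        v: sum(r.get(v) is None for r in rows) / n for v in variables
    }
    complete_cases = sum(
        all(r.get(v) is not None for v in variables) for r in rows
    )
    return share_missing, complete_cases

# Hypothetical toy records (values invented for illustration only).
toy_rows = [
    {"somatic": 4, "SES": 2, "BDI": 7},
    {"somatic": 9, "SES": None, "BDI": 12},
    {"somatic": 2, "SES": 3, "BDI": None},
    {"somatic": 6, "SES": 1, "BDI": 5},
]
share, n_complete = missing_summary(toy_rows, ["somatic", "SES", "BDI"])
```

Reporting both quantities matters because a complete case analysis uses only the rows counted in `n_complete`, so the analysed sample can be much smaller than the full dataset when one covariate has many missing values.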

Inferential reproducibility results

As this paper was not computationally reproducible, inferential reproducibility was not assessed: because the original analyses could not be reproduced, statistical assumptions could not be meaningfully compared or interpreted.

Recommended changes

Provide tables in the Supporting Information that present all analyses conducted in the paper, including full model outputs such as regression coefficients and all variables used for adjustment.
Provide a clear description of the amount and pattern of missing data, along with the methods used to address missingness in the analyses.
Ensure that the modelling process is clearly described and applied consistently throughout the paper. The type of model and its key specifications should be included in the footnotes of the tables to improve transparency.
Consider creating a reproducible analysis workflow and sharing the code.
Evaluate the assumptions of the linear regression models by examining residuals, identifying influential outliers, and assessing multicollinearity among predictors. If any assumptions are violated, address them using appropriate methods.

Model 1

Original results

Model results for Somatic complaints

Term	B	Lower	Upper	p-value
Intercept
Insight_positions:
Only comp – Nothing
Only att – Nothing	1.8	0.2	3.4	0.03
Att+Comp – Nothing	−2.2	−4.1	−0.2	0.03
Age
Sex:
Female – Male
Hollings
SASA
MASC
BFI_NEU
BDI
Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval. Blank cells indicate values not reported in the original paper.

Fit statistics for Somatic complaints

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value

R2Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for Somatic complaints

Term	SS	DF	MS	F	p-value
Insight_positions
Age
Sex
Hollings
SASA
MASC
BFI_NEU
BDI
Residuals
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Reproduced results

Model results for Somatic complaints

Term	B	SE	Lower	Upper	t	p-value
Intercept	−0.110	3.625	−7.263	7.042	−0.030	0.9757
Insight_positions:
Only comp – Nothing	−0.316	0.990	−2.270	1.639	−0.319	0.7503
Only att – Nothing	0.367	1.047	−1.699	2.433	0.351	0.7264
Att+Comp – Nothing	−3.285	1.160	−5.574	−0.996	−2.831	0.0052
Age	0.015	0.017	−0.018	0.049	0.910	0.3643
Sex:
Female – Male	0.706	0.687	−0.649	2.062	1.028	0.3054
Hollings	0.024	0.264	−0.497	0.546	0.093	0.9263
SASA	−0.051	0.035	−0.120	0.019	−1.445	0.1502
MASC	0.096	0.033	0.032	0.160	2.956	0.0035
BFI_NEU	0.290	0.587	−0.868	1.448	0.494	0.6221
BDI	0.192	0.053	0.087	0.297	3.622	<0.001
SE = Standard error; Lower = lower limit of the 95% confidence interval; Upper = upper limit of the 95% confidence interval.

Fit statistics for Somatic complaints

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
0.524	0.274	0.234	1,128.463	4.294	6.842	10	181	<0.001
R2Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.
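The reproduced fit statistics can be cross-checked against one another with the standard formulas for adjusted R2 and the overall F statistic. The sketch below uses only the values in the table above and assumes the model includes an intercept, so that the sample size is DF1 + DF2 + 1; the variable names are illustrative.

```python
# Values copied from the reproduced fit statistics table.
r2, df_model, df_resid = 0.274, 10, 181
n = df_model + df_resid + 1  # assumes an intercept, giving n = 192

# Adjusted R2 penalises R2 for the number of estimated coefficients.
r2_adj = 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic expressed in terms of R2.
f_stat = (r2 / df_model) / ((1 - r2) / df_resid)
```

Because R2 is only reported to three decimal places, the recomputed F is close to, but not exactly, the tabulated 6.842; the recomputed adjusted R2 rounds to the tabulated 0.234.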

ANOVA table for Somatic complaints

Term	SS	DF	MS	F	p-value
Insight_positions	166.007	3	55.336	2.829	0.0399
Age	16.183	1	16.183	0.827	0.3643
Sex	20.664	1	20.664	1.056	0.3054
Hollings	0.168	1	0.168	0.009	0.9263
SASA	40.838	1	40.838	2.088	0.1502
MASC	170.930	1	170.930	8.738	0.0035
BFI_NEU	4.767	1	4.767	0.244	0.6221
BDI	256.560	1	256.560	13.116	<0.001
Residuals	3,540.529	181	19.561
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.
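Two quick consistency checks link this ANOVA table to the coefficient table: for a term with one degree of freedom the F statistic equals the squared t statistic, and each mean square is the sum of squares divided by its degrees of freedom. The sketch below uses values copied from the reproduced tables; the variable names are illustrative, and small gaps are expected because the inputs are rounded to three decimals.

```python
import math

# (t statistic, F statistic) pairs copied from the reproduced tables above.
single_df_terms = {
    "Age": (0.910, 0.827),
    "MASC": (2.956, 8.738),
    "BDI": (3.622, 13.116),
}
# For a one-degree-of-freedom term, F equals t squared (up to rounding).
max_gap = max(abs(t ** 2 - f) for t, f in single_df_terms.values())

# Mean squares are sums of squares divided by their degrees of freedom.
ms_insight = 166.007 / 3
ms_resid = 3540.529 / 181
# Global F test for the three-degree-of-freedom Insight_positions term.
f_insight = ms_insight / ms_resid

# The tabulated RMSE matches the residual SS divided by n = 192,
# not by the residual degrees of freedom.
rmse_n = math.sqrt(3540.529 / 192)
```

The Insight_positions row is the global test for the group variable, which is the quantity the statistical review noted was missing from the original paper.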

Visualisation of regression model

The blue line shows the line of best fit, with shading representing the 95% confidence interval, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Differences

Forest plot showing original and reproduced coefficients and 95% confidence intervals for Somatic complaints

Change in regression coefficients

Term	O_B	R_B	Change.B	Reproduce.B
Intercept		−0.1105
Insight_positions:
Only comp – Nothing		−0.3157
Only att – Nothing	1.8	0.3670	−1.4330	Not Reproduced
Att+Comp – Nothing	−2.2	−3.2849	−1.0849	Not Reproduced
Age		0.0154
Sex:
Female – Male		0.7061
Hollings		0.0245
SASA		−0.0507
MASC		0.0961
BFI_NEU		0.2897
BDI		0.1920
O_B = original B; R_B = reproduced B; Change.B = R_B - O_B; Reproduce.B = whether B was reproduced.

Change in lower 95% confidence intervals for coefficients

Term	O_lower	R_lower	Change.lci	Reproduce.lower
Intercept		−7.2631
Insight_positions:
Only comp – Nothing		−2.2700
Only att – Nothing	0.2	−1.6990	−1.8990	Not Reproduced
Att+Comp – Nothing	−4.1	−5.5742	−1.4742	Not Reproduced
Age		−0.0180
Sex:
Female – Male		−0.6494
Hollings		−0.4971
SASA		−0.1198
MASC		0.0319
BFI_NEU		−0.8682
BDI		0.0874
O_lower = original lower confidence limit; R_lower = reproduced lower confidence limit; Change.lci = R_lower - O_lower; Reproduce.lower = whether the lower confidence limit was reproduced.

Change in upper 95% confidence intervals for coefficients

Term	O_upper	R_upper	Change.uci	Reproduce.upper
Intercept		7.0421
Insight_positions:
Only comp – Nothing		1.6387
Only att – Nothing	3.4	2.4330	−0.9670	Not Reproduced
Att+Comp – Nothing	−0.2	−0.9955	−0.7955	Not Reproduced
Age		0.0487
Sex:
Female – Male		2.0615
Hollings		0.5460
SASA		0.0185
MASC		0.1602
BFI_NEU		1.4475
BDI		0.2966
O_upper = original upper confidence limit; R_upper = reproduced upper confidence limit; Change.uci = R_upper - O_upper; Reproduce.upper = whether the upper confidence limit was reproduced.

Change in p-values

Term	O_p	R_p	Change.p	Reproduce.p	SigChangeDirection
Intercept		0.9757
Insight_positions:
Only comp – Nothing		0.7503
Only att – Nothing	0.03	0.7264	0.6964	Not Reproduced	Sig to non-sig, B changes direction
Att+Comp – Nothing	0.03	0.0052	−0.0248	Not Reproduced	Remains sig, B same direction
Age		0.3643
Sex:
Female – Male		0.3054
Hollings		0.9263
SASA		0.1502
MASC		0.0035
BFI_NEU		0.6221
BDI		<0.001
O_p = original p-value; R_p = reproduced p-value; Change.p = R_p - O_p; Reproduce.p = whether the p-value was reproduced; SigChangeDirection = change in statistical significance and in the direction of B between the original and reproduced models. Note: p-values reported as <0.001 were set to 0.00099 for the purposes of comparison.
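The p-value comparison in this table can be sketched as below. The handling of "<0.001" follows the note above; the 0.05 significance cut-off, the function names, and the labels "Non-sig to sig" and "Remains non-sig" are assumptions, since only "Sig to non-sig" and "Remains sig" appear in the table. The direction of B is not handled in this sketch.

```python
def parse_p(p: str) -> float:
    """Convert a reported p-value to a number; '<0.001' becomes 0.00099, as in the note."""
    return 0.00099 if p.strip() == "<0.001" else float(p)

def significance_change(original_p: str, reproduced_p: str, alpha: float = 0.05) -> str:
    """Label how statistical significance changed. The 0.05 cut-off is an assumption."""
    was_sig = parse_p(original_p) < alpha
    is_sig = parse_p(reproduced_p) < alpha
    if was_sig and is_sig:
        return "Remains sig"
    if was_sig and not is_sig:
        return "Sig to non-sig"
    if not was_sig and is_sig:
        return "Non-sig to sig"  # hypothetical label
    return "Remains non-sig"  # hypothetical label

# Rows from the table above: Change.p = R_p - O_p.
change_only_att = round(parse_p("0.7264") - parse_p("0.03"), 4)
change_att_comp = round(parse_p("0.0052") - parse_p("0.03"), 4)
label_only_att = significance_change("0.03", "0.7264")
label_att_comp = significance_change("0.03", "0.0052")
```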

Bland Altman plot showing differences between original and reproduced p-values for Somatic complaints

Results for p-values

P-values were not reproduced.

Conclusion on computational reproducibility

This model was not computationally reproducible.

As this model was not computationally reproducible, inferential reproducibility was not assessed: because the original analyses could not be reproduced, statistical assumptions could not be meaningfully compared or interpreted.