Paper 44: The growing importance of lone star ticks in a Lyme disease endemic county: Passive tick surveillance in Monmouth County, NJ, 2006 – 2016

Author

Lee Jones - Senior Biostatistician - Statistical Review

Published

April 5, 2026

References

Jordan RA, Egizi A (2019) The growing importance of lone star ticks in a Lyme disease endemic county: Passive tick surveillance in Monmouth County, NJ, 2006 – 2016. PLoS ONE 14(2): e0211778. https://doi.org/10.1371/journal.pone.0211778

Jordan RA, Egizi A (2019). Ticks acquired in Monmouth County and submitted to Monmouth County Mosquito Control Division’s passive tick surveillance program, 2006–2016. PLOS ONE. Dataset. https://doi.org/10.1371/journal.pone.0211778.s001

Disclosure

This reproducibility project was conducted to the best of our ability, with careful attention to statistical methods and assumptions. The research team comprises four senior biostatisticians (three of whom are accredited), with 20 to 30 years of experience in statistical modelling and analysis of healthcare data. While statistical assumptions play a crucial role in analysis, their evaluation is inherently subjective, and contextual knowledge can influence judgements about the importance of assumption violations. Differences in interpretation may arise among statisticians and researchers, leading to reasonable disagreements about methodological choices.

Our approach aimed to reproduce published analyses as faithfully as possible, using the details provided in the original papers. We acknowledge that other statisticians may have differing success in reproducing results due to variations in data handling and implicit methodological choices not fully described in publications. However, we maintain that research articles should contain sufficient detail for any qualified statistician to reproduce the analyses independently.

Methods used in our reproducibility analyses

There were two parts to our study. First, 100 articles published in PLOS ONE were randomly selected from the health domain and sent for post-publication peer review by statisticians. Of these, 95 included linear regression analyses and were therefore assessed for reporting quality. The statisticians evaluated what was reported, including regression coefficients, 95% confidence intervals, and p-values, as well as whether model assumptions were described and how those assumptions were evaluated. This report provides a brief summary of the initial statistical review.

The second part of the study involved reproducing linear regression analyses for papers with available data to assess both computational and inferential reproducibility. All papers were initially assessed for data availability and the statistical software used. From those with accessible data, the first 20 papers (from the original random sample) were evaluated for computational reproducibility. Within each paper, individual linear regression models were identified and assigned a unique number. A maximum of three models per paper were selected for assessment. When more than three models were reported, priority was given to the final model or the primary models of interest as identified by the authors; any remaining models were selected at random.

To assess computational reproducibility, differences between the original and reproduced results were evaluated using absolute discrepancies and rounding error thresholds, tailored to the number of decimal places reported in each paper. Results for each reported statistic, e.g., regression coefficient, were categorised as Reproduced, Incorrect Rounding, or Not Reproduced, depending on how closely they matched the original values. Each paper was then classified as Reproduced, Mostly Reproduced, Partially Reproduced, or Not Reproduced. The mostly reproduced category included cases with minor rounding or typographical errors, whereas partially reproduced indicated substantial errors were observed, but some results were reproduced.
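The classification described above can be illustrated with a minimal sketch. The exact thresholds used in the study are not stated in this report, so the rule below is an assumption: a value counts as Reproduced if the reproduced value rounds to the published one at the published number of decimal places, as Incorrect Rounding if it differs by less than one unit in the last published decimal place, and as Not Reproduced otherwise. The function name `classify` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def classify(original: str, reproduced: str) -> str:
    """Classify a reproduced statistic against the published value.

    Values are passed as strings so that the number of decimal places
    reported in the paper is preserved (e.g. "0.37" has two).
    The thresholds here are an assumed rule, not the study's stated one.
    """
    o = Decimal(original)
    r = Decimal(reproduced)
    places = -o.as_tuple().exponent          # decimals reported in the paper
    unit = Decimal(1).scaleb(-places)        # one unit in the last decimal place
    if r.quantize(unit, rounding=ROUND_HALF_UP) == o:
        return "Reproduced"
    if abs(r - o) < unit:
        return "Incorrect Rounding"
    return "Not Reproduced"


print(classify("0.37", "0.3627"))   # R2 for Model 1
print(classify("0.047", "0.0499"))  # p-value for Model 1
```

Under this assumed rule, the two calls give the same categories as the Model 1 comparison tables later in this report, but that agreement does not confirm that the study used these exact thresholds.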

For models deemed at least partially computationally reproducible, inferential reproducibility was further assessed by examining whether statistical assumptions were met and by conducting sensitivity analyses, including bootstrapping where appropriate. We examined changes in standardized regression coefficients, which reflect the change in the outcome (in standard deviation units) for a one standard deviation increase in the predictor. Meaningful differences were defined as a relative change of 10% or more, or absolute differences of 0.1 (moderate) and 0.2 (substantial). When non-linear relationships were identified, inferential reproducibility was assessed by comparing model fit measures, including R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). When the Gaussian distribution was not appropriate for the dependent variable, alternative distributions were considered, and model fit was evaluated using AIC and BIC.
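As a rough illustration of the thresholds in this paragraph, the sketch below standardizes a coefficient and flags a change between an original and a sensitivity-analysis estimate. The function names and the way the relative and absolute criteria are reported side by side are assumptions made for illustration; the report does not describe how the criteria were combined in practice.

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Standardized coefficient: change in y (in SD units) per 1 SD increase in x."""
    return b * stdev(x) / stdev(y)


def flag_change(beta_original, beta_sensitivity):
    """Apply the 10% relative and 0.1 / 0.2 absolute thresholds to a change."""
    abs_diff = abs(beta_sensitivity - beta_original)
    # Relative change is undefined when the original coefficient is zero.
    rel_change = abs_diff / abs(beta_original) if beta_original != 0 else float("inf")
    if abs_diff >= 0.2:
        absolute = "substantial"
    elif abs_diff >= 0.1:
        absolute = "moderate"
    else:
        absolute = "below threshold"
    return {"relative_10pct": rel_change >= 0.10, "absolute": absolute}


# Toy example: y = 2x exactly, so the standardized slope is 1.
print(standardized_beta(2.0, [1, 2, 3, 4], [2, 4, 6, 8]))
print(flag_change(0.50, 0.65))
```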

Results from the reproduction of the Jordan and Egizi (2019) paper are presented below. An overall summary of results is presented first, followed by model-specific results organised within tab panels. Within each panel, the Original results tab displays the linear regression outputs extracted from the published paper. The Reproduced results tab presents estimates derived from the authors’ shared data, along with a comprehensive assessment of linear regression assumptions. The Differences tab compares the original and reproduced models to assess computational reproducibility. Finally, the Sensitivity analysis tab evaluates inferential reproducibility by examining whether identified assumption violations meaningfully affected the results.

Summary from statistical review

This paper explores tick prevalence and species composition in a Lyme disease-endemic county (Monmouth County, New Jersey) in the United States of America. Linear regression was used to examine annual tick numbers over time and the association between tick submissions and Lyme disease. Only R² and p-values were reported. The authors suggested a log transformation due to normality issues and checked for autocorrelation.

Data availability and software used

The authors provided the data as a long-format Excel file in the supporting information, which is also hosted on Figshare. No data dictionary accompanied the file. Statistica was used to fit the linear regression models.

Regression sample

There were eight linear regression models examining temporal changes in different compositions. The three models selected at random comprised two analyses of species trends over time (I. scapularis and D. variabilis), and one analysis of Borrelia burgdorferi infection prevalence in I. scapularis nymphs removed from human hosts over time.

Computational reproducibility results

The three models were not computationally reproducible. While the authors supplied the original raw dataset, the regression variables were not directly provided and had to be reconstructed by aggregating records and computing annual species-specific counts. For I. scapularis and D. variabilis, annual tick counts were calculated, log-transformed, and analysed using linear regression with time as the predictor. For I. scapularis nymph infection prevalence, annual proportions were derived by dividing the number of positive cases by the total number tested, and an arcsine square-root transformation was applied prior to regression on time. There were also missing years in the dataset, as well as a record dated 2005, which falls outside the stated timeframe (2006–2016), indicating that additional data cleaning or exclusion criteria may have been applied but were not reported. As a result, the analyses in the paper could not be computationally reproduced.
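The reconstruction steps described above can be sketched as follows. This is an illustration only: the record layout (tuples of year and species), the function names, and the use of a base-10 logarithm are all assumptions, since neither the shared file's column names nor the log base are given in this report. The toy data are invented and are not taken from the authors' dataset.

```python
import math
from collections import Counter


def annual_counts(records, species):
    """Count submissions per year for one species.

    records: iterable of (year, species) tuples (hypothetical layout).
    """
    return Counter(year for year, sp in records if sp == species)


def arcsine_sqrt(proportion):
    """Arcsine square-root transformation of a proportion in [0, 1]."""
    return math.asin(math.sqrt(proportion))


def simple_ols(x, y):
    """Ordinary least squares for one predictor; returns slope, intercept, R2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented toy records: 10, 100 and 1000 submissions in three successive years.
toy = [(2006, "A")] * 10 + [(2007, "A")] * 100 + [(2008, "A")] * 1000
counts = annual_counts(toy, "A")
years = sorted(counts)
log_counts = [math.log10(counts[y]) for y in years]  # log base is an assumption
slope, intercept, r2 = simple_ols(years, log_counts)
print(slope, r2)
```

Choices made at each step (which records to exclude, how to handle missing years, which log base to use) can change the fitted values, which is one reason an analysis-ready dataset helps reproduction.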

Inferential reproducibility results

As this paper was not computationally reproducible, inferential reproducibility was not assessed: because the original analyses could not be reproduced, statistical assumptions could not be meaningfully compared or interpreted.

Recommended changes

Provide the final analysis-ready dataset, including all derived variables used in the reported models.
Provide tables in the Supporting Information that present all analyses conducted in the paper, including full model outputs such as regression coefficients and 95% confidence intervals.
Include a data dictionary.

Model 1

Original results

Model results for log I.scapularis

Term	B	SE	Lower	Upper	t	p-value
Intercept
Year						0.047
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for log I.scapularis

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
	0.37
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for log I.scapularis

Term	SS	DF	MS	F	p-value
Year
Residuals
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Reproduced results

Model results for log I.scapularis

Term	B	SE	Lower	Upper	t	p-value
Intercept	−70.055	33.437	−145.694	5.584	−2.095	0.0656
Year	0.038	0.017	0.000	0.075	2.263	0.0499
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for log I.scapularis

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
0.602	0.363	0.292	−3.414	0.158	5.122	1	9	0.0499
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for log I.scapularis

Term	SS	DF	MS	F	p-value
Year	0.156	1	0.156	5.122	0.0499
Residuals	0.274	9	0.030
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.
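The fit statistics above can be cross-checked against the ANOVA table. The sketch below recomputes R², F, RMSE and AIC from the reported sums of squares. Two points are inferences rather than documented facts: RMSE appears to be computed with n rather than the residual degrees of freedom in the denominator, and AIC appears consistent with a Gaussian log-likelihood counting three parameters (intercept, slope and residual variance). Because the inputs are rounded to three decimals, the results only match approximately. The function name `fit_statistics` is hypothetical.

```python
import math


def fit_statistics(ss_model, ss_resid, df_model, df_resid):
    """Recompute fit statistics for a linear model from its ANOVA table."""
    n = df_model + df_resid + 1                      # observations (11 years here)
    r2 = ss_model / (ss_model + ss_resid)
    f_stat = (ss_model / df_model) / (ss_resid / df_resid)
    rmse = math.sqrt(ss_resid / n)                   # assumed: divides by n, not DF2
    k = df_model + 2                                 # assumed: slopes + intercept + variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(ss_resid / n) + 1)
    aic = -2 * log_lik + 2 * k
    return {"R2": r2, "F": f_stat, "RMSE": rmse, "AIC": aic}


# Reproduced Model 1 (log I.scapularis): SS for Year and Residuals, DF 1 and 9.
stats = fit_statistics(0.156, 0.274, 1, 9)
for name, value in stats.items():
    print(name, round(value, 3))
```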

Visualisation of regression model

The blue line shows the best line of fit with shading representing 95% confidence intervals, while holding all other covariates constant. The dots show partial residuals, which reflect the observed data adjusted for all other predictors except the one being plotted.

Differences

Change in R²

O_R2	R_R2	Change.R2	Reproduce.R2
0.370	0.3627	−0.0073	Incorrect Rounding
O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R2 (R_R2 − O_R2); Reproduce.R2 = whether R2 was reproduced.

Change in p-values

Term	O_p	R_p	Change.p	Reproduce.p
Intercept		0.0656
Year	0.047	0.0499	0.0029	Not Reproduced
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value (R_p − O_p); Reproduce.p = whether the p-value was reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note: p-values that were <0.01 were set to 0.0099 for the purposes of comparison.

Results for p-values

The p-value was not reproduced.

Conclusion computational reproducibility

This model was not computationally reproducible.

As this model was not computationally reproducible, inferential reproducibility was not assessed: because the original analysis could not be reproduced, statistical assumptions could not be meaningfully compared or interpreted.

Model 2

Original results

Model results for log D.variabilis

Term	B	SE	Lower	Upper	t	p-value
Intercept
Year						<0.01
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for log D.variabilis

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
	0.76
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for log D.variabilis

Term	SS	DF	MS	F	p-value
Year
Residuals
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Reproduced results

Model results for log D.variabilis

Term	B	SE	Lower	Upper	t	p-value
Intercept	−218.924	30.015	−286.822	−151.026	−7.294	<0.01
Year	0.111	0.015	0.078	0.145	7.457	<0.01
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for log D.variabilis

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
0.928	0.861	0.845	−5.789	0.142	55.604	1	9	<0.01
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for log D.variabilis

Term	SS	DF	MS	F	p-value
Year	1.363	1	1.363	55.604	<0.01
Residuals	0.221	9	0.025
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

Differences

Change in R²

O_R2	R_R2	Change.R2	Reproduce.R2
0.760	0.8607	0.1007	Not Reproduced
O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R2 (R_R2 − O_R2); Reproduce.R2 = whether R2 was reproduced.

Change in p-values

Term	O_p	R_p	Change.p	Reproduce.p
Intercept		<0.01
Year	<0.01	<0.01	0.0000	Reproduced
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value (R_p − O_p); Reproduce.p = whether the p-value was reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note: p-values that were <0.01 were set to 0.0099 for the purposes of comparison.

Results for p-values

The p-value for this model was reproduced.

Conclusion computational reproducibility

This model was not computationally reproducible.

Model 3

Original results

Model results for arcsin I.scapularis nymphs rate

Term	B	SE	Lower	Upper	t	p-value
Intercept
Year						0.52
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for arcsin I.scapularis nymphs rate

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
	0.0477
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for arcsin I.scapularis nymphs rate

Term	SS	DF	MS	F	p-value
Year
Residuals
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square.

Reproduced results

Model results for arcsin I.scapularis nymphs rate

Term	B	SE	Lower	Upper	t	p-value
Intercept	−6.221	11.023	−31.157	18.715	−0.564	0.5863
Year	0.003	0.005	−0.009	0.016	0.608	0.5579
SE = Standard error; Lower = lower confidence interval; Upper = upper confidence interval.

Fit statistics for arcsin I.scapularis nymphs rate

R	R2	R2Adj	AIC	RMSE	F	DF1	DF2	p-value
0.199	0.040	−0.067	−27.826	0.052	0.370	1	9	0.5579
R2 Adj = Adjusted R2; AIC = Akaike Information Criterion; RMSE = The Root Mean Squared Error; DF1 = Degrees of freedom for the model; DF2 = Degrees of freedom for the residuals.

ANOVA table for arcsin I.scapularis nymphs rate

Term	SS	DF	MS	F	p-value
Year	0.001	1	0.001	0.370	0.5579
Residuals	0.030	9	0.003
SS = Sum of Squares; DF = Degrees of freedom; MS = Mean Square; Calculated using type III SS.

Visualisation of regression model

Differences

Change in R²

O_R2	R_R2	Change.R2	Reproduce.R2
0.0477	0.0395	−0.0082	Not Reproduced
O_R2 = original R2; R_R2 = reproduced R2; Change.R2 = change in R2 (R_R2 − O_R2); Reproduce.R2 = whether R2 was reproduced.

Change in p-values

Term	O_p	R_p	Change.p	Reproduce.p
Intercept		0.5863
Year	0.52	0.5579	0.0379	Not Reproduced
O_p = original p-value; R_p = reproduced p-value; Change.p = change in p-value (R_p − O_p); Reproduce.p = whether the p-value was reproduced. SigChangeDirection = statistical significance and B change between original and reproduced models. Note: p-values that were <0.01 were set to 0.0099 for the purposes of comparison.

Results for p-values

The p-value was not reproduced.

Conclusion computational reproducibility

This model was not computationally reproducible.
