Social vulnerability and Lyme disease incidence: A regional analysis of the United States, 2000-2014

Background: Lyme disease (LD), which is a highly preventable communicable illness, is the most commonly reported vector borne disease in the USA. The Social Vulnerability Index (SoVI) is a county level measure of SES and vulnerability to environmental hazards or disease outbreaks, but has not yet been used in the study of LD. The purpose of this study was to determine if a relationship existed between the SoVI and LD incidence at the national level and regional division level in the United States between 2000 and 2014. Methods: County level LD data were downloaded from the CDC. County level SoVI were downloaded from the HVRI at the University of South Carolina and the CDC. Data were sorted into regional divisions as per the US Census Bureau and condensed into three time intervals, 2000-2004, 2005-2009, and 2010-2014. QGIS was utilized to visually represent the data. Logarithmic OLS regression models were computed to determine the predictive power of the SoVI in LD incidence rates. Results: LD incidence was greatest in the Northeastern and upper Midwestern regions of the USA. The results of the regression analyses showed that SoVI exhibited a significant quadratic relationship with LD incidence rates at the national level. Conclusion: Our results showed that counties with the highest and lowest social vulnerability were at greatest risk for LD. The SoVI may be a useful risk assessment tool for public health practitioners within the context of LD control.


INTRODUCTION
Lyme disease (LD) is a tick-borne illness [1][2][3][4][5][6][7]. Ticks carrying the LD bacterium are often found on humans in hard-to-see areas (such as the armpits, groin, or scalp) during the spring and summer months, and require a minimum of 36 to 48 hours to transmit the B. burgdorferi bacterium [8]. LD has a low mortality rate and, when listed on a death certificate, is generally only one of several contributing causes [9]; however, it leaves many sufferers with long-lasting multi-organ-system effects, including neurological and musculoskeletal symptoms such as difficulty with memory and articulation [10][11][12][13][14], cardiac disease, and Lyme-induced joint inflammation [15,16]. Zhang et al. [17] estimated that an LD patient, diagnosed with either early- or late-stage disease, accumulates nearly 3,000 USD in direct medical costs. Given that approximately 30,000 cases are reported to the CDC each year, the economic impact of LD is estimated at 200 million USD per year [8].
LD is treatable with antibiotics. Treatment, in general, is much more effective if initiated in the early stages of infection [18]. In cases that are not identified, or in which treatment is not initiated until late-stage disease (as is the case in many socially vulnerable/deprived regions in the USA), the course of treatment may be longer, more expensive, and include more intensive therapies [19]. While further research regarding LD diagnostic and treatment protocols is needed, possibly the simplest, least expensive, and most effective way to reduce the burden of LD is to reduce the number of new infections. Personal prevention of LD includes protective behaviors, such as inspecting the body for tick attachment, using insect repellents with DEET or permethrin, and wearing protective clothing, whereas environmental prevention techniques include geographic application of acaricides, control of deer populations, and transformation of tick habitats [20].
Despite its highly preventable nature, LD is the most common vector-borne illness in the USA [21] and a significant public health problem [22]. Expansion in the distribution of LD has been observed since the inception of the National Notifiable Diseases Surveillance System (NNDSS) established by the Centers for Disease Control and Prevention (CDC) in 1991 [9]. While the CDC has reported that an average of 30,000 cases of LD occur each year, true incidence (accounting for weaknesses in surveillance and the inability of some populations to seek medical treatment) could be "greater than 300,000 cases and as high as one million cases per year in the United States" [23].
According to CDC surveillance data, 95 percent of LD cases in 2013 were reported from only 14 states, each of which is located in one of two regions: the northeastern United States or the midwestern United States [8]. The geographic distribution of LD infection is closely related to the range of the tick vector, and as the tick's range expands with changing climate, so too does the number of counties reporting a high incidence of LD. The number of high-incidence counties has increased markedly in recent decades, expanding outward from the aforementioned geographic centers [9]. The rise and fall of tick populations in a particular region are also associated with fluctuations in the local deer population, which is in turn affected by local food and space availability, predation, and wildlife management [24]. In addition to understanding these geographic and environmental factors, the study of LD epidemiology might benefit from an understanding of social factors that influence risk of infection. Social vulnerability is one construct that, like LD incidence, has been measured at the county level, and could be used to examine the impact of social factors on LD incidence rates.
Social vulnerability is a measure of a population's susceptibility to adverse outcomes following a negative event. For example, a socially vulnerable population will suffer more loss and hardship following a natural disaster than a population that is more socially secure. Factors that have been identified to affect social vulnerability include socioeconomic status, gender, age, race and ethnicity, housing status (including urban vs. rural location), occupation, level of education, and access to medical services. More broadly, a higher level of social vulnerability is associated with a lack of political power, low levels of access to resources, limited physical ability, and a lack of social connections. To quantify a particular population's level of vulnerability while accounting for the factors listed above, a composite measure called the Social Vulnerability Index (SoVI) has been developed.
Cutter, Boruff, and Shirley [25], authors of the SoVI, examined United States Census data for all 3,141 counties in the United States of America (USA) and, from an initial list of over 250 variables, used principal components factor analysis to identify 11 social conditions that explained 76.4 percent of the variance in social vulnerability among counties. Social conditions have been implicated as an upstream contributor to the development of many diseases, both infectious and non-infectious. Studies conducted since the 1960s and 1970s have shown that social indicators are important predictors of longevity and quality of life [25]. The SoVI captures several social and economic variables in a single index, allowing researchers and practitioners to rapidly assess county level socioeconomic status and vulnerability. It is plausible, then, that the SoVI may correlate in some way with LD risk [26], although studies to date have not examined this relationship as operationalized in the present study.

Background
In an effort to control the incidence and distribution of LD in the USA and provide public health professionals with the information needed to prepare their respective communities, several epidemiological studies have been conducted. Numerous risk factors and determinants of LD have been identified in correlational studies. In the present literature review, the following correlates are discussed: climatic factors; regional faunal characteristics; dendrological factors; human knowledge, attitudes, and behavior; encroachment of suburban areas into deciduous forests; and socioeconomic factors.
Epidemiology Biostatistics and Public Health - 2017, Volume 14, Number X
Early studies showed that ambient temperature patterns characteristic of the Northeastern and Midwestern regions are hospitable to tick survival and thus to the incidence of LD [27]. Thus, high temperatures may be a regulating mechanism in tick abundance and the distribution of LD [28]; however, more recent studies have shown that lagged summer temperatures are positively correlated with the reproductive activity of blacklegged ticks and LD incidence, indicating that LD incidence may increase as summer temperatures in the year prior increase [29,30,31].
McCabe and Bunnell [32] showed that LD incidence increased linearly with late spring/early summer precipitation, as ticks favor wetter conditions. Other studies have shown that lagged summer precipitation is negatively related to LD incidence, as abundant moisture may increase the proliferation or efficacy of natural tick enemies, proving detrimental to nymphal tick survival [31]. In an effort to resolve these apparent discrepancies, Tran and Waller [33] investigated the divisional effects of precipitation on LD incidence within the Northeastern region of the US and showed that while some divisions exhibit a positive correlation (e.g., New England), others exhibit a negative correlation (e.g., Mid-Atlantic).
Dendrological variation may influence regional faunal characteristics [34]. This interaction has been hypothesized as a critical determinant of the distribution of LD. The abundance of oak trees and, thus, the ubiquity of acorns in the Northeast ensures that populations of the vertebrate species on which ticks feed have a life-supporting resource [35]. Research has shown that acorn production two years earlier (t-2) is strongly and positively correlated with rodent populations [30,36]. As the community of blacklegged tick hosts increases in density, the likelihood of tick population growth increases [30]. Therefore, acorn crop yields may serve as an important correlate of LD.
The following preventive measures provide protection against LD: avoiding forested areas, wearing protective clothing, performing tick checks, and using insect repellents [20]. Several studies have shown that knowledge of and attitudes towards LD and the aforementioned preventive behaviors are high/appropriate in LD endemic regions [37,38,39]. According to the Health Belief Model, knowledge and attitudes are positive determinants of preventive behavior [40]. Thus, residents in endemic areas should, theoretically, possess the motivation to protect themselves against LD; however, adherence to LD preventive behaviors among populations in these areas has not been observed [20].
Low-density suburban sprawl (SS) has been suspected as a risk factor for the development of LD [41]. SS, a phenomenon that has been particularly prevalent in Midwestern and Northeastern divisions, has been described as the migration away from urban areas to, and subsequent development of, forested areas [42]. Suburban development that fails to preclude significant forest-herbaceous edges, and only partially fragments woodland areas, permits peridomestic contact with the habitats in which tick populations are characteristically found [43]. Therefore, low-density SS puts residents at risk of being bitten by an infected nymphal tick [44].
In addition to ecological and behavioral factors, socioeconomic status (SES) has been hypothesized as a predictor of LD [45]. Systematic reviews of the literature have shown that SES, often measured as income, plays a significant role in the distribution of LD [46]. Some studies have shown that individuals with higher SES exhibit higher risk for LD than individuals in lower SES classes. For example, Gould et al. [20] revealed that a greater proportion of LD cases were reported in counties with the highest annual median incomes. Cromley and Cromley [47] found similar results.
Other studies, which employed curvilinear modeling techniques, showed that the relationship between SES and LD incidence was quadratic. Specifically, results of two studies have shown that the greatest risk for LD has historically been distributed among those with the lowest and the highest household incomes [41,48]. Theoretical reasoning for these relationships has been provided in the literature [49,50].
Social vulnerability has been described as a "lack of access to resources (including information, knowledge, and technology), limited access to political power and representation, social capital … beliefs and customs … building stock and age, frail and physically limited individuals, and density of infrastructure" [25, p. 245]. Research has shown that the above vulnerability indicators are often characteristic of particular demographic groups, as defined by gender, race, age, or SES [51,52,53]. An index of social vulnerability, called the Social Vulnerability Index (SoVI), was developed by researchers at the Hazards and Vulnerability Research Institute at the University of South Carolina [54] based on the Hazards of Place model [55]. Applications of the SoVI include risk assessment for natural or human-caused disasters as well as disease outbreaks [56]; however, prior to the present paper, the SoVI has only been used in the study of the former [57][58][59][60][61][62][63][64]. Application of the SoVI to LD may allow public health professionals to understand their community's level of risk with a single planning index.

Purpose
The SoVI provides researchers with a more robust index for understanding SES related risk than has been utilized in previous studies. Because LD symptoms are often hidden, leading to misdiagnosis, the SoVI has the potential to improve health outcomes by providing greater clarity about disease distribution. The purpose of the present study was twofold: (1) to determine if a relationship existed between the SoVI and LD incidence at the national level and regional division level in the United States between 2000 and 2014, and (2) to determine whether or not the aforementioned relationship, if significant, was consistent across three consecutive five-year time intervals.

METHODS
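Data on annually reported LD cases in the USA between 2000 and 2014 were obtained at the county level from the CDC [21]. Incidence rates per 100,000 were calculated for each county in five-year time intervals, as in Kugeler et al.'s [65] study of Lyme disease, so as to attenuate the effect of county-level transiency and modifications to surveillance techniques: 2000-2004, 2005-2009, and 2010-2014. Incidence rates within each time interval were standardized based on the standard population of the base year in each interval (i.e., the 2000-2004 time interval was standardized based on the 2000 standard population, the 2005-2009 time interval on the 2005 standard population, and the 2010-2014 time interval on the 2010 standard population). The aforementioned population data were retrieved from the United States Census Bureau [66]. Second, the 2000 and 2005-2009 social vulnerability indices (SoVI) were downloaded for every county in the USA from the HVRI [54], while the 2010 SoVI was downloaded from the CDC [67]. For a more detailed description of the methodology used to produce the SoVI, please refer to Cutter, Boruff, and Shirley [25] and the CDC [67].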
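As an illustration of the rate calculation described above, the following Python sketch computes an average annual incidence rate per 100,000 for one county over one five-year interval. The function name, the example counts, and the choice to divide summed cases by the number of years times the base-year population are assumptions made for illustration; the exact aggregation used in the study is not spelled out here.

```python
def incidence_per_100k(cases_by_year, base_population):
    """Average annual incidence per 100,000 over an interval.

    cases_by_year: reported case counts, one per year in the interval.
    base_population: population of the interval's base year (assumed denominator).
    """
    years = len(cases_by_year)
    if years == 0 or base_population <= 0:
        raise ValueError("need at least one year and a positive population")
    # Person-years at risk are approximated as years * base-year population.
    return sum(cases_by_year) / (years * base_population) * 100_000


# Hypothetical county: 50 cases over 2000-2004, base-year population 200,000.
example_rate = incidence_per_100k([10, 12, 8, 15, 5], 200_000)
```

Using the base-year population as a fixed denominator mirrors the base-year standardization described in the text, at the cost of ignoring population change within the interval.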
Following retrieval of data, we sorted the county level data into nine regional groupings according to the United States Census Bureau's regional division methodology: (1) [68].We removed counties from the data set prior to calculation of inferential tests if (a) the incidence rate was zero [69] and (b) the county population was below 100,000 [70].
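The exclusion step can be sketched in Python as follows. The record layout, field names, and example values are hypothetical, and the sketch assumes that a county was dropped when either criterion applied (zero incidence or a population below 100,000); the wording above could also be read as requiring both.

```python
def keep_county(county, min_population=100_000):
    """Return True if a county record passes both inclusion checks.

    Assumption: a county is excluded when its incidence rate is zero
    or when its population is below min_population.
    """
    return county["incidence_rate"] > 0 and county["population"] >= min_population


# Hypothetical county records for illustration only.
counties = [
    {"name": "A", "population": 250_000, "incidence_rate": 12.4},
    {"name": "B", "population": 80_000, "incidence_rate": 30.1},   # too small
    {"name": "C", "population": 400_000, "incidence_rate": 0.0},   # zero incidence
]
retained = [c for c in counties if keep_county(c)]
```

Dropping zero-incidence counties also keeps the later logarithmic transformation defined, since the logarithm of zero does not exist.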
Because LD incidence rates were skewed in each regional division at each time interval, a Briggsian (common, base-10) logarithmic transformation was applied to the LD data [71]. A Kolmogorov-Smirnov test subsequently indicated that the log-transformed LD data still failed to conform to the Gaussian distribution. For that reason, and also because of the common underreporting of LD cases and other weaknesses associated with LD surveillance [21], all inferential statistical techniques were conducted using bootstrapping with 1,000 resamples [72].
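The transformation and resampling steps can be illustrated with a short Python sketch. It applies a base-10 logarithm to a set of made-up incidence rates and then builds a percentile bootstrap confidence interval from 1,000 resamples. The statistic (a mean), the percentile method, the seed, and the data are assumptions chosen for illustration; the study applied bootstrapping to its correlation and regression analyses.

```python
import math
import random


def log10_rates(rates):
    """Common (base-10) logarithm of strictly positive incidence rates."""
    return [math.log10(r) for r in rates]


def mean(values):
    return sum(values) / len(values)


def bootstrap_ci(values, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up, right-skewed incidence rates per 100,000.
rates = [0.4, 1.2, 3.5, 8.0, 15.7, 42.1, 96.3, 210.0]
logged = log10_rates(rates)
low, high = bootstrap_ci(logged, mean)
```

Because the interval comes from the empirical distribution of resampled estimates, it does not rely on the Gaussian assumption that the log-transformed data failed to meet.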
Geometric means of county incidence rates were computed for each USA Census regional division in order to obtain a geographic profile of (a) LD incidence for the periods 2000-2004, 2005-2009, and 2010-2014 and (b) social vulnerability within each of the same three time periods. QGIS version 2.14 [73] was used to produce visual representations of the means for LD incidence rates and the SoVI at each time interval within each Census division. Pearson correlation coefficients were generated between LD incidence rates and social vulnerability at each time interval to characterize the relationship between these two variables at the US Census regional division level. Ordinary least squares (OLS) regression models [74] were constructed for each five-year time interval to determine whether social vulnerability (SoVI) could predict the incidence of LD. Specifically, the SoVI was the independent variable in each model and LD incidence rates (IR) served as the dependent variable. SoVI and LD datasets were matched spatially and temporally. All models were inspected for a second-degree polynomial given the existence of such relationships in previous studies [41,48]. Given the Briggsian logarithmic transformation applied to the LD data, the predictive model for each Census division followed the ensuing format:
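The model equation itself is not reproduced in this text. Based on the description above, a form such as log10(IR) = b0 + b1·SoVI + b2·SoVI² + e would be consistent with it, where b2 is dropped when the squared term does not improve the fit; this form is an assumption, not a quotation. Note also that the table notes refer to a natural logarithm, whereas the text describes a Briggsian (base-10) one; the choice of base only rescales the coefficients. The Python sketch below fits such a quadratic by ordinary least squares using only the standard library. The function names and data are hypothetical.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2]."""
    # Normal equations (X'X) b = X'y for the design matrix [1, x, x^2].
    rows = [[1.0, xi, xi * xi] for xi in x]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    return beta


def r_squared(x, y, beta):
    """Proportion of variance in y explained by the fitted quadratic."""
    fitted = [beta[0] + beta[1] * xi + beta[2] * xi * xi for xi in x]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up SoVI scores and log10 incidence rates with a U-shaped pattern.
sovi = [-2.0, -1.0, 0.0, 1.0, 2.0]
log_ir = [1.4, 0.6, 0.2, 0.5, 1.3]
beta = fit_quadratic(sovi, log_ir)
fit_quality = r_squared(sovi, log_ir, beta)
```

A positive coefficient on the squared term (beta[2]) corresponds to the U-shaped pattern in which both the least and the most vulnerable counties show higher incidence.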

RESULTS
Means are shown below for LD incidence rates (Figure 1) and the SoVI (Figure 2) for the periods 2000-2004, 2005-2009, and 2010-2014 for each US Census regional division. The descriptive results showed that LD incidence rates were (a) highest and temporally increasing in the Northeastern and Midwestern regions and (b) lowest in the Western and more centrally located Southern regions across the three time intervals. Expansion of the distribution of LD since 2000 was clearly evident in the Northeastern and Midwestern regions (Figure 1). Social vulnerability decreased in most regions of the USA across the three time intervals, with the exception of the Pacific division in the Western region (Figure 2). The least vulnerable areas in the USA were consistently found in the northeastern divisions, while the most socially vulnerable geographic location was the West South Central division of the Southern region.
The results of the inferential analysis for 2000-2004 showed that, when aggregated to the national level, social vulnerability had a statistically significant negative relationship with LD incidence, F(1, 444) = 25.18, p < 0.001, R² = 0.05. At the regional division level, correlation coefficients and beta coefficients for the OLS regression models varied in their representation of the strength of association and prediction of LD incidence rates (Table 1). After Bonferroni adjustment of the per-comparison alpha level, only one model was statistically significant at the regional division level: the Middle Atlantic (MA) division of the Northeastern region, which exhibited a curvilinear relationship (Table 1). The statistically significant quadratic relationship between the SoVI and LD IR in the MA division indicated that the least and most socially vulnerable counties were at greatest risk for LD.
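For readers unfamiliar with the Bonferroni adjustment mentioned above, the Python sketch below shows the arithmetic. The table notes report an adjusted alpha of 0.005; the family-wise alpha of 0.05 and the count of 10 comparisons used here are assumptions chosen because they reproduce that value, not figures stated in the text.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return family_alpha / n_comparisons


def is_significant(p_value, family_alpha=0.05, n_comparisons=10):
    """True if p_value clears the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_comparisons)


# Assumed inputs: family-wise alpha 0.05 spread over 10 models.
adjusted = bonferroni_alpha(0.05, 10)
```

Under these assumptions, a model with p = 0.01 would be significant at the conventional 0.05 level but not after adjustment.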
During the 2005-2009 time interval, social vulnerability (modeled as a quadratic function) explained a small proportion of the variability in LD incidence rates when aggregated to the national level, F(2, 477) = 27.88, p < 0.001, R² = 0.11. Table 2 summarizes some of the characteristic divisional differences in the strength of association and prediction of the SoVI for LD incidence. The results obtained from the OLS regression analysis showed that two models were statistically significant after Bonferroni correction to the per-comparison alpha level: the South Atlantic division in the Southern region, which demonstrated a quadratic relationship, and the Middle Atlantic division of the Northeastern region (Table 2).
After analysis of data from the 2010-2014 time

DISCUSSION
The purpose of the present study was twofold: (1) to determine if a relationship existed between the SoVI and LD incidence at the national level and regional division level in the United States between 2000 and 2014, and (2) to determine whether or not the aforementioned relationship, if significant, was consistent across three consecutive five-year time intervals. Although the total explained variance in our models was relatively low, our results showed that the SoVI could be used in the prediction of LD incidence. At the national level, the SoVI predicted a maximum of 11 percent of the variance in LD. The national models for each time interval, with the exception of the 2000-2004 time interval, were best fitted with a second-degree polynomial, indicating that counties with the lowest and the highest social vulnerability exhibited the greatest risk for LD.
One regional division exhibited statistically significant regression models at each of the three time intervals: the Middle Atlantic division. During the 2000-2004 time interval, the relationship was quadratic, while in the two more recent time intervals the inclusion of a second-degree polynomial did not produce a statistically significant change in R². In the 2005-2009 and 2010-2014 time intervals, the South Atlantic division exhibited significant quadratic relationships, indicating that the most and least socially vulnerable counties in this division were at greatest risk for LD. A greater proportion of the variance in LD was accounted for in the regional division models than in the national models. Specifically, while the greatest proportion of explained variance at the national level was 11 percent, the statistically significant regional division models explained at least 24 percent of the variance in LD.
While the explained variance in each of the aforementioned models was low, the usefulness of the SoVI as a predictor of Lyme disease should not be dismissed for two primary reasons. First, as discussed in Abelson [75], low explained variance does not rule out a significant influence of the independent variable on the dependent variable, especially where the explained variance grows over time, as was demonstrated between 2000-2004 and 2010-2014 in the present study. Specifically, the R² value for the national models increased by 0.03 from 2000-2004 to 2010-2014. Second, the present study is exploratory. The importance of socioeconomic status and vulnerability has been demonstrated; however, additional variables will be needed in future studies to estimate LD incidence at the county level more comprehensively and robustly. Candidate variables for future studies are discussed below.

Agreement with Previous Research
Our results were consistent with previous studies on the relationship between social indicators and LD incidence [41,48]. To the extent that our analysis indicated that the relationship between SoVI and LD incidence was primarily quadratic, especially with regard to the most recently available data from the CDC, public health efforts should be directed accordingly. In particular, given the expanding distribution of LD in the Northeastern and Midwestern regions [9], focus should be directed towards highly and minimally vulnerable counties in these areas.
The SoVI in this case highlights possible environmental and social differences that influence variability in disease risk between geographically separated populations. The SoVI may be useful as a tool for identifying regions and target populations within which public health education will be most effective at reducing the incidence of disease. In areas where LD is prevalent, state and local health departments could host awareness campaigns to disseminate LD prevention information. Healthcare providers in these areas should be more vigilant about testing for LD, making early detection and treatment more likely. The combination of prevention, early detection, and treatment, afforded by awareness of a heightened risk of LD in a particular region or population, should reduce LD associated morbidity and economic loss.

Limitations and Future Directions
This study is the first to investigate the relationship between social vulnerability, as measured by the SoVI [54], and LD incidence in the USA. While our analysis yielded consistently significant results at three time intervals, the regression models presented herein should be interpreted with caution owing to the presence of several limitations. First, our analysis was limited in that it was conducted with county-level population data. Second, to the extent that LD cases may be underreported [24], our geographic depiction of the distribution of LD may have been conservative.
Third, we did not account for population transience. Fourth, the scope of the present study was limited to the relationship between social vulnerability and LD incidence. Because we only included one independent variable in our analysis, the total explained variance in our regression models was low. Several other variables have been shown to exert influence on the development of LD. Future studies should be conducted with a more comprehensive focus at finer levels of resolution; that is, future studies should examine the influence of weather, forestation, and, perhaps, social media influence, in addition to social vulnerability. The World Health Organization has described several areas besides the USA that provide a hospitable environment for the development of Lyme disease [76]: Asia and central, north-western, and eastern Europe. The present study investigated the USA only; therefore, future studies should consider the relationship between social vulnerability and LD in Asia and Europe.
Given the increasing distribution of Lyme disease, coupled with the fact that LD is the most common vector-borne disease in the USA, policy-level efforts directed toward the prevention of LD are needed. First, research is needed for the development of an effective LD vaccine, as such an intervention could provide the greatest population health benefits, especially for residents in the most and least vulnerable counties in the USA [77]. Second, the promotion of entomological based approaches to prevention, including the reduction of the LD vector, could minimize the national LD burden. Third, more directed efforts are needed toward ensuring the equal dissemination of benefits from legislation developed to create a tick-borne advisory committee in the USA [78]. Fourth, because policy and planning efforts among government officials often occur at the county and census tract level, the SoVI provides a useful tool for Lyme disease risk assessment [59]. The SoVI illustrates that vulnerability and, in particular, capacity for responsiveness to disease outbreaks (Lyme disease specifically) is not uniform across counties in the United States. Resources, such as those that might advance the second policy implication referenced above, should be distributed and prioritized based on an understanding of the SoVI, as such efforts will minimize the risk for the expansion of human Lyme disease.
et al.'s [65] study of Lyme disease, so as to attenuate the effect of county-level transiency and modifications to surveillance techniques: 2000-2004, 2005-2009, and 2010-2014.Incidence rates within each time-interval were standardized based on the standard population of the base year in each interval (i.e., the 2000-2004 time-interval was standardized based on the 2000 standard population, the 2005-2009 timeinterval was standardized based on the 2005 standard population, and the 2010-2014 time-interval was standardized based on the 2010 standard population).
Note. SoVI was used as the predictor variable for each model. Bootstrap results are based on 1,000 bootstrap samples. All regression models are based on a natural logarithmic transformation of the Lyme disease incidence rates from 2000-2004. Bonferroni adjusted alpha value = 0.005. a Quadratic models shown above exhibited statistically significant (< 0.05) changes in R² values when the squared term was entered into the model.

TABLE 2.
OLS Regression Models for Lyme disease by US Census Region Divisions (2005-2009). Note. SoVI was used as the predictor variable for each model. Bootstrap results are based on 1,000 bootstrap samples. All regression models are based on a natural logarithmic transformation of the Lyme disease incidence rates from 2005-2009. Bonferroni adjusted alpha value = 0.005. a Quadratic models shown above exhibited statistically significant (< 0.05) changes in R² values when the squared term was entered into the model.

TABLE 3.
OLS Regression Models for Lyme disease by US Census Region Divisions (2010-2014). Note. SoVI was used as the predictor variable for each model. Bootstrap results are based on 1,000 bootstrap samples. All regression models are based on a natural logarithmic transformation of the Lyme disease incidence rates from 2010-2014. Bonferroni adjusted alpha value = 0.005. a Quadratic models shown above exhibited statistically significant (< 0.05) changes in R² values when the squared term was entered into the model.
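The table notes describe retaining the quadratic term only when it produced a statistically significant change in R². One common way to evaluate such a change is the F statistic for nested OLS models, sketched below in Python. Whether the authors used exactly this test is an assumption, and the function name and example values are hypothetical.

```python
def f_change(r2_reduced, r2_full, n, k_full, q=1):
    """F statistic for the R-squared change between nested OLS models.

    r2_reduced: R-squared of the model without the added term(s).
    r2_full:    R-squared of the model with the added term(s).
    n:          number of observations.
    k_full:     number of predictors in the full model (excluding intercept).
    q:          number of terms added (1 for a single squared term).
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    df_resid = n - k_full - 1
    if df_resid <= 0:
        raise ValueError("not enough observations for the full model")
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid)


# Made-up example: adding SoVI squared lifts R-squared from 0.10 to 0.25 with 40 counties.
example_f = f_change(0.10, 0.25, n=40, k_full=2)
```

The resulting statistic would be compared against an F distribution with q and n - k_full - 1 degrees of freedom to obtain the p-value.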