2  Background

2.1 Overview

The chapter begins with an overview of the spatial methods for mapping mortality for small areas and describes examples of applications, from the UK and beyond, that examine mortality for small subnational regions. It includes a brief history of the Small Area Health Statistics Unit (SAHSU), which manages the mortality data used in the thesis and has developed and applied many of the spatial methods discussed.

This is followed by a history of separating total mortality into different causes of death and the epidemiologic transition theory.

The chapter finishes by exploring the picture of inequalities in the UK over the past few decades through to the present, focussing on class, income, geography, and deprivation.

2.2 Mapping mortality and disease at small areas

Many studies compare the prevalence of diseases or mortality in different subgroups of the population by dividing the population geographically into small areas. The number of cases, or number of deaths, in an area is likely to be small. This sparseness issue is even more pertinent when the population is further stratified by age group. When calculating rates of incidence from the observed data, there is an apparent variability between spatial units, which is often larger than the true differences in risk due to the noise in the data. To overcome these issues, we can use statistical smoothing techniques to obtain robust estimates of rates by sharing information between strata.

2.2.1 Disease mapping methods

In small-area studies, it is common to smooth data using models with explicit spatial dependence, which are designed to give more weight to nearby areas than those further away. There are three main categories for modelling spatial effects. First, we can treat space as a continuous surface using Gaussian processes or splines. Second, we can use hierarchical models for areal data, which make use of the spatial neighbourhood structure of the units. Third, we can again use hierarchical models for areal data but instead exploit a nested hierarchy of geographical units, for example state, county and census tract in the US. Each of these methods, which can be used separately or in combination if the context of the problem allows, relies on assumptions which may make it more or less appropriate in different applications.

Space as a continuous process

In the context of disease mapping, events are usually aggregated to areas rather than assigned specific geographical coordinates. Wakefield and Elliott (1999) model aggregated counts as realisations of a Poisson process, in which the expected number of cases is calculated by integrating a continuous surface that generates the cases over the area of the spatial unit. The surface was a function of spatially-referenced covariates. Kelsall and Wakefield (2002) describe an alternative model, where the log-transformed risk surface is modelled by a Gaussian process, whose correlation (or “kernel”) function depends on distance.

Best et al. (2005) provide a review of the use of hierarchical models with spatial dependence for disease mapping. In particular, the authors focus on Bayesian estimation, and different classes of spatial prior distributions.

The first prior proposed for spatial effects \(\mathbf{S} = \{S_1, \ldots, S_n\}\) is the multivariate normal \[ \mathbf{S} \sim \mathcal{N}(\pmb{\mu}, \pmb{\Sigma}), \tag{2.1}\]

where \(\pmb{\mu}\) is the mean effect vector, \(\pmb{\Sigma} = \sigma^2 \pmb{\Omega}\) and \(\pmb{\Omega}\) is a symmetric, positive semi-definite matrix defining the correlation between spatial units. A common choice when specifying the structure of the correlation matrix is to assume a kernel function that decays with the distance between the centroids of the areas, so that places nearby in space share similar disease profiles. Note, this is mathematically equivalent to the practical implementation of a Gaussian process for a specified set of spatial locations, which uses a finite set of points. An example in Elliott et al. (2001b) chooses an exponential decay function to map cancer risk in northwest England. Kernel functions based on distance, however, do not allow the variability of the spatial surface to change with location. Paciorek and Schervish (2006) describe nonstationary extensions to common kernel choices in spatial settings with this property.
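As an illustration of the distance-based specification, the following minimal sketch builds \(\pmb{\Omega}\) from area centroids using an exponential decay kernel. The centroids, the length scale and the value of \(\sigma^2\) are hypothetical values chosen for illustration, not quantities from any of the cited studies.

```python
import math

def exponential_correlation(centroids, length_scale):
    """Correlation matrix with entries exp(-d_ij / length_scale),
    where d_ij is the Euclidean distance between centroids i and j."""
    n = len(centroids)
    omega = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centroids):
        for j, (xj, yj) in enumerate(centroids):
            distance = math.hypot(xi - xj, yi - yj)
            omega[i][j] = math.exp(-distance / length_scale)
    return omega

# Hypothetical centroids (in km) for three areas: two close together, one far away
centroids = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
omega = exponential_correlation(centroids, length_scale=10.0)

# Covariance matrix Sigma = sigma^2 * Omega, with an assumed sigma^2
sigma2 = 0.5
covariance = [[sigma2 * w for w in row] for row in omega]
```

The correlation between the two nearby areas is much larger than that between either of them and the distant area, which is the sense in which nearby places are encouraged to share similar disease profiles. Every pair of areas has a non-zero entry, so the matrix is dense.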

Space as discrete units

A more popular prior is the conditional autoregressive (CAR) prior, also known as a Gaussian Markov random field (GMRF), which was first introduced by Besag et al. (1991). These form a joint distribution as in Equation 2.1, but the covariance is usually defined instead in terms of the precision matrix \[ \mathbf{P} = \pmb{\Sigma}^{-1} = \tau(\mathbf{D} - \rho \mathbf{A}), \tag{2.2}\] where \(\tau\) controls the overall precision of the effects, \(\mathbf{A}\) is the spatial adjacency matrix formed by the small areas, \(\mathbf{D}\) is a diagonal matrix with entries equal to the number of neighbours for each spatial unit, and the autocorrelation parameter \(\rho\) describes the amount of correlation. This can be seen as tuning the degree of spatial dependence, where \(\rho = 0\) implies independence between areas, and \(\rho = 1\) full dependence. The case with \(\rho = 1\) is called the intrinsic conditional autoregressive (ICAR) model. There sometimes exists further overdispersion in the residuals that cannot be modelled by purely spatially-structured random effects. Besag et al. (1991) proposed the model (hereafter called BYM) \[ S_i = U_i + V_i, \tag{2.3}\] where \(U_i\) follow an ICAR distribution, and \(V_i\) are independent and identically distributed random effects. The addition of the spatially-unstructured component \(V\) accounts for any non-spatial heterogeneity.
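The precision matrix in Equation 2.2 can be assembled directly from the adjacency structure. The sketch below assumes a binary, symmetric adjacency matrix with a zero diagonal; the four-area map and the values of \(\tau\) and \(\rho\) are hypothetical.

```python
def car_precision(adjacency, tau, rho):
    """Return P = tau * (D - rho * A) for a binary adjacency matrix A."""
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # D is diagonal: the number of neighbours of area i
                precision[i][j] = tau * sum(adjacency[i])
            else:
                # Off-diagonal entries are non-zero only for neighbouring areas
                precision[i][j] = -tau * rho * adjacency[i][j]
    return precision

# Hypothetical map: four areas in a row, each bordering the next
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proper_car = car_precision(adjacency, tau=2.0, rho=0.9)
icar = car_precision(adjacency, tau=2.0, rho=1.0)

# With rho = 1 every row of the precision matrix sums to zero
icar_row_sums = [sum(row) for row in icar]
```

With \(\rho = 1\) the rows sum to zero, so the precision matrix is singular and the ICAR prior is improper; in practice a constraint such as sum-to-zero is typically placed on the effects. The matrix has non-zero entries only on the diagonal and for neighbouring pairs.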

Space as a nested hierarchy of geographies

The relationships between different levels of a hierarchy of geographical units are often incorporated into models as a nested hierarchy of random effects. These models account for spatial units that lie within common administrative boundaries. This is often a desirable property for administrative geographies, like states in the US. Policy is decided at these geographies, so there is reason to believe these boundaries may have a greater effect on health outcomes than spatial structure. Finucane et al. (2014) demonstrate how country-level blood pressure can be modelled by exploiting the hierarchy of global, super-region, region and country. Note that, although these models group units by geographical region, they are not spatial, as they do not contain any information on the relative position of the areas.

Of the two specifications that are spatial, either as a continuous process or discrete units, the Markov random field priors are often preferred for computational reasons, as we can exploit the sparseness of the adjacency matrix in our inference algorithms rather than computing the covariance between each pair of spatial units as in the general case of Equation 2.1. There are concerns, however, that the GMRF representation of space as an adjacency matrix, which was originally proposed for a regular lattice of pixels in image analysis (Besag et al., 1991), is reductive for more complicated spatial problems. Despite this, in an epidemiological context, Duncan et al. (2017) found the standard ICAR model with binary, first-order neighbour weights outperformed models with a variety of different weighting schemes, including matrix weights based on higher-order degrees of neighbours, distance between neighbours, and distance between covariate values.

In applications to disease mapping, spatial models are the natural choice when the disease exhibits a spatial pattern. This is the case for mortality from infectious diseases, particularly on short timescales like Covid-19 (Konstantinoudis et al., 2022). Nested hierarchies are a more suitable choice when administrative areas are meaningful and have an effect on the health outcomes of the population. For example, state-specific abortion laws in the USA could affect maternal mortality, and so a model should include an effect for each state.

Modelling variation beyond space

As computational power has improved, it has become feasible to model patterns over other features of the population, such as time period and age group. Trends over time can be modelled as linear through slopes, or using nonlinear effects which allow neighbouring time points to be alike, the simplest of which is a first-order Gaussian random walk process. All-cause mortality varies smoothly over ages, following a characteristic J-shape with higher mortality in the infant and older age groups (Preston et al., 2001), and therefore can be modelled using a nonlinear process such as a random walk.
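For reference, a first-order Gaussian random walk over ordered time points (or age groups) \(t = 1, \ldots, T\) can be written as follows. The symbols \(\beta_t\) and \(\sigma_\beta^2\) are notation assumed here for illustration. \[ \beta_t \mid \beta_{t-1} \sim \mathcal{N}(\beta_{t-1}, \sigma_\beta^2), \quad t = 2, \ldots, T. \] Each effect is centred on its predecessor, so a smaller variance \(\sigma_\beta^2\) forces neighbouring effects to be more alike and produces a smoother trend.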

Difficulties arise when considering interactions between the space, age, and time variables. One can imagine situations in which different spatial units will have different age patterns in disease rates, for example, if certain age groups were vaccinated against a disease in one spatial unit before others. There are also social or behavioural risk factors, such as alcohol consumption or smoking rates, that are likely to exhibit different age patterns over space. After implementing a base model with the main effects, the question is how to model additional terms which account for the interactions between the variables. Space-time interactions could range from fully independent, to each spatial unit having independent temporal patterns, to inseparable space-time variation where interactions borrow strength across neighbouring spatial units and neighbouring time periods (Knorr-Held, 2000).

However, breaking the population down into smaller and smaller subgroups by space, age and time period makes the counts of cases sparser, and stronger smoothing is then needed to produce robust estimates, particularly for data that are already at the small-area level. Although interaction effects are plausible, modellers should consider whether there is evidence for the interaction in the data or whether they can simplify the model if the interaction effect turns out to be negligible.

It should be noted that there are situations where statistical smoothing would not be appropriate. There might be true variability in the data which a smoothing model would conceal. For example, certain spatial units might contain isolated populations with high mortality over a sustained period, such as counties with Native American reservations in the USA (Dwyer-Lindgren et al., 2017a). There can also be spatially- and temporally-specific events that cause a spike in mortality such as the Grenfell Tower fire in 2017. Without accounting for these events, the models described above would either attenuate their effect on mortality, or a spike in deaths would cause estimates of mortality in nearby spatial units or years to be erroneously high. Beyond the use of subject matter experts, posterior predictive checks and plots of modelled death rates against the observed data can help to identify outlier spikes in mortality which are specific to a particular time or place, and which we do not want our model to smooth.

2.2.2 Applications of disease mapping methods

Small-area analyses of mortality

In order to compare the health status between areas, health authorities require a measure of mortality that collapses age-specific information into a single number. Indirectly standardised measures such as the standardised mortality ratio – the ratio between total deaths and expected deaths in an area – are easy to calculate, but are not easily understood by laypeople. Directly standardised methods, in contrast, require knowledge of the full age structure of death rates rather than just the total number of deaths. Age-standardised death rates, however, suffer the same interpretability issue as the standardised mortality ratio, and are only comparable between studies if the same reference population is used. An alternative choice is life expectancy. Silcocks et al. (2001) explain that life expectancy is a “more intuitive and immediate measure of the mortality experience of a population, [and] is likely to have greater impact… than other measures that are incomprehensible to most people.” However, although the metric appears more interpretable, life expectancy at birth constructed from a period life table is often misinterpreted as the mean length of life of the cohort into which the newborn is born. In fact, it measures the expectation of life assuming that the newborn will be exposed throughout their life to age-specific mortality conditions that are exactly the same as those experienced by the current population.
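The two standardised measures can be sketched as follows. The age groups, death counts, populations and reference rates are hypothetical numbers chosen so that the arithmetic is easy to follow, and the function names are illustrative only.

```python
def standardised_mortality_ratio(deaths, population, reference_rates):
    """Indirect standardisation: observed deaths divided by the deaths expected
    if the area experienced the reference age-specific death rates."""
    expected = sum(p * r for p, r in zip(population, reference_rates))
    return sum(deaths) / expected

def directly_standardised_rate(deaths, population, standard_population):
    """Direct standardisation: the area's own age-specific death rates,
    weighted by the age structure of a standard population."""
    rates = [d / p for d, p in zip(deaths, population)]
    total = sum(standard_population)
    return sum(r * w for r, w in zip(rates, standard_population)) / total

# Hypothetical area with three age groups (young, middle, old)
deaths = [2, 10, 60]
population = [10000, 8000, 2000]
reference_rates = [0.0002, 0.001, 0.02]   # deaths per person-year
standard_population = [5000, 4000, 1000]

smr = standardised_mortality_ratio(deaths, population, reference_rates)
dsr = directly_standardised_rate(deaths, population, standard_population)
```

Here the expected number of deaths is 2 + 8 + 40 = 50 against 72 observed. The indirect measure needs only the total number of deaths in the area, whereas the direct measure needs the deaths in every age group, and its value depends on the standard population chosen.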

The estimation of death rates requires two data sources: death counts and populations. Modern death registration systems, such as that of the UK, are almost entirely complete and accurate. On the other hand, although usually treated as a known quantity, the population denominator is often problematic. Populations for small geographies are only recorded during a decennial census, and estimates are generated for the years in-between using limited survey data on births, deaths and migration. Although the census is considered the “gold standard”, it is subject to enumeration errors, particularly for areas with special populations such as students or armed forces (Elliott et al., 2001b).

Beyond the population issue, finer scale studies are restricted by data availability. Where data are available, there is still the need to overcome small number issues before feeding death rates through the life table to calculate life expectancy. Eayres and Williams (2004) recommend a minimum population size of 5000 when using traditional life table methods, below which the calculation of life expectancy is unstable, or the error estimates become so large that any comparison between subgroups becomes meaningless. One approach, often taken by statistical agencies, is to build larger populations by either aggregating multiple years of data (Bahk et al., 2020; Office for National Statistics, 2015; Public Health England, 2021) or combining spatial units (Ezzati et al., 2008). Here, we focus on studies using Bayesian hierarchical models to generate robust estimates of age-specific death rates by recognising the correlations between spatial units and age groups, which produce more accurate estimates for small population studies of life expectancy (Congdon, 2009; Jonker et al., 2012).
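To make the step from death rates to life expectancy concrete, the sketch below computes period life expectancy at birth from an abridged life table. It is a simplified illustration rather than the method of any study cited here: it assumes that deaths occur, on average, halfway through each closed age interval (including infancy, where this is a crude approximation) and that the death rate is constant in the open-ended final age group. The death rates are hypothetical.

```python
def life_expectancy_at_birth(death_rates, widths):
    """Period life expectancy at birth from age-specific death rates.

    death_rates: deaths per person-year for each age group, the last group open-ended.
    widths: length in years of each closed age group (one fewer than death_rates).
    """
    survivors = 1.0      # proportion of the hypothetical cohort still alive
    person_years = 0.0   # person-years lived by the cohort so far
    for m, n in zip(death_rates[:-1], widths):
        # Probability of dying in the interval, assuming deaths at mid-interval
        q = n * m / (1.0 + 0.5 * n * m)
        deaths = survivors * q
        # Survivors live n years; those who die live n / 2 years on average
        person_years += n * (survivors - deaths) + 0.5 * n * deaths
        survivors -= deaths
    # Open-ended group: remaining person-years under a constant death rate
    person_years += survivors / death_rates[-1]
    return person_years

# Hypothetical rates for ages 0, 1-4, 5-14, ..., 75-84 and 85+
widths = [1, 4] + [10] * 8
rates = [0.004, 0.0002, 0.0001, 0.0004, 0.0007, 0.0015,
         0.004, 0.01, 0.025, 0.07, 0.18]
e0 = life_expectancy_at_birth(rates, widths)
```

Because every age-specific rate feeds into the final figure, noisy rates from small populations propagate directly into the estimate, which is why the rates are usually stabilised before this step.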

Jonker et al. (2012) demonstrated the advantages of the Bayesian approach for 89 small areas in Rotterdam using a joint model for sex, space and age effects, finding gaps of 8.2 years for women and 9.2 years for men between the neighbourhoods with the highest and lowest life expectancies. Stephens et al. (2013) employed the same model for 153 administrative areas in New South Wales, Australia.

Bayesian spatial models for mortality have been scaled to small areas for entire countries, and also consider trends in these regions over time. Bennett et al. (2015) forecasted life expectancy for 375 districts in England and Wales using a spatiotemporal model trained over a 31 year period, and Dwyer-Lindgren et al. (2017a) explored mortality trends in 3110 US counties from 1980 to 2014.

There have also been studies on specific cities at a finer resolution. In order to improve estimates for disability-free life expectancy, Congdon (2014) considered both ill-health and mortality in a joint likelihood with spatial effects for 625 wards in London, finding more than a two-fold variation in the percent of life spent in disability for men. Bilal et al. (2019) looked at 266 subcity units for six large cities in Latin America. As there is no contiguous boundary in this case, a random effects model for each city was used instead of a spatial model. The largest difference between the top and bottom decile of life expectancy at birth was 17.7 years for women in Santiago, Chile.

Two studies in North America have looked below the county level, at census tracts, with wide-ranging population sizes as small as 40. Dwyer-Lindgren et al. (2017b), using a model that relied heavily on sociodemographic covariates, studied trends for life expectancy and many causes of death for 397 tracts in King County, Washington, uncovering an 18.3 year gap in life expectancy for men. Using the same model for Vancouver, Canada, Yu et al. (2021) found widening inequalities over time and a difference of 9.5 years for men.

Small Area Health Statistics Unit

In 1983, a documentary on the radioactive fallout from a fire at the Sellafield nuclear site in Cumbria claimed that there was a ten-fold increase in cases of childhood leukaemia in the surrounding community. This anomaly had gone undetected by public health authorities, raising concern that routinely collected data were not able to identify local clusters of disease. The subsequent enquiry confirmed the excess, and recommended that a research unit was set up to monitor small-area statistics and respond quickly to ad hoc queries on local health hazards. The Small Area Health Statistics Unit (SAHSU) was established in 1987 (Elliott et al., 1992).

Beyond producing substantive research on environment and health, a core aim of SAHSU is to develop small-area statistical methodology (Wakefield and Elliott, 1999) for:

  • Point source type studies. Is there an increased risk close to an environmental hazard? SAHSU has investigated increased mortality from mesothelioma and asbestosis near Plymouth docks (Elliott et al., 1992); excess respiratory disease mortality near two factories in Barking and Havering (Aylin et al., 1999); kidney disease mortality near chemical plants in Runcorn (Hodgson et al., 2004); and possible excess of several morbidities near landfill sites (Elliott et al., 2001a; Jarup et al., 2007, 2002b).
  • Geographic correlation studies. Is there a correlation between disease risk and spatially-varying environmental variables? SAHSU has looked at several exposures, including a plume of mercury pollution (Hodgson et al., 2007); mobile phone base stations during pregnancy (Elliott et al., 2010); noise from aircraft near Heathrow (Hansell et al., 2013); road traffic noise in London (Halonen et al., 2015); and particulate matter from incinerators during pregnancy (Parkes et al., 2020).
  • Clustering. Does a disease produce non-random spatial patterns of incidence? If the aetiology is unknown, this could suggest that the disease is infectious.
  • Disease mapping. Summarising the spatial variation in risk.

SAHSU has been at the forefront of both methodology and applications in disease mapping. Aylin et al. (1999) mapped diseases for wards in Kensington, Chelsea and Westminster using a model that smoothed rates towards the mean risk across the region. Thereafter, SAHSU published a plethora of studies for disease mapping models with explicit spatial dependence, including using the BYM model (Equation 2.3) to map spatial variation in the relative risk of testicular (Toledano et al., 2001) and prostate (Jarup et al., 2002a) cancers for small areas in regions of England. In a landmark piece bringing together work on disease mapping and environmental exposures, SAHSU published an environment and health atlas for England and Wales, showing the spatial patterns of 14 health conditions at census ward level over an aggregated 25 year period alongside five environmental exposure surfaces (Hansell et al., 2014).

Further disease mapping studies at SAHSU using spatially structured effects have also extended the methodology to look at age patterns and trends over time. Asaria et al. (2012) analysed cardiovascular disease death rates by fitting a spatial model for all wards in England separately for each age group and time period. Bennett et al. (2015) designed a model to jointly forecast all-cause mortality for districts in England by age group and year. The model used BYM spatial effects and random walk effects over age and time to capture nonlinear relationships.

2.3 Mortality from specific causes of death

2.3.1 The Epidemiologic Transition

In the mid-twentieth century, a team in the US Public Health Service, led by Iwao Moriyama, began investigating the decomposition of mortality into all diseases and injuries for the first half of the century. Moriyama and Gover (1948) grouped vital registration data into primary causes. Notably, they found that, as the US saw an overall downward trend in mortality, the leading causes of death changed from communicable diseases, such as tuberculosis and diphtheria, toward non-communicable, “chronic diseases of older ages”, such as heart diseases and cancers. The success of the reduction – and in the case of typhoid fever, near-elimination – of infectious diseases was attributed to the strategy of the health officer in the early 1900s, who was focussed on improving water and sanitation, and public health interventions such as immunisation and quarantines.

By comparing vital registration data over several centuries, Abdel Omran observed this shift of mortality from communicable to non-communicable diseases (NCDs) in many countries (Omran, 1977, 1971). Although the pace and determinants of the transition varied between countries, Omran was able to formalise three common successive stages of the shift in mortality:

  1. The Age of Pestilence and Famine. Mortality is high and largely governed by Malthusian “positive checks” – epidemics, famines, and wars.
  2. The Age of Receding Pandemics. Mortality decreases as epidemics become less frequent, but infectious diseases remain the leading causes of death.
  3. The Age of Degenerative and Man-made diseases. Mortality declines further along with fertility, increasing the average age of the population, and NCDs take over as the leading causes of death.

He termed this the Epidemiologic Transition theory. Omran (1971) explained that England and Wales took the classic transition path followed by western societies, whereby socioeconomic factors such as improvements to living standards are crucial in causing easily preventable diseases to subside and shifting towards the third phase of the transition, whilst medical and other public health technology only help society much later in the final stage. Later, Olshansky and Ault (1986) would propose a fourth stage to the theory, the Age of Delayed Degenerative Diseases, in which the structure of causes of death is stable, but the age at which degenerative diseases kill is postponed, thus decreasing older age mortality. There are, however, questions around the universality and unidirectionality of the theory, with many examples in which age-specific death rates for population subgroups have risen over time, most notably the HIV/AIDS pandemic (Gaylin and Kates, 1997). Gersten and Wilmoth (2002) also criticise the lack of attention Omran’s theory pays towards the role of infection in chronic and degenerative diseases, in particular certain cancers.

Around the same time as Omran, Preston collated cause-specific mortality data for a large number of populations, spanning 48 nations and nearly a century (Preston, 1970; Preston and Nelson, 1974). This would enable international comparisons of groups of causes of death over different time periods, and a deeper understanding of the upward trends in life expectancy. In particular, by plotting cause-specific disease rates against overall mortality, Preston and Nelson (1974) saw that, over time, the contribution of infectious diseases to a particular level of mortality had become ever smaller. That is to say, as mortality declined, the contribution from infectious diseases also declined. Preston attributed this to an accelerating rate of medical progress guided by the “germ theory of disease”, which public health and science were not able to replicate for NCDs. Preston also traced the excess deaths in older males observed in western societies to cardiovascular diseases, cancer and bronchitis – a direct result of dramatic increases in cigarette smoking (Preston, 1970).

Since the first edition in 1990, international comparisons of the cause-specific composition of mortality have been the remit of the Global Burden of Disease (GBD) studies (Murray and Lopez, 1996). The studies aim to quantify and compare the burden of diseases, injuries, and risk factors, usually through cross-sectional methods but occasionally by examining trends and subnational populations (Dwyer-Lindgren et al., 2017a; Ezzati et al., 2008). An important innovation of the GBD study was the introduction of a hierarchical classification of groups of causes, with the broadest level divided into three groups: communicable, maternal, perinatal, and nutritional diseases (Group 1), NCDs (Group 2), and injuries (Group 3). Salomon and Murray (2002) made use of the wide-ranging dataset and grouping from the GBD to revisit the epidemiologic transition for the second half of the twentieth century. They found the majority of the change in cause structure occurred among children, with a shift from Group 1 to Groups 2 and 3, and in young adults, where the role of injuries is more dominant for men.

2.3.2 Modelling cause-specific mortality

In studies looking at multiple causes of death it may be desirable to extend disease mapping models to capture the interdependence between causes of death. Cause-specific mortality can exhibit complex correlation structures, with correlations for diseases with common risk factors but anti-correlations for competing causes of death. For studies that already look at mortality for small areas and narrow age groups, breaking the population down further by cause of death increases the sparseness in the data, and it may be necessary to introduce terms to the smoothing model which share information between causes in order to stabilise the estimation of death rates.

The simplest and most computationally-scalable method is to ignore any correlations between diseases and run separate regression models for each cause of death. This is the approach taken by GBD studies for a vast array of causes of death. When looking at both total mortality and a mutually exclusive, collectively exhaustive list of causes of death, studies typically constrain the cause-specific death rate estimates from separate regressions to sum to the all-cause mortality value. This is because estimates of all-cause mortality are more robust, as the death counts are larger and therefore have lower variance, and the data do not suffer from errors in assignment of cause of death. The GBD studies scale death rates so that the proportions of each cause of death sum to unity. In the context of mortality projections, Wilmoth (1995) also points out that aggregating forecasts from multiple causes of death often leads to upward bias when compared to the total mortality forecast.
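The constraint can be illustrated with a simple proportional rescaling. This is a minimal sketch of the idea, not necessarily the exact procedure used by the GBD studies; the cause names and death rates are hypothetical.

```python
def rescale_to_all_cause(cause_rates, all_cause_rate):
    """Scale cause-specific death rates so they sum to the all-cause rate,
    preserving the proportion of deaths attributed to each cause."""
    total = sum(cause_rates.values())
    return {cause: rate / total * all_cause_rate
            for cause, rate in cause_rates.items()}

# Hypothetical estimates from separate regressions (deaths per person-year)
cause_rates = {"cardiovascular": 0.0030, "cancers": 0.0028, "other": 0.0036}
all_cause_rate = 0.0100   # hypothetical estimate from an all-cause model

scaled = rescale_to_all_cause(cause_rates, all_cause_rate)
```

In this example the separately estimated rates sum to 0.0094, so each is inflated by the same factor to match the all-cause estimate. The relative sizes of the causes are left unchanged, which reflects the assumption that the all-cause estimate is the more reliable quantity.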

A joint modelling approach would allow the borrowing of strength across causes to express correlations between diseases that share common risk factors and similar aetiologies. There is also some redundancy in separate regression models as a unique spatial surface is specified for each cause. In the disease mapping literature, studies have built spatial models to look at a small number of diseases which share spatial components between diseases. Firstly, Knorr-Held and Best (2001) considered a model (also described in Best et al. (2005)) for two diseases, with disease-specific spatial components and one shared component. Held et al. (2005) generalised this approach to model any number of diseases as a weighted sum of shared components. Rather than a general approach, both Downing et al. (2008) and Mahaki et al. (2018) allocated the spatial components to the diseases a priori based on knowledge of the common risk factors between cancers such as smoking, obesity and alcohol consumption. However, unless the components are pre-specified based on prior knowledge, the required number of shared components to capture variation in the data needs to be determined through trial and error. This is especially problematic when the number of diseases in the study, and hence the possible number of combinations of shared components, increases.

Foreman et al. (2017) modelled a larger number of causes of death to jointly forecast cause-specific mortality for states in the US. The model was similar to the spatiotemporal models described earlier, with random walk effects for temporal nonlinearities and a CAR prior for spatial effects, but with the introduction of a multivariate normal prior for causes of death whereby the covariance matrix describes the correlation structure between the 15 cause groups. The model did not, however, share information between age groups.

In studies of the cause composition of total mortality, rather than estimating the absolute death rate for each cause of death, it is possible to reframe the problem using a compositional model which considers the fraction of each cause of death composing total mortality. This was the approach taken by Salomon and Murray (2002) to investigate the dynamics of the proportions of mortality from GBD Groups 1, 2, and 3. The benefit of a compositional model is that the proportions are constrained to sum to unity, and the model can capture covariance between the component causes of death. However, it is not possible to recover absolute cause-specific death rates using the compositional approach without estimating the overall death rate.
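One common way to keep modelled fractions on the simplex is a softmax (multinomial logit) transformation of unconstrained scores. The sketch below illustrates that general idea only and is not the specification used by Salomon and Murray (2002); the scores and the all-cause death rate are hypothetical.

```python
import math

def softmax(scores):
    """Map unconstrained real-valued scores to positive fractions summing to one."""
    largest = max(scores)   # subtract the maximum for numerical stability
    weights = [math.exp(s - largest) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical linear predictors for GBD Groups 1, 2 and 3
scores = [0.0, 1.2, -0.5]
fractions = softmax(scores)

# Absolute rates require a separate estimate of the overall death rate
all_cause_rate = 0.009   # hypothetical, deaths per person-year
cause_rates = [f * all_cause_rate for f in fractions]
```

The fractions sum to one by construction, so an increase in the share of one group must be offset by decreases elsewhere. The final line makes the limitation explicit: the fractions alone carry no information about the level of mortality.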

2.4 Health inequalities in the UK

While the UK is, by global standards, a wealthy nation with relatively high life expectancy, the nation still suffers vast, preventable inequalities in mortality and morbidity. Health inequalities can be reduced through, amongst other initiatives, progressive social and economic policies, better nutrition programmes, and improved health care. It is important to estimate and understand differences in health outcomes between population subgroups to aid the design of such policies. There are several ways to stratify the UK population and compare inequalities between subgroups. Here, I focus on class, income, geography, and deprivation.

Class and income inequality

The notion of class is prominent in UK society, but health outcomes between classes are difficult to separate from other risk factors such as hazards in manual labour or smoking rates. The Whitehall study of 1967 followed 17,530 men working in the civil service and recorded their mortality over a 10-year period. Marmot et al. (1984) found, by classifying the civil servants into social class according to their employment grade, there was a three-fold difference in mortality between the highest class, administrators, and men in the lowest class, mainly messengers and manual workers. They found, in general, a strong inverse association between grade and mortality, which Marmot described as a “social gradient”. The men were working stable, sedentary jobs in the same office building in London, so the gradient could not be explained by industrial exposure alone, and the gradient remained even after controlling for smoking. The authors concluded there must be other factors inherent to social class (defined here by employment grade), which explain the mortality differences. A second cohort of Whitehall employees from 1985 to 1988, this time including women as well as men, were screened and asked to answer questions on self-reported ill-health. Marmot et al. (1991) found the social gradient in health had persisted in the 20 years separating the studies. In 2008, Marmot was asked by the Secretary of State to conduct a review into the state of health inequalities in the UK and to use the evidence to design policy for reducing these inequalities. A key plot in the first Marmot Review, released in 2010, depicted the social gradient in mortality for regions in England by socio-economic classification of employment (Marmot et al., 2010).

Income is not a routinely collected statistic in the UK. Nevertheless, using a small survey of 7,000 people covering three measures of morbidity, Wilkinson (1992) showed that health improved sharply from the lowest to the middle of the income range.

Spatial inequality

In 2015, the GBD study released its first subnational estimates of mortality, starting with the UK and Japan. Steel et al. (2018) assessed these data, which divided the UK into 150 regions, finding that mortality from all causes varied twofold across the country, with the highest years of life lost in Blackpool and the lowest in Wokingham. In a study on forecasting subnational life expectancy in England and Wales, Bennett et al. (2015) estimated that, in 2012, life expectancy ranged by 8.2 years for men and 7.1 years for women across 375 districts. The lowest life expectancies were seen in urban northern England, and the highest in the south and London’s affluent districts. Within London itself, Cheshire (2012) visualised the heterogeneity of mortality by assigning each tube stop the life expectancy of the nearest ward, revealing that 10 years of life expectancy are lost between two consecutive stops on the Jubilee line, Canary Wharf and North Greenwich.

Deprivation

There have been substantial efforts in the UK to measure the deprivation of an area. Since 2004, the standard deprivation indicator in England has been the Index of Multiple Deprivation (IMD), a composite indicator for each Lower-layer Super Output Area (LSOA; see footnote 2) built from data on income, unemployment, health, crime and the environment (Ministry of Housing, Communities & Local Government, 2019). The Marmot Review presented life expectancy and disability-free life expectancy against IMD at the Middle-layer Super Output Area level, both of which exhibited strong social gradients (Marmot et al., 2010). The GBD study found the 15 most deprived areas had consistently raised mortality, especially for all causes, lung cancer and chronic obstructive pulmonary disease. Deprived areas in London, such as Tower Hamlets, Hackney, and Barking and Dagenham, had lower rates of premature mortality than expected for their level of deprivation (Steel et al., 2018). Bennett et al. (2018) jointly estimated death rates by age, year and deprivation decile. They found that since 2011, although national life expectancy has continued to increase, the rise in female life expectancy has reversed in the two most deprived deciles. Using data from Public Health England, the second Marmot Review in 2020 also reported that female life expectancy declined in the most deprived decile between the periods 2010–12 and 2016–18 (Marmot et al., 2020). Broken down by region, this decline was seen in all regions except London, the West Midlands and the North West, and male life expectancy in the bottom decile also decreased in the North East, Yorkshire and the Humber, and the East of England.

2.5 Summary

Death rates vary by sex and age group, and across time and space. Studying mortality variations at the small-area level introduces sparsity in the data, and statistical smoothing models must be used to obtain robust estimates of death rates. These models should be flexible enough to allow for variation across age, space and time, and should consider interactions between these dimensions. Modelling mortality from specific causes of death introduces further challenges, both through increased sparsity in the death counts and through correlations between diseases with common risk factors.

Following a change in government in 2010, the UK abandoned its strategy to reduce health inequalities. Since then, there has been a decline in female life expectancy in the most deprived areas.


  1. Or, if the open-ended age group contains zero deaths, the calculation of life expectancy is impossible. This is because the estimated death rate in the final age group will be zero, so the life table cannot be closed and the life expectancy will be infinite.

  2. See Chapter 3 for descriptions of spatial units.