9  Discussion

9.1 Public health and policy implications of findings

Although previous studies have observed a decline in female life expectancy in the bottom two deciles of deprivation in the past decade (Bennett et al., 2018; Marmot et al., 2020), this thesis has shown both for men and women where the declines have been happening for small areas. Not only have there been declines, but the declines have been accelerating in the latter half of the decade, such that in the period 2014-19, life expectancy fell in 1 in 5 MSOAs for women and 1 in 9 for men. These trends happened in the years before the mortality-forcing event of the Covid-19 pandemic and are worse than comparable countries (Leon et al., 2019), and should be deemed a failure for public health policy. Although the complexity of policy means it cannot be proved causally, leaders within the NHS have pointed towards constrained funding for the healthcare body itself, as well as cuts to wider social determinants, such as housing and education, as the main reason for the country’s poor performance (Ham, 2023).

The declines in life expectancy were sustained over a long period of time, which serves as another example where death rates for some population subgroups run contrary to the persistent mortality decline of the third stage of the Epidemiologic Transition theory, as discussed in Gaylin and Kates (1997) with the HIV/AIDS pandemic. Even if England is in the hypothesised fourth stage of the transition, the Age of Delayed Degenerative diseases (Olshansky and Ault, 1986), there are subnational patterns where degenerative diseases are not killing at later and later ages. This supports the growing evidence that Omran’s theory acts only as a useful heuristic, and that there is a large amount of variability, particularly in the latter stages of the transition, between broad geographical regions (Mackenbach, 2022; Sudharsanan et al., 2022).

The difference in progress between districts in the last decade was largely driven by differences in degenerative diseases - in particular, the rate of improvement for CVDs and all other NCDs, and the strength of the negative forcing effect of Alzheimer’s and other dementias. Furthermore, female mortality from infectious, maternal, perinatal and nutritional conditions (GBD group 1), which dominate the second stage of the transition, increased in many districts. There is also a worrying shift towards injuries (GBD group 3) contributing negatively towards life expectancy progress, particularly for men. This is possibly driven by a rise in “deaths of despair” (Angus et al., 2023; Case and Deaton, 2015), although this would require further analysis by separating intentional and unintentional injuries. It should be noted that IMPN and injuries do not play a major role in total mortality as they accounted for only 11.1% of all deaths from 2002 to 2019, but they are almost entirely preventable and should be addressed through appropriate policy.

Cancer survival outcomes in the UK are worse than those in Europe in general (OECD, 2016). In this thesis, I found mortality showed huge spatial variation for a number of preventable cancers. Worryingly, at the same level of poverty, London performed significantly better that the rest of the country for several cancers. This suggests either regional differences in the quality of care, or differences in the populations living in deprived areas, which should both be the subject of policy discussions.

Finally, this thesis emphasises the value of small area work in informing policy. National trends in mortality are not spatially homogenous. Particular diseases have shown massive variation at the district level, such as COPD in women (6.0-fold in 2019), and by studying England at the MSOA level, I have uncovered the widest subnational gap in life expectancy of 27.0 years (men in 2019) in the literature. This rich source of data allows policymakers to work with local authorities to create targeted public health interventions. The population issues at the LSOA level in London and the coding of CVDs on the Isle of Wight show there are still limitations of the data. Nevertheless, the estimates from this thesis are already being used in the press to provide context for recent falls in US life expectancy (Burn-Murdoch, 2023), and I hope they are also being used in policy discussions.

9.2 Future work

There are many possible methodological and substantive extensions to the work presented in this thesis. Firstly, on the methodological side, with improvements both to hardware and the rise of approximate inference algorithms to replace computationally costly sampling methods, the models can, in theory, be scaled to higher and higher spatial resolutions and we could potentially estimate mortality for the entire country for LSOAs, OAs, or even postcodes. However, given the data issues in Chapter 5 and the very wide credible intervals seen for some LSOAs with high life expectancies, perhaps smaller is not better when the quality of the data is lacking.

One of the major strengths of the thesis was the use of Bayesian methods, at one of the highest spatial resolutions in the literature for a model estimating mortality. This was largely thanks to recent developments in probabilistic programming, which allow sampling algorithms to run on GPUs rather than CPUs, and is generally faster for models with over 10,000 parameters (Lao et al., 2020). The models themselves, which were based on previous work in the group (Bennett et al., 2018, 2015), followed traditional statistical approaches using intercepts and linear slopes and adding in random walk effects for nonlinearities. However, with models such as Equation 4.4, we are actually only interested in the left-hand side of the equation – the death rate – and not at all in the values of the parameters on the right-hand side. There are a vast array of flexible models which can describe complex data without needing to design heavily parametrised structures. These range from Gaussian processes (Flaxman et al., 2015; Rasmussen and Williams, 2006), where the user can use inductive biases and their own knowledge to influence how the model fits the data, to wilder options like neural networks, which completely sacrifice any remaining interpretability of the effects.

During my PhD, I became very familiar with the literature on Gaussian processes, and was involved with a paper aimed at speeding up inference in spatial settings (Semenova et al., 2022). One of the most promising avenues for the task of mortality estimation in this thesis, outlined in Flaxman et al. (2015), is the use of Kronecker-structured covariance matrices to create hierarchical Gaussian processes. In the context of this thesis, the covariance matrix would be defined as the Kronecker product of three smaller matrices: one for age effects, one for spatial effects, and one for temporal effects. We can exploit the Kronecker structure for faster inference, as we no longer need to invert the full matrix (an operation that scales cubically with the number of data points), but rather the smaller constituent matrices of each effect. Ultimately, nothing came to fruition as the working environment, due to the sensitivity of the data,1 was not conducive to developing novel statistical methods. Also, there was no guarantee the Gaussian process methods would be quicker, as it still requires the inversion of a large spatial matrix, nor would it produce a result much different from the original model. Instead, I allocated time to scaling existing models and testing new hardware. I believe the hierarchical Gaussian process model should be the starting point for future research because it is much more flexible and requires less testing of each effect than the models in this thesis.

Another methodological extension would be to model causes of death jointly, thereby borrowing strength across causes. Studies have built spatial models which use shared components between diseases, either modelling spatial patterns of any number of diseases using a weighted sum of shared components (Best et al., 2005; Held et al., 2005; Knorr-Held and Best, 2001) or pre-assigning the spatial components to the diseases based on knowledge of common risk factors (Downing et al., 2008; Mahaki et al., 2018). However, as the number of diseases in the study increases, more spatial components must be fitted, which is computationally prohibitive. Furthermore, unless the spatial components are defined a priori, there is a question of how many spatial components are required to parsimoniously describe the variance in the data. Alternatively, the correlations between the causes can be modelled using a multivariate normal distribution of the same dimensions as the number of cause groups, as in Foreman et al. (2017). Extending this idea, there is also the option to extend the hierarchical Gaussian process described above with another component covariance matrix describing the correlations between each of the cause groups. Ultimately, the goal would be to run a single joint model that flexibly describes interactions between sexes, age groups, spatial units, time, and causes of death.

On the substantive side, there is a reason the final year of study period was restricted at 2019 that has nothing to do with the availability of vital registration data for the proceeding years: the Covid-19 pandemic. It would be naïve to fit the same model to an extended time period that includes the pandemic years because the pandemic had such a major influence on mortality patterns. The model relies heavily on linear trends, which would be affected by the final years. Further, pandemics are fat-tailed events, so the Gaussian random walk effects would struggle to capture the shock without increasing the overall variance of the effect.

Rather than extending the study to include the pandemic years and look at the trends between 2002 and 2023, a more interesting question is “had the pandemic not happened, how would mortality patterns be different?” This would be done using a counterfactual analysis: train a model to estimate annual mortality from 2002 to 2019 and produce a forecast for the years from 2020. A team, including myself, has already done this analysis at the national level at a weekly temporal resolution (Kontis et al., 2022, 2020). Here, the task would instead involve forecasts for each sex-age-space-time-cause stratum on annual timescales. The group has been involved in a number of mortality forecasting projects (Bennett et al., 2015; Foreman et al., 2017; Kontis et al., 2017). The models are similar to those in this thesis, but would need to be adapted and trained specifically for forecasting purposes, with the possibility of averaging multiple models to reduce bias. The results would allow us to see how the cause-composition of mortality has changed compared to the business as usual scenario. We would obviously expect an increase in deaths from infectious diseases, but were injuries reduced as the country was locked down in their homes? Did cancer outcomes worsen due to missed surgeries as hospitals were flooded with Covid-19 patients? Is there a longer-term effect due to strain on emergency services, which has manifested in longer waiting lists and waiting times for appointments (Dorling, 2023)? And, how have these effects varied for different age groups and different areas of the country?


  1. Working on identifiable health data requires a secure computing infrastructure. In my case, the cluster had no access to the internet and only a command line interface, which hampered method development. It is understandable why spatial statisticians continue to study the standard Scottish lip cancer dataset.↩︎