3. Evaluation Plan
During the preparation of the original tender bid, and in discussions between members of the RCB, EAG and OCS, an evaluation plan was developed. This was refined during the course of the project in light of emerging results and continued literature review. This section outlines this plan as an introduction to the methods employed in this evaluation.
3.1. Uncertainty
Two strands to the assessment of uncertainty were identified. On the one hand, methods for expressing uncertainty around the final SIMD rankings were required, since this was an area highlighted by the OCS as being of particular interest. In the literature on institutional performance indicators, to which the estimation of small area deprivation exhibits many parallels, the estimation and presentation of rank uncertainty is cited as an inadequately addressed issue 9.
If no measure of variation accompanies the presentation of performance or area deprivation ranks, the ranks as presented will be interpreted as the true ranking of study units (here, data zones). However, as has already been noted during the replication of the SIMD algorithm (Section 2.3), the precise ordering of areas is not certain when a number of areas have similar scores on the measure used to form the ranks. Also, when these measures are estimated from observations of random variables there is uncertainty inherent in the estimation procedure. Assuming that the indicator data are random observations from distributions with mean values related to an underlying aspect of deprivation, the deprivation ranks estimated from the observed data will not in general be the same as the ranks implied by the true levels of deprivation, due to random noise.
Estimates of rank uncertainty as previously reported 10 have been based on random effects models of the variability of indicators between and within the units of interest. Rank uncertainty estimates are then based on estimated posterior distributions of the ranks. This can be achieved directly by fitting random effects models within a Bayesian framework using Markov Chain Monte Carlo (MCMC) methods (e.g. using WinBUGS 11) or by simulating from residual distributions, estimated after fitting a multilevel model (e.g. using MLwiN 12).
As part of the Shrinkage and Factor Analysis sections of this evaluation, it was planned that some of these models would be employed, and that these rank uncertainty estimates could be derived. It was hoped that recommendations could then be made, based on these estimates, for the estimation of uncertainty in future publications of the SIMD.
However, to "evaluate" the current methodology it is necessary to compare alternative approaches. It was decided that an assessment of uncertainty in the SIMD ranking of data zones and LAs should form a major part of the evaluations carried out in this project. The precision of a deprivation measure for ranking data zones or LAs, or for allocating funds to LAs, was considered to be an important consideration in comparing alternative methods. This need not mean finding methods that give less variable rankings, since less variability does not imply less bias; it might, however, be important if a simpler method were to give similar estimated levels of deprivation, as well as showing similar levels of rank uncertainty.
It was felt that model-based approaches to the estimation of rank uncertainty would not be adequate for such a broad comparison of methods, since these estimates would be particular to the model being fitted to the data. A more general approach was considered to be the use of simulated indicator datasets: each alternative methodological change to the algorithm could be assessed by applying the method to each simulated indicator dataset and extracting the ranking of data zones, or some other summary of the SIMD. This would supply a distribution of rankings or summary measures.
It would be natural to seek to produce simulated indicator datasets from a multivariate distribution so that the mean for each indicator is equal to the observed value* and the correlation structure reflects that observed, since it would be expected that indicators would be strongly correlated within each deprivation domain, at least.
*This is not, in fact, desirable, since for many indicators, such as the number of deaths in a data zone within a given sex and 5-year age band, the observed values are often zero, and occasionally equal to the denominator (in the oldest age groups). If simulated datasets had mean values equal to the observed value, then all simulated values would be equal, giving an unrealistic simulated distribution.
This would, however, require some assumptions to be made regarding the multivariate distribution of indicators. It was felt that this might lead to simulated datasets that would be better modelled by particular shrinkage methods compared to others. Consequently, it was decided that the indicators should be simulated independently; this would most likely overestimate the total variation in the full indicator dataset, but would produce simulated datasets that would provide a fair comparison of alternative methods.
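As an illustration of the simulation approach described above, the following is a minimal sketch and not the procedure used in this evaluation. It assumes a single indicator expressed as a count out of a denominator, draws each data zone's count independently from a Binomial distribution, ranks the zones within each simulated dataset, and summarises the resulting distribution of ranks as an approximate 95% interval. The function names, the underlying rates (assumed to be given, and not simply the raw observed proportions) and the choice of interval are all assumptions made for illustration.

```python
import random


def simulate_counts(denominators, rates, rng):
    """Draw one simulated dataset: an independent Binomial count per data zone."""
    return [sum(rng.random() < p for _ in range(n))
            for n, p in zip(denominators, rates)]


def ranks(values):
    """Rank 1 = highest value (most deprived); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0] * len(values)
    for r, i in enumerate(order, start=1):
        out[i] = r
    return out


def rank_intervals(denominators, rates, n_sims=500, seed=1):
    """Return an approximate 95% interval of simulated ranks for each zone."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        counts = simulate_counts(denominators, rates, rng)
        props = [c / n for c, n in zip(counts, denominators)]
        sims.append(ranks(props))
    intervals = []
    for z in range(len(rates)):
        zone_ranks = sorted(s[z] for s in sims)
        lo = zone_ranks[int(0.025 * (n_sims - 1))]
        hi = zone_ranks[int(0.975 * (n_sims - 1))]
        intervals.append((lo, hi))
    return intervals
```

Zones whose underlying rates are close together would be expected to show wide, overlapping rank intervals, while zones with clearly different rates would keep a stable rank across simulations.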
3.2. Shrinkage
As one of the issues listed in the SE Strategy for measuring deprivation, shrinkage was seen as a major area for evaluation. Shrinkage is employed to safeguard against extreme indicator values in small data zones within the health and education domains, both for single indicator variables and for age-sex standardised indicators, such as the CMF and CIF. The original shrinkage method used a relatively simple formula: the shrunken indicator value for each data zone is a weighted combination of the observed value and the average value for the LA within which the data zone is found, with weights proportional to the reciprocal of the variances of the two estimates.
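The weighted combination described above can be sketched as follows. This is a minimal illustration that takes the two variances as given; how those variances are estimated is not covered here, and the function and argument names are hypothetical.

```python
def shrink(zone_value, zone_var, la_mean, la_var):
    """Inverse-variance weighted combination of a data zone value and its LA mean.

    Each estimate is weighted by the reciprocal of its variance, so an
    imprecise (high-variance) data zone value is pulled further towards
    the LA mean.
    """
    w_zone = 1.0 / zone_var
    w_la = 1.0 / la_var
    return (w_zone * zone_value + w_la * la_mean) / (w_zone + w_la)
```

For example, a data zone value of 0.5 with variance 0.04, combined with an LA mean of 0.2 with variance 0.01, receives weights of 25 and 100 respectively, giving a shrunken value of 0.26.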
In general, it was decided that the original shrinkage method and the alternatives to be considered would be evaluated by application to the observed indicator data and to simulated datasets, with comparison of the observed indices and ranks, and summaries of the ranks over groups of data zones, and their distributions over the simulations.
The first alternative to the original shrinkage method was to use no shrinkage at all. Other simple alternatives were to use different higher-level units to shrink indicator variables towards. Options proposed for evaluation were: a single higher-level unit, i.e. the national average value; sub-LA units, i.e. local collections of data zones within LAs; and some other classification of "similar" data zones. An intermediate geography (IG) was available from SNS that satisfied the second option. It was decided to test the third option using a classification of data zones into 6 subgroups defined by their level of urbanisation or rurality.
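These simple alternatives differ only in which grouping of data zones supplies the higher-level average. A minimal sketch, assuming an unweighted mean within each group (a population-weighted mean may well be more appropriate) and hypothetical function names, is:

```python
from collections import defaultdict


def group_means(values, group_ids):
    """Unweighted mean indicator value for each higher-level group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, group_ids):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def shrinkage_targets(values, group_ids):
    """The value each data zone would be shrunk towards under this grouping."""
    means = group_means(values, group_ids)
    return [means[g] for g in group_ids]
```

Passing LA codes, IG codes or urban/rural classes as `group_ids` gives the corresponding targets; passing the same label for every data zone gives the national average.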
The original shrinkage method and the simple alternatives proposed above treat each indicator variable independently. In reality, the indicators that are combined for each domain score are (usually positively) correlated. In principle, shrinkage can be improved by obtaining better between- and within-data zone variance estimates. This can be achieved by estimating the covariance structure of indicators between data zones, though the within-data zone covariance structure cannot, with the data available, be obtained. Multivariate shrinkage techniques are available 13 that utilise this information and this was proposed as an alternative to univariate shrinkage for each domain; furthermore, for age-sex standardised indicator variables, multivariate shrinkage could be applied to all age-sex subgroups simultaneously.
Various model-based shrinkage methods were also proposed. These use random effects models to estimate the average value of an indicator in each data zone. These estimates are shrunk towards the higher-level average value in much the same way as with the original method.
However, the models can be generalised to incorporate several layers of higher-level units, for example data zones within IGs, IGs within LAs and LAs within the national dataset. In this way, data zone, IG and LA average indicator values are all shrunk towards higher-level averages. Also, multivariate shrinkage can be achieved by modelling several indicators simultaneously as an additional level in the model. The final, and most complex, shrinkage alternative proposed was a spatial shrinkage model, which models the correlation between geographically proximate data zones.
3.3. Factor Analysis
The use of Factor Analysis (FA) was also highlighted for review in the SE Strategy document. The original SIMD algorithm assumes that there is a single latent variable underlying each of the health, education and access domains, and FA is used to identify the weighted combination of the indicators in each domain that best estimates these latent factors. Prior to FA, each indicator variable is ranked over all data zones and transformed to a standard Normal distribution.
As for the alternative shrinkage methods, modifications to the original FA method would be evaluated by application to the simulated datasets.
The simplest alternative to FA that was proposed was Principal Components Analysis (PCA), which identifies the linear combination of the indicator variables in each domain that explains the largest possible proportion of the total variance across all component indicators (after ranking and transformation to standard Normal variates). Additional modifications for evaluation were to use FA and PCA on indicator variables that were either untransformed, or subjected to alternative transformations of ranking, or ranking followed by an exponential transformation.
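The PCA alternative can be illustrated with a short sketch. It assumes that a rank r out of n is mapped to the standard Normal quantile at r / (n + 1) (one common convention, not confirmed by this report), that ties are broken by position, and that the first principal component is found by power iteration on the covariance matrix. The function names are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank the values and map each rank to a standard Normal quantile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for r, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(r / (n + 1))  # assumed convention
    return scores


def first_component(columns, n_iter=200):
    """Unit-length weights of the first principal component (power iteration)."""
    k, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[x - m for x in c] for c, m in zip(columns, means)]
    cov = [[sum(a * b for a, b in zip(centred[i], centred[j])) / (n - 1)
            for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(n_iter):
        w = [sum(cov[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w


def domain_scores(columns, weights):
    """Weighted combination of the transformed indicators for each data zone."""
    n = len(columns[0])
    return [sum(w * c[z] for w, c in zip(weights, columns)) for z in range(n)]
```

Each indicator would first be passed through `normal_scores`, and the resulting columns supplied to `first_component`; applying the weights with `domain_scores` then gives one score per data zone.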
A more complex alternative was also planned, using a generalised FA technique 14. This uses the untransformed indicator variables, but assumes a distribution for these other than a Normal distribution; in most cases a Binomial, for those indicators that are expressed as a proportion (or percentage) of the data zone population (or sub-population).
3.4. Exponential Transformation
In the original SIMD algorithm, once the domain scores have been created, they are combined into a single index, the SIMD. Since this measure is intended to measure the presence of multiple deprivation, or deprivation on multiple domains, it was intended that levels of deprivation on different domains should not "cancel out", in the sense that a mixture of high and low levels of deprivation on different domains within the same data zone should not result in an overall score similar to a data zone with average levels of deprivation on all domains.
If domain scores are transformed by subtracting the mean value and dividing by the standard deviation (i.e. so that each variable has zero mean and unit variance), as in the calculation of the Carstairs index 15, such cancelling out can occur, since extremes of high and low deprivation can have equal positive and negative z-scores. For the construction of the SIMD, domain scores are ranked across data zones and transformed to have exponential distributions, scaled to have a maximum value of 100. An equal combination of extremely high and low deprivation on two deprivation domains results in a higher total deprivation score than a combination of average deprivation on both domains. It was intended that this approach would highlight data zones with multiple deprivation, or deprivation on multiple domains.
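The cancellation property can be illustrated numerically. The sketch below assumes the exponential transformation takes the form commonly published for indices of this type, X = -23 ln(1 - R(1 - exp(-100/23))), where R is the rank of a data zone scaled to lie between 0 (least deprived) and 1 (most deprived). The constant 23 is not stated in this section and is an assumption; the function name is hypothetical.

```python
import math


def exp_transform(r, k=23.0):
    """Map a scaled rank r in [0, 1] to a score in [0, 100].

    k = 23 is an assumed constant, not taken from this section.
    """
    return -k * math.log(1.0 - r * (1.0 - math.exp(-100.0 / k)))


# One domain at the extreme of deprivation and one at the opposite extreme,
# compared with two domains both at the median rank.
mixed = (exp_transform(1.0) + exp_transform(0.0)) / 2
average = (exp_transform(0.5) + exp_transform(0.5)) / 2
```

Under this assumed form, `mixed` is 50 while `average` is roughly 15.6, so the extreme score on one domain is not cancelled by the low score on the other. With z-scores, by contrast, values of +2 and -2 would sum to the same total as two values of 0.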
It was decided that a different approach would be taken to the evaluation of this step in the SIMD process. The properties of an index constructed using ranked and exponentially-transformed components were to be explored, with reference to the actual domain scores created as part of the SIMD 2004.
The impact of using a simple standardisation procedure (subtracting the mean value and dividing by the standard deviation) instead of the current method of ranking and transforming to an exponential distribution, will be assessed by application to the observed data as well as the simulated indicator datasets. Alternative transformations of the domain scores, that have a similar overall effect to the current method but do not involve ranking, and therefore preserve distances between data zones with respect to each domain, will also be applied to the observed data and the simulated indicator datasets.
3.5. Weighting
After transformation to exponential distributions, domain scores are combined as a weighted sum to form the SIMD score. The weights applied to the six domains are: income, 0.29; employment, 0.29; health, 0.14; education, 0.14; access, 0.09; housing, 0.05. These weights were chosen to reflect both the quality of the data used to construct the domain scores and the relative importance of each deprivation domain.
The plan for the evaluation of this area was to explore the impact of variations around the chosen weights on the resultant SIMD scores and ranks. Also, a method for the derivation of domain weights will be proposed that tries to separate empirical issues from value judgements.
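The weighted sum, and one simple way of varying a weight, can be sketched as follows. The domain weights are those quoted above; the rule for varying a weight (fix one domain's weight and rescale the others proportionally so that the total remains 1) is an assumption made for illustration, as are the function names.

```python
WEIGHTS = {
    "income": 0.29,
    "employment": 0.29,
    "health": 0.14,
    "education": 0.14,
    "access": 0.09,
    "housing": 0.05,
}


def simd_score(domain_scores, weights=WEIGHTS):
    """Weighted sum of the exponentially transformed domain scores."""
    return sum(weights[d] * domain_scores[d] for d in weights)


def vary_weight(weights, domain, new_value):
    """Set one domain weight and rescale the rest so the weights sum to 1."""
    scale = (1.0 - new_value) / (1.0 - weights[domain])
    return {d: (new_value if d == domain else w * scale)
            for d, w in weights.items()}
```

Recomputing scores and ranks with, say, `vary_weight(WEIGHTS, "health", 0.20)` and comparing them with the originals gives one view of how sensitive the SIMD is to a given weight.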
3.6. Methods for Comparison
As outlined above, the original SIMD algorithm and various alternatives will be assessed by application to a large number of simulated indicator datasets, and comparing the resultant SIMD ranks produced by each modification. Details of how these results will be used to compare methods are given in Section 4.1.3.
It will not be possible to compare all methods in this way, since some will be computationally difficult, and too time consuming to apply repeatedly to a large number of datasets. For all alternative methods that can be applied to produce a SIMD, we shall compare the SIMD scores and ranks with those from the original algorithm using correlation coefficients, and with summaries of the ranks for groups of data zones, defined by the SE Urban/Rural Indicator, and by LA. The main summary to be compared will be the number and percentage of data zones in a group that are ranked within the 15% most deprived areas nationally. The Community Regeneration Fund allocations are defined by the distribution of these data zones across LAs.
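The main summary described above might be computed along the following lines. This sketch assumes that rank 1 denotes the most deprived data zone and that the cut-off is the nearest whole number of data zones to 15% of the national total; both conventions, and the function name, are assumptions.

```python
from collections import Counter


def most_deprived_share(ranks, groups, fraction=0.15):
    """Count and percentage of each group's data zones within the cut-off.

    ranks  : national SIMD rank of each data zone (1 = most deprived, assumed)
    groups : group label (e.g. LA or urban/rural class) of each data zone
    """
    cutoff = round(fraction * len(ranks))  # assumed rounding rule
    totals = Counter(groups)
    hits = Counter(g for r, g in zip(ranks, groups) if r <= cutoff)
    return {g: (hits[g], 100.0 * hits[g] / totals[g]) for g in totals}
```

With 20 data zones the cut-off is 3, so a group holding two of the three most deprived zones out of its ten would be reported as (2, 20.0).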
For information, more detailed summary statistics for each method will be supplied in an Appendix to this report. In subgroups of data zones, the following summaries of the distribution of the SIMD rankings will be presented:
- SIMD Decile Distribution: the number and percentage of data zones in each subgroup falling into each decile of the national SIMD distribution
- Mean SIMD rank: the mean SIMD rank of data zones in each subgroup, measuring average deprivation levels for that subgroup of data zones
- SIMD Concentration Score: the population-weighted mean SIMD rank of the most deprived data zones in the subgroup containing exactly 10% of the subgroup population
- SIMD Extent Score: the percentage of the subgroup population living in one of the 10% most deprived data zones nationally
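The summaries listed above can be sketched for a single subgroup as follows. The sketch assumes that rank 1 is the most deprived data zone, that the data zone straddling the 10% population boundary contributes only the fraction of its population needed to reach exactly 10% in the Concentration Score, and that the national 10% cut-off is rounded to a whole number of data zones. These conventions and all function names are assumptions.

```python
def decile_counts(ranks, n_national):
    """Number of the subgroup's data zones in each national decile (1st first)."""
    counts = [0] * 10
    for r in ranks:
        counts[min(9, (r - 1) * 10 // n_national)] += 1
    return counts


def mean_rank(ranks):
    """Mean SIMD rank of the subgroup's data zones."""
    return sum(ranks) / len(ranks)


def concentration_score(ranks, pops, fraction=0.10):
    """Population-weighted mean rank of the most deprived 10% of population."""
    target = fraction * sum(pops)
    remaining = target
    weighted = 0.0
    for r, p in sorted(zip(ranks, pops)):  # most deprived zones first
        take = min(p, remaining)           # part-count the boundary zone
        weighted += take * r
        remaining -= take
        if remaining <= 0:
            break
    return weighted / target


def extent_score(ranks, pops, n_national, fraction=0.10):
    """Percentage of subgroup population in the nationally most deprived 10%."""
    cutoff = round(fraction * n_national)
    inside = sum(p for r, p in zip(ranks, pops) if r <= cutoff)
    return 100.0 * inside / sum(pops)
```

For a subgroup with ranks 5, 50, 120 and 300 out of 1,000 data zones nationally, and populations of 200, 800, 1,000 and 3,000, the Concentration Score is 32 (200 people at rank 5 and 300 at rank 50) and the Extent Score is 20%.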