Appendix F.
On submission of this report to the funding organisation, the members of the Expert Advisory Group were given the opportunity to submit written comments on the final draft, which are included in this Appendix.
Comments by Hugh Gravelle and Matt Sutton
Conclusions and recommendation for a simple, transparent index of multiple deprivation (STIMD)
We support fully most of the recommendations of the report and applaud the authors for a magnificent job.
In our view, the analyses of exponential transformation and weighting raise a number of concerns about the construction of the SIMD:
- The series of complex transformations applied leads to an index that does not have a transparent relationship to the raw deprivation scores
- The use of ranking procedures at several stages loses information contained in the raw deprivation scores. If ranks are required, these should be applied as late as possible in the procedure used to construct the index of multiple deprivation
- The desire to "avoid cancelling-out" is biased against areas with moderate levels of deprivation on many indicators in favour of those with extreme deprivation on one indicator
- Many of the transformations embody value judgements that are not transparent. Often their effects depend on the underlying distribution of the raw deprivation scores.
Therefore, we feel that it would be possible to produce a simpler and more transparent index of multiple deprivation based on the new statistical methods that the authors recommend, rather than the one suggested in the report. In addition to transparency, this STIMD index would have the following advantages over the SIMD:
- the relative contribution of each domain would reflect separately the prevalence and severity of each type of deprivation. The 'weights' attached to each domain would adjust automatically in response to empirical changes (e.g. if one form of deprivation became less prevalent), and the value judgement about the severity of each form of deprivation could be explicitly obtained and stated.
- the 'trade offs' between the types of deprivation in the overall index would be explicit. In our opinion, the stated rationale for 'avoiding cancelling-out' is rather woolly and the current method is inconsistent and biased against certain profiles of deprivation. We propose that constant rates of trade off should be the default position unless a clear rationale can be provided for variable rates. Variable rates of trade off can be incorporated easily into the type of index we recommend.
- the resultant index would be cardinal rather than ordinal and would therefore be more useful for subsequent analysis. For example, instead of arbitrarily designating a proportion of areas as 'deprived' and allocating funding solely to them, resource allocation can be graded according to degrees of deprivation as currently practised in the health sector. Ordinal measures can of course be obtained from the cardinal measure by ranking, if desired.
Rationale for our recommendation
As the text of the report (section 11) indicates, we produced an analysis and critique of the exponential transformation and weights used in the SIMD. Much of the analysis appears as part of sections 9.5 and 9.6. However, the report's final recommendations on these matters (in sections 11.2.3 and 11.2.4) do not reflect our views on the way in which the issues raised in sections 9.5 and 9.6 should be dealt with.
We found that a non-trivial intellectual investment was necessary to understand the index. It seems doubtful whether it would be easy to explain to the average voter, taxpayer or recipient of funds disbursed using the index. Therefore, we favour a simpler and more transparent approach to combining the domains of deprivation into an overall index.
We believe that an index of multiple deprivation should have the following desirable properties:
- Its construction should lose as little of the information contained in the raw deprivation scores as possible
- Its derivation from the raw deprivation scores should be as transparent and simple as possible
- Its calculation should separate empirical issues from value judgements
The current SIMD can be criticised on all three points. The process of taking ranks loses information: knowing only the relative ranks of two datazones in a particular domain conveys no information about the magnitude of the differences in deprivation levels. Only if we supplement information on ranks with quite detailed information on the distribution of raw domain scores can we get any impression of whether the datazone ranked 15th is much more deprived than the datazone ranked 30th. Therefore, the use of rankings leads to unnecessary loss of information and imports value judgements. The use of complex transformations makes the index construction opaque and also imports and hides further value judgements. We suggest that it is possible to use the raw data on deprivation in ways which better meet the three criteria.
We suggest that there are two important aspects of deprivation which an index should capture:
(a) the proportion of people experiencing this deprivation (prevalence)
(b) the burden of this deprivation (severity).
The first is primarily an empirical question once a decision has been made about what level represents deprivation. The second is a value judgement. The current form of the SIMD conflates these two aspects.
We suggest that these two aspects should be separated. This has the advantage of allowing an increase in the prevalence of a certain form of deprivation to be represented in the overall index. If, for example, there is a proportionate reduction in the level of income deprivation while all other deprivation domains remain constant, one would expect its relative importance in determining the index of multiple deprivation to decline. This does not happen with the current system since the ranks (and therefore transformed ranks) are not affected. One would need to reduce the weight applied to the income domain purposefully to achieve this.
We recommend that the measure of multiple deprivation should be a simple weighted sum of the proportions of the populations suffering deprivation on the different domains:
$$\mathrm{STIMD}_i = \sum_{j} d_j \, p_{ij} \qquad (1)$$
where $d_j$ is the weight on domain $j$ and $p_{ij}$ is the proportion of the population in data zone $i$ who are deprived on domain $j$. The importance of domain $j$ increases when the proportion of the population affected increases. The value judgements about which forms of deprivation impose the greatest burden on individuals are given by the $d_j$ and are kept separate from the empirical questions of the proportion of people who are affected.
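The contrast between a rank-based measure and the weighted sum in (1) can be sketched as follows. This is a minimal illustration only: the five data zones, their proportions and the equal weights are hypothetical values chosen for the example, not SIMD data, and `relative_ranks` and `weighted_prevalence` are names introduced here.

```python
def relative_ranks(values):
    """Relative rank in (0, 1], where 1 is the most deprived zone (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, zone in enumerate(order, start=1):
        ranks[zone] = position / len(values)
    return ranks


def weighted_prevalence(proportions, weights):
    """Weighted sum of the proportions deprived on each domain, as in (1)."""
    return sum(d * p for d, p in zip(weights, proportions))


# Hypothetical proportions of the population deprived in five data zones.
income_before = [0.05, 0.10, 0.20, 0.30, 0.40]
income_after = [0.8 * p for p in income_before]  # proportionate 20% reduction
employment = [0.04, 0.12, 0.15, 0.25, 0.35]      # held constant
weights = [0.5, 0.5]                             # hypothetical value judgement

# The ranks, and therefore any transformation of the ranks, are unchanged.
ranks_before = relative_ranks(income_before)
ranks_after = relative_ranks(income_after)

# The weighted prevalence index falls in every zone.
index_before = [weighted_prevalence([p, e], weights)
                for p, e in zip(income_before, employment)]
index_after = [weighted_prevalence([p, e], weights)
               for p, e in zip(income_after, employment)]

# The share of each zone's index that is due to income deprivation also falls.
share_before = [weights[0] * p / s for p, s in zip(income_before, index_before)]
share_after = [weights[0] * p / s for p, s in zip(income_after, index_after)]
```

In this sketch the income ranks are identical before and after the reduction, whereas the contribution of income to the weighted sum declines without any change to the weights.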
Only very small changes to the data need to be made to construct such an overall index. Our analysis suggests that such an index would be very closely related to the current index, thereby offering considerable improvement in transparency without loss of information.
We have set out our arguments in more detail in the technical note below. The note also contains an illustrative calculation of the type of index we recommend.
Supporting technical notes
These notes use published data on the SIMD2004 to examine the main properties of the process used to combine domain scores into an index of multiple deprivation. We consider the two final stages of the construction of the SIMD: (i) the exponential transformation used to convert the domain scores onto a common scale; and (ii) the weighting of the domains.
Exponential transformation
The raw domain scores undergo two transformations before being weighted and summed to produce the index of multiple deprivation.
First, the raw domain scores are expressed as ranks. This transformation depends on the distribution of the raw scores. Histograms for each of the raw domain scores are illustrated in Figures 9.8 and 9.9. Three of the domain scores are positively skewed (income, employment and housing) and three are normally distributed by design. Consequently, the transformations into ranks are different across these two classes of domains.
Second, the domain ranks are transformed using the exponential transformation
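$$u_{ij} = u(r_{ij}) = -23 \ln\left\{1 - r_{ij}\left[1 - e^{-100/23}\right]\right\} \qquad (2)$$
where $r_{ij}$ is the relative rank of data zone $i$ on domain $j = a, b, \ldots, f$ and $0 \le r_{ij} \le 1$. The same transformation is applied to the domain ranks for all domains $a, b, \ldots, f$. The form of $u(r_{ij})$ is shown in Figure F.1.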
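The shape of the transformation can be explored with a short sketch. It assumes that the transformation takes the form given in the published SIMD2004 methodology, u(r) = -23 ln{1 - r[1 - exp(-100/23)]}, with the relative rank r scaled to lie between 0 (least deprived) and 1 (most deprived). The function name `exp_transform` is introduced here for illustration.

```python
import math

# Assumed constant of the transformation: 1 - exp(-100/23).
K = 1.0 - math.exp(-100.0 / 23.0)


def exp_transform(r):
    """Transformed domain score for a relative rank r in [0, 1] (assumed form)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("relative rank must lie in [0, 1]")
    return -23.0 * math.log(1.0 - K * r)


# Transformed scores at a few relative ranks; the scale runs from 0 to 100.
for r in (0.0, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"r = {r:.2f}  u(r) = {exp_transform(r):6.2f}")
```

Under this assumed form the transformed score is convex in the relative rank, so most of the 0 to 100 range is used to separate the data zones with the highest ranks.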
The combined effect of ranking followed by exponential transformation of the ranks differs across domains (Figures 9.8, 9.9) because of the different shapes of the distribution of the raw domain scores across data zones for the different domains.
We are interested in the decision-rule that is used to decide which areas are more deprived than others and which combinations of deprivation levels can be considered to be of equal severity. There is a wide range of options which meet plausible criteria for such decision rules, such as a data zone having a higher score than another only if it has at least as high raw domain scores for all domains and a strictly higher score for at least one domain.
Other simple rules are:
(a) maximax - the overall level of deprivation is given by the maximum level of deprivation on any of the domains
(b) maximin - the overall level of deprivation is given by the minimum level of deprivation on any of the domains
(c) constant marginal rate of substitution - the overall level of deprivation is a weighted sum of the levels of deprivation on each of the domains. This means that there is a constant trade-off between deprivation on different domains.
With a maximin decision rule, the contours are L-shaped. With a maximax decision rule, the contours are also right-angled pointing to the north-east of the x-y space. With a constant marginal rate of substitution, the contours are parallel lines sloping downwards from the north-west to the south-east. The slope of the lines gives the marginal rate of substitution. If there is a one-to-one rate of substitution then the lines have a slope of -1.
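The three rules can be compared on a small example. The two data zones and their domain scores below are hypothetical, as are the function names and the equal weights.

```python
def maximax(scores):
    """Overall deprivation is the highest level on any domain."""
    return max(scores)


def maximin(scores):
    """Overall deprivation is the lowest level on any domain."""
    return min(scores)


def weighted_sum(scores, weights):
    """Constant marginal rate of substitution: a weighted sum of domain levels."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical scores on two domains (higher means more deprived).
zone_extreme = (80, 20)    # very deprived on one domain only
zone_moderate = (50, 50)   # moderately deprived on both domains
equal_weights = (0.5, 0.5)

for name, rule in (("maximax", maximax), ("maximin", maximin)):
    print(name, rule(zone_extreme), rule(zone_moderate))
print("weighted sum",
      weighted_sum(zone_extreme, equal_weights),
      weighted_sum(zone_moderate, equal_weights))
```

In this example maximax ranks the zone with one extreme score as more deprived, maximin ranks the zone with moderate scores on both domains as more deprived, and the equally weighted sum treats the two zones as equally deprived.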
The use of the exponential transformation (2) implies a non-constant tradeoff between the relative ranks. To illustrate, suppose that the SIMD was based on just two domains ($a$ and $b$), so that the decision rule (function) for constructing the SIMD from the relative ranks is
$$S^r(r_{ia}, r_{ib}) = d_a \, u(r_{ia}) + d_b \, u(r_{ib}) \qquad (3)$$
where $d_j$ is the weight on domain $j$ and weights are non-negative and sum to 1.
The tradeoff (or marginal rate of substitution) between the relative ranks $r_{ia}$, $r_{ib}$ is the rate at which the relative rank on domain $b$ would have to fall to keep the index constant after an increase in the relative rank on domain $a$:
$$\left.\frac{dr_{ib}}{dr_{ia}}\right|_{S^r \text{ constant}} = -\frac{d_a \, u'(r_{ia})}{d_b \, u'(r_{ib})} \qquad (4)$$
where $u'$ is the derivative of the transformation with respect to the relative rank.
Figure F.2 plots the contours of the $S^r(r_{ia}, r_{ib})$ decision rule in $(r_{ia}, r_{ib})$ space. Higher contours correspond to higher levels of deprivation. The contours are "bowed out" (quasi-convex), so that as the level of $r_{ia}$ increases the amount by which $r_{ib}$ must fall to keep the index constant is increasing. Thus the index for a data zone with a high relative rank on domain $a$ and a low relative rank on domain $b$ is relatively insensitive to changes in the relative rank on domain $b$. This is the property stated as being desired for the SIMD - that "cancelling-out" of deprivation should be minimised.
However, as Figure F.2 shows, the extent to which the IMD avoids "cancelling-out" depends on the level of the index of multiple deprivation. At low levels of multiple deprivation, represented by the contour for a score of 10, there is little cancelling out. At high levels of multiple deprivation, there is a high degree of cancelling out.
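The tradeoff between relative ranks in the two-domain illustration can be evaluated numerically. The sketch below assumes the transformation has the form u(r) = -23 ln{1 - r[1 - exp(-100/23)]}, as in the published SIMD2004 methodology, and uses hypothetical equal weights. The function names are introduced here.

```python
import math

# Assumed constant of the exponential transformation: 1 - exp(-100/23).
K = 1.0 - math.exp(-100.0 / 23.0)


def u_prime(r):
    """Derivative of the assumed transformation with respect to the relative rank."""
    return 23.0 * K / (1.0 - K * r)


def mrs_ranks(r_a, r_b, d_a=0.5, d_b=0.5):
    """Slope dr_b/dr_a along a contour of d_a*u(r_a) + d_b*u(r_b)."""
    return -(d_a * u_prime(r_a)) / (d_b * u_prime(r_b))


# Equal ranks on both domains: a one-for-one tradeoff.
slope_equal = mrs_ranks(0.5, 0.5)

# High rank on domain a, low rank on domain b: a much steeper tradeoff,
# so the index is relatively insensitive to the rank on domain b.
slope_unequal = mrs_ranks(0.9, 0.2)
```

With equal weights the slope is -1 only where the two relative ranks are equal. It becomes steeper as the rank on domain a rises relative to the rank on domain b, which is the non-constant tradeoff described above.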
The contours in Figure F.2 are in the space of domain ranks. We are more interested in the rates of substitution between the raw domain scores that are implied by the process of constructing the SIMD. These are a more complex expression than (4) since they depend on the relationships between the distribution of raw domain scores as well as the SIMD process.
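Let $x_{ia}$, $x_{ib}$ be the raw measures of deprivation for area $i$ for domains $a$, $b$. Let $F_a(x)$, $F_b(x)$ be the cumulative distribution functions for the raw measures. The relative rank of $i$ on raw deprivation measure $j$ is just $r_{ij} = F_j(x_{ij})$. Since there are many areas we can assume that $F_j$ is differentiable. Then, in the illustrative two domain example, the SIMD index for data zone $i$ in terms of the raw domain scores is
$$S(x_{ia}, x_{ib}) = d_a \, u(F_a(x_{ia})) + d_b \, u(F_b(x_{ib})) \qquad (5)$$
Hence the marginal rate of substitution between raw domain scores is
$$\left.\frac{dx_{ib}}{dx_{ia}}\right|_{S \text{ constant}} = -\frac{d_a \, u'(F_a(x_{ia})) \, f_a(x_{ia})}{d_b \, u'(F_b(x_{ib})) \, f_b(x_{ib})} \qquad (6)$$
where $f_j = F_j'$ is the density function of the raw scores on domain $j$ and $u'$ is the derivative of the transformation with respect to the relative rank.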
Since the SIMD function $S$ is a linear sum of the $u_{ij}$, it is convex if all $u_{ij}$ are convex in $x_{ij}$ and concave if all $u_{ij}$ are concave in $x_{ij}$.
The first and second derivatives of $u_{ij}$ with respect to $x_{ij}$ are
$$\frac{\partial u_{ij}}{\partial x_{ij}} = u'(F_j) \, f_j, \qquad \frac{\partial^2 u_{ij}}{\partial x_{ij}^2} = u''(F_j) \, f_j^2 + u'(F_j) \, f_j' \qquad (7)$$
where $f_j = F_j'$ is the density function of the raw scores and $f_j' = F_j''$ is its slope.
The first term in the second derivative is positive, because the exponential transformation is convex in the relative rank ($u'' > 0$). Hence with a uniform distribution of raw scores, for which $f_j' = 0$, the transformed domain score is convex in the raw domain score. Now $u' > 0$. Since $F_j$ is a distribution function, its first derivative is the density function and it is positive (assuming there are no gaps in the support of the variable - which seems plausible for a distribution of raw deprivation scores for areas). Its second derivative is the slope of the density function. Again it is plausible that for $x_{ij}$ in the upper end of the distribution the density is getting smaller so that the second derivative of $F_j$ is negative. Or, if the distribution is unimodal the second derivative is negative when $x_{ij}$ exceeds the mode. Thus, for values of $x_{ij}$ towards the top end of the distribution, the second term in (7) is negative. Moreover with $f_j$ getting smaller the first term gets smaller. Thus it is plausible that the second derivative in (7) is negative: the function is concave for $x_{ij}$ in the upper end of the distribution. Hence we get contours of $S$ which are convex (bowed out) for low levels of raw deprivation and concave (bowed in) for high levels.
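The change of sign can be checked numerically for one case. The sketch below makes two assumptions that are not taken from the data: the raw domain scores follow a standard normal distribution, and the transformation has the form u(r) = -23 ln{1 - r[1 - exp(-100/23)]}. It evaluates the two terms of the second derivative of the transformed score with respect to the raw score.

```python
import math

# Assumed constant of the exponential transformation: 1 - exp(-100/23).
K = 1.0 - math.exp(-100.0 / 23.0)


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def second_derivative_terms(x):
    """Return (first term, second term) of d2u/dx2 at raw score x."""
    F = normal_cdf(x)        # relative rank
    f = normal_pdf(x)        # density
    f_slope = -x * f         # slope of the standard normal density
    u1 = 23.0 * K / (1.0 - K * F)             # u'(F)
    u2 = 23.0 * K * K / (1.0 - K * F) ** 2    # u''(F)
    return u2 * f * f, u1 * f_slope


for x in (-1.0, 0.0, 1.0, 2.0, 2.5):
    first, second = second_derivative_terms(x)
    shape = "convex" if first + second > 0 else "concave"
    print(f"x = {x:+.1f}  first = {first:8.3f}  second = {second:8.3f}  {shape}")
```

Under these assumptions the transformed score is convex in the raw score around the centre of the distribution and concave in the upper tail, where the negative second term outweighs the positive first term.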
We investigated the shape of the contours of the SIMD using data on combinations of two domains. We use the income domain in all analyses. We consider four pairs:
1. Income and employment (highest positive correlation between two positively-skewed variables)
2. Income and education (high positive correlation between one positively-skewed and one normally-distributed variable)
3. Income and housing (lowest positive correlation between two positively-skewed variables)
4. Income and access (negative correlation between one positively-skewed and one normally-distributed variable)
We can plot the contours of deprivation in $(x_{ia}, x_{ib})$ space, where $a$ is always the income domain and $b$ is variously employment, education, housing and access. We use the empirical data to obtain the approximate value of the domain score that equates to the required value of the rank on the other domain for a particular value of the index. Figure F.3 plots the contours of equal multiple deprivation in $(x_{ia}, x_{ib})$ space.
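One way such contour points could be obtained is sketched below. It is an illustration under stated assumptions rather than the procedure actually used: the transformation is assumed to be u(r) = -23 ln{1 - r[1 - exp(-100/23)]}, the two domains are equally weighted, and the raw scores in `income` and `access` are hypothetical. All function names are introduced here.

```python
import math

# Assumed constant of the exponential transformation: 1 - exp(-100/23).
K = 1.0 - math.exp(-100.0 / 23.0)


def exp_transform(r):
    return -23.0 * math.log(1.0 - K * r)


def inverse_exp_transform(u):
    """Relative rank that yields a transformed score of u."""
    return (1.0 - math.exp(-u / 23.0)) / K


def empirical_quantile(sorted_scores, r):
    """Raw score of the data zone whose relative rank is approximately r."""
    n = len(sorted_scores)
    index = min(n - 1, max(0, math.ceil(r * n) - 1))
    return sorted_scores[index]


def contour_point(level, r_a, sorted_a, sorted_b, d_a=0.5, d_b=0.5):
    """Raw scores (x_a, x_b) on the contour with index value `level`, or None."""
    u_b = (level - d_a * exp_transform(r_a)) / d_b
    if u_b < 0.0 or u_b > exp_transform(1.0):
        return None  # no rank on domain b gives this index value
    r_b = inverse_exp_transform(u_b)
    return empirical_quantile(sorted_a, r_a), empirical_quantile(sorted_b, r_b)


# Hypothetical raw domain scores for ten data zones, sorted in ascending order.
income = [0.02, 0.04, 0.06, 0.09, 0.12, 0.16, 0.21, 0.27, 0.35, 0.48]
access = [-1.6, -1.0, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 1.0, 1.6]

# Points on the contour for an index value of 20, one per income rank.
contour = [contour_point(20.0, r / 10, income, access) for r in range(1, 11)]
```

Because the mapping from ranks back to raw scores uses each domain's own empirical distribution, the same contour in rank space produces differently shaped contours for different pairs of domains.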
It is clear that the shape of the contours depends on their location and on the variables being considered. At high average levels of deprivation, the contours are convex implying that areas with equally high levels of deprivation on both domains are judged to experience more multiple deprivation than areas with "very high" deprivation on one domain and "less than high" deprivation on the other domain. At low average levels of deprivation, the contours are concave implying that areas with equally low deprivation on both domains are judged to experience less multiple deprivation than areas with "very low" deprivation on one domain and "more than low" deprivation on the other domain. Such value judgements do not seem consistent and are the result of a complex series of transformations to the raw deprivation data.
Weighting of domains
The application of weights is intended to reflect the relative importance of deprivation on each of the domains. This can be more naturally thought of as capturing two important aspects of deprivation:
(a) the proportion of people experiencing this deprivation (prevalence)
(b) the burden of this deprivation (severity).
The first is primarily an empirical question once a decision has been made about what level represents deprivation. The second is a value judgement. We suggest that these two aspects should be separated. This has the advantage of allowing an increase in the prevalence of a certain form of deprivation to be represented in the overall index. If, for example, there is a proportionate reduction in the level of income deprivation while all other deprivation domains remain constant, one would expect its relative importance in determining the index of multiple deprivation to decline. This does not happen with the current system since the ranks and therefore transformed ranks are not affected. One would need to reduce the weight applied to the income domain to achieve this.
An example of a simple and transparent alternative deprivation index is:
$$\mathrm{STIMD}_i = \sum_{j} d_j \, p_{ij} \qquad (8)$$
where $d_j$ is the weight on domain $j$ and $p_{ij}$ is the proportion of the population in data zone $i$ who are deprived on domain $j$. The importance of domain $j$ increases when the proportion of the population affected increases. The value judgements about which forms of deprivation impose the greatest burden on individuals are given by the $d_j$ and are kept separate from the empirical questions of the proportion of people who are affected. The index also imposes a constant marginal rate of substitution across the domains: the trade off between the amount (prevalence) of deprivation on domains $a$ and $b$ is given by $d_a/d_b$.
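A minimal sketch of the weighted prevalence index and its constant trade off follows. The domain weights and the proportions for the example data zone are hypothetical, and `weighted_prevalence_index` is a name introduced here.

```python
def weighted_prevalence_index(proportions, weights):
    """Sum over domains of weight times proportion deprived."""
    if set(proportions) != set(weights):
        raise ValueError("proportions and weights must cover the same domains")
    return sum(weights[j] * proportions[j] for j in weights)


# Hypothetical severity weights (a value judgement) summing to 1.
weights = {"income": 0.3, "employment": 0.3, "health": 0.2, "education": 0.2}

# Hypothetical proportions of the population deprived in one data zone.
zone = {"income": 0.20, "employment": 0.15, "health": 0.25, "education": 0.40}

index = weighted_prevalence_index(zone, weights)

# Constant trade off: a rise of delta in income prevalence is exactly offset
# by a fall of delta * d_income / d_health in health prevalence.
delta = 0.02
offset = delta * weights["income"] / weights["health"]
shifted = dict(zone, income=zone["income"] + delta, health=zone["health"] - offset)
index_shifted = weighted_prevalence_index(shifted, weights)
```

The offsetting change is the same whatever the starting levels of deprivation in the zone, which is what distinguishes this index from one built on transformed ranks.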
To illustrate this approach we have derived approximate values of the prevalence of deprivation for each domain. The income and employment domains are already expressed as proportions of the population affected, and we use these data. For the housing domain, we take our prevalence estimate from the more prevalent of the two indicators in this domain - overcrowded households. For the health domain, we adopt the proportion of the population affected by the most prevalent health condition, limiting long-term illness. We use this as the sole indicator of deprivation in this domain as we wish to avoid normalisation of the indicators required for factor analysis. For education, we take the same approach and adopt the proportion of the population without formal educational qualifications as the sole indicator. For the access domain, we use the average of the proportions of the population reporting that post offices, doctors and grocery/food shops are not "very convenient" or "fairly convenient" from the 2003 Scottish Household Survey. This weight is applied to the unweighted average of the drive times for the five services considered in the SIMD. This is summarised in Table F.1 (partly replicated as Table 9.19 in the main report).
Table F.1 Data sources used for the prevalence estimates
| Domain | Definition of prevalent group | Source | Scottish average prevalence rate | Applied to domain indicator |
|---|---|---|---|---|
| Current income | In receipt of benefits on grounds of low income | SIMD2004 | 15.1% | SIMD income domain score |
| Employment | In receipt of unemployment-related benefits | SIMD2004 | 14.1% | SIMD employment domain score |
| Housing | Household population living in overcrowded households | SIMD2004 | 14.1% | SIMD housing domain score |
| Education | Persons aged 16-74 with no formal educational qualifications | 2001 Census | 33.2% | SIMD working age population with no educational qualifications |
| Health | Persons with limiting long-term illness | 2001 Census | 20.3% | SIMD Comparative Illness Factor |
| Access | Persons reporting that post offices, doctors and grocery/food shops are not "very convenient" or "fairly convenient" | 2003 Scottish Household Survey | 11.0% | Average of SIMD drive times to five services |
Since this creates a new index, we compare the weighted prevalence method with the multiple deprivation index that would emerge from the same selection of domain indicators using the SIMD methodology for combining domains. All versions are highly correlated. The original SIMD score and the index based on the subset of variables using the same method for combining domains have a correlation coefficient of 0.9934. The weighted prevalence index and the index using the SIMD method have a correlation coefficient equal to 0.9921.
Figure F.4 is a scatterplot between these latter two, illustrating the pure effect of changing the method for combining domains. The vertical and horizontal lines indicate the point on the multiple deprivation indices where datazones become included in the most deprived 15%. Datazones in the north-east quadrant are categorised in the top 15% using both methods. There is considerable agreement between the indices. Only 34 (3.5%) of the 975 datazones in the top 15% using the SIMD method are not in the top 15% using the weighted prevalence method.
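The agreement between two indices at the 15% threshold can be computed along the following lines. The scores are hypothetical and the function names are introduced here. The sketch assumes that a higher score means more deprivation and that the size of the top group is rounded down.

```python
def most_deprived(scores, percent=15):
    """Indices of the data zones in the most deprived `percent` of zones."""
    n_top = len(scores) * percent // 100
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_top])


def disagreement(scores_a, scores_b, percent=15):
    """Count zones in the top group under index a but not under index b."""
    top_a = most_deprived(scores_a, percent)
    top_b = most_deprived(scores_b, percent)
    return len(top_a - top_b), len(top_a)


# Hypothetical scores for twenty data zones under two methods.
method_a = list(range(20))
method_b = list(range(20))
method_b[16], method_b[17] = method_b[17], method_b[16]  # two zones swap places

n_missing, n_top = disagreement(method_a, method_b)
print(f"{n_missing} of {n_top} zones ({100 * n_missing / n_top:.1f}%) change category")
```

A count of this kind measures agreement only at the chosen cut-off, so it complements rather than replaces the correlation between the two sets of scores.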
This suggests it may be possible to generate a multiple deprivation index with similar results to the SIMD2004 using a considerably more transparent method. In this example, we have relied on a single indicator for the health and education domains and a simple average for the access domain but the forms of factor analysis described in the main report on the raw scores could also be used since this preserves distances between areas on the original scales.
Figure F.1. Relationships between domain ranks and the exponential transformation

Figure F.2. Contours of multiple deprivation in the deprivation rank space

Figure F.3. Contours of multiple deprivation in the deprivation score space


Figure F.4. Comparison of multiple deprivation indices based on weighted prevalence method and existing method
