SCOTTISH INDEX OF MULTIPLE DEPRIVATION 2004: TECHNICAL REPORT
Chapter 4 : Creating a data zone level overall Scottish Index of Multiple Deprivation
Once the individual domain scores have been calculated, they are combined into the overall Scottish Index of Multiple Deprivation (SIMD 2004). The methodology [40] is based on that used by SDRC to create the SID 2003 and, as with the domains and indicators, the techniques are well documented in SDRC's previous work. As a result, there may be some overlap with other SDRC reports.
The exponential transformation is used to prepare the domains for this combination. Each domain is first standardised by ranking its scores. This is necessary because the domains are measured on different scales; ranking ensures that every domain has an identical distribution with the same range, maximum and minimum. However, ranks alone produce symmetrical distributions, so deprivation in one domain could be fully 'cancelled out' by lack of deprivation in another. This does not reflect the original distribution of domain scores and gives undue weight to the least deprived scores: before standardisation, the most deprived scores are spread out while the least deprived scores are very similar. Simply using the symmetrical ranks is therefore inappropriate, given that low scores signify less deprivation and do not imply well-being. A transformation is required to address these issues and, in line with the SDRC methodology in SID 2003, the exponential transformation of the ranks was chosen as the most appropriate method.
The exponential transformation deals with this question of cancellation. It converts every domain to an identical distribution with the same maximum and minimum values, whilst emphasising the most deprived 'tail' of the distribution. The transformation 'draws out' the ranks of the most deprived data zones, introducing gaps between them that reflect the actual distributions. The formula for the calculation is that used by SDRC in SID 2003 [41]:
X = -23*log{1-R*[1-exp(-100/23)]}
where R is the rank (with the least deprived data zone ranked 1) rescaled to the range [0,1], log is the natural logarithm and exp is the exponential function.
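The formula can be sketched in Python as follows. This is a minimal illustration rather than the SDRC implementation: the function names are hypothetical, and the scaling R = rank / n is an assumption, since the text does not state how ranks are mapped onto [0,1].

```python
import math

K = 23.0  # constant taken from the formula in the text


def exp_transform(r: float) -> float:
    """Map a scaled rank r in [0, 1] to a score between 0 and 100."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return -K * math.log(1.0 - r * (1.0 - math.exp(-100.0 / K)))


def scaled_ranks(scores):
    """Rank raw domain scores (rank 1 = lowest score = least deprived).

    Assumption: ranks are scaled as rank / n, and ties are broken by
    input order. Neither detail is specified in the text.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    scaled = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scaled[i] = rank / n
    return scaled


# Transform a small, made-up set of raw domain scores.
transformed = [exp_transform(r) for r in scaled_ranks([3.0, 1.0, 2.0])]
```

With this scaling the most deprived data zone always receives a score of 100, while the least deprived receives a score slightly above 0 that approaches 0 as the number of data zones grows.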
The constant -23 gives a 10% cancellation property. To illustrate why this property is desirable, suppose two domains were equally weighted and cancellation was not applied. A data zone which was most deprived on one of the domains and least deprived on the other would be ranked at the 50th percentile. However, it does not seem appropriate to suggest that lack of deprivation in one domain should exactly cancel out an entirely different dimension of deprivation in another. Using the 10% cancellation property, the data zone would be ranked within the 10% most deprived data zones. This was considered to be more appropriate.
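The two-domain illustration can be checked numerically. The sketch below is an assumption-laden reading of the example: it averages the transformed scores of a data zone that is most deprived on one domain and least deprived on the other, then asks which scaled rank on a single transformed domain would give the same score. The helper `inverse_transform` is hypothetical and is simply the algebraic inverse of the formula above. The position of such a data zone in a real combined index would also depend on how the other data zones score.

```python
import math

K = 23.0  # constant taken from the formula in the text


def exp_transform(r: float) -> float:
    """Map a scaled rank r in [0, 1] to a score between 0 and 100."""
    return -K * math.log(1.0 - r * (1.0 - math.exp(-100.0 / K)))


def inverse_transform(x: float) -> float:
    """Scaled rank whose transformed score equals x (inverse of the formula)."""
    return (1.0 - math.exp(-x / K)) / (1.0 - math.exp(-100.0 / K))


most_deprived = exp_transform(1.0)   # score on the first domain
least_deprived = exp_transform(0.0)  # score on the second domain

# Equal weighting of the two domains.
combined = 0.5 * most_deprived + 0.5 * least_deprived

# Scaled rank on a single domain that corresponds to the combined score,
# and the share of data zones ranked above it.
equivalent_rank = inverse_transform(combined)
share_more_deprived = 1.0 - equivalent_rank
```

Under these assumptions the combined score is about 50, and roughly a tenth of data zones on a single transformed domain score higher, which is the sense in which the constant gives a 10% cancellation property.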
Following the exponential transformation, the data zones have scores ranging between 0 (least deprived) and 100 (most deprived) on each domain. In addition, the scores increase exponentially so that the most deprived data zones have more prominence. The 10% cancellation factor means that the most deprived 10% of data zones are emphasised with scores between 50 and 100 whilst the remaining 90% of data zones have scores between 0 and 50. Thus the exponential transformation successfully deals with the issues of cancellation and symmetry.
The overall SIMD 2004 score is then constructed by combining the exponentially transformed domains using the ratios 6 : 6 : 3 : 3 : 2 : 1 in the following order:
- Current Income
- Employment
- Health
- Education, Skills and Training
- Geographic Access and Telecommunications
- Housing
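As a rough sketch, the combination can be written as a weighted sum of the exponentially transformed domain scores. The dictionary keys and the function name are hypothetical, and dividing by the sum of the weights (21) is an assumption made here so that the result stays on the 0 to 100 scale; the text gives only the ratios.

```python
# Domain weights in the ratio 6 : 6 : 3 : 3 : 2 : 1 (keys are illustrative).
WEIGHTS = {
    "current_income": 6,
    "employment": 6,
    "health": 3,
    "education_skills_training": 3,
    "geographic_access_telecoms": 2,
    "housing": 1,
}


def simd_score(domain_scores: dict) -> float:
    """Weighted average of exponentially transformed domain scores.

    Assumption: the weighted sum is divided by the total weight so that
    the overall score lies on the same 0-100 scale as each domain.
    """
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[d] * domain_scores[d] for d in WEIGHTS)
    return weighted_sum / total_weight


# Made-up data zone: most deprived on Current Income only.
example = {d: 0.0 for d in WEIGHTS}
example["current_income"] = 100.0
example_score = simd_score(example)  # 600 / 21
```

Because only the relative order of the overall scores is used afterwards, rescaling by a constant such as the total weight does not affect the resulting ranks.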
The weights are those used in the SID 2003, adjusted to allow the inclusion of the new Housing domain. The SDRC chose a theoretical approach to deriving the domain weights over other possibilities such as weights chosen empirically, by consensus, for policy relevance or arbitrarily. They concluded that the Current Income and Employment domains should carry the most weight in the overall Index. This was partly because these domains were the most robust and partly because it was in line with the academic literature on multiple deprivation. In the SIMD 2004 these conclusions are still relevant, and the weights have therefore remained similar, with a slight adjustment to incorporate the Housing domain. The relatively small weight of the Housing domain reflects the limited amount of relevant data currently available for inclusion in the domain. As more and better data on poor housing conditions are developed, the relative weight is likely to increase.
The larger the SIMD 2004 score, the more deprived the data zone. However, in order to compare data zones it is important to use the relative order of the ranks rather than the scores themselves. It is not correct, for example, to say that data zone X is twice as deprived as data zone Y because the SIMD score for X is 50 and that for Y is 25. This is due to the transformation of the data that takes place to enable a domain score to be produced. Equally, it is not true to say that a data zone of rank 50 is twice as deprived as a data zone of rank 100. However, a data zone of rank 75 is more deprived than a data zone of rank 125.
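Converting overall scores to ranks might look like the sketch below. It assumes, as the rank 75 versus rank 125 example implies, that rank 1 is given to the data zone with the highest overall score. The function name and the data zone labels are illustrative only, and ties are broken by input order.

```python
def overall_ranks(scores: dict) -> dict:
    """Rank data zones by overall score, with rank 1 = most deprived.

    Assumption: higher score means more deprived, so ranks are assigned
    in descending order of score.
    """
    order = sorted(scores, key=scores.get, reverse=True)
    return {zone: rank for rank, zone in enumerate(order, start=1)}


# Made-up scores for three data zones.
ranks = overall_ranks({"X": 50.0, "Y": 25.0, "Z": 80.0})
```

The ranks record only the ordering: X ranks above Y, but nothing in the ranks or the scores supports the statement that X is twice as deprived as Y.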