Evaluation of Statistical Techniques in the Scottish Index of Multiple Deprivation

9. Results

9.1. Univariate Shrinkage Methods

9.1.1. Observed Indicator Data

Table 9.1 shows the correlations between the original SIMD 2004 methodology and alternatives that use the same univariate shrinkage method applied to different shrinkage areas. These comparisons include the no-shrinkage alternative, which can be thought of as shrinkage using each data zone as its own shrinkage area. The other options shown are a single (national) shrinkage area, groups of data zones forming intermediate geographies (IGs) below local authority (LA) level, and groups of data zones defined by the SE Urban/Rural indicator (rurality groups).

All correlations are high, indicating that none of the alternatives has an excessive influence on the domain or multiple deprivation scores or rankings. In general, shrinkage towards rurality group averages produces deprivation measures most strongly correlated with the original algorithm, and use of an intermediate geography produces measures that are least strongly correlated to the original, though these correlations are still very high.

Correlations for the SIMD score are greater than for the SIMD ranks, though for the domain scores this is reversed. Altering the shrinkage areas used has the largest effect on the health domain score, in the construction of which 79 separate indicator variables are shrunk; for the education domain, 19 raw indicator variables undergo shrinkage, and there is less impact on these domain measures. The impact on the overall SIMD scores and ranks is smaller, since these two domains have a combined weight in the final calculation of only 28%; the calculations for the domain scores making up the remaining 72% are unchanged in these modified methodologies.
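As an illustration of the comparison described above, the following minimal sketch computes the correlation between two sets of scores and between the corresponding ranks. It assumes a Pearson correlation coefficient (the text does not state which coefficient was used), and the scores shown are hypothetical values, not SIMD data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(scores):
    """Rank 1 = highest score (most deprived); ties broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        out[i] = position
    return out

# Hypothetical scores for five data zones under two methods (not SIMD data).
original = [42.0, 17.5, 63.1, 8.2, 30.4]
alternative = [40.9, 18.3, 61.0, 9.0, 33.7]

score_corr = pearson(original, alternative)                # correlation of scores
rank_corr = pearson(ranks(original), ranks(alternative))   # correlation of ranks
```

In this toy example the two methods order the data zones identically, so the rank correlation is 1 even though the score correlation is slightly lower.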

Measure | Shrinkage Area: None | National | Intermediate Geography | Rurality Group
SIMD Score | 0.9987 | 0.9992 | 0.9980 | 0.9994
SIMD Rank | 0.9985 | 0.9990 | 0.9975 | 0.9991
Health Domain Score | 0.9663 | 0.9819 | 0.9469 | 0.9822
Health Domain Rank | 0.9684 | 0.9819 | 0.9487 | 0.9826
Education Domain Score | 0.9940 | 0.9956 | 0.9909 | 0.9963
Education Domain Rank | 0.9956 | 0.9965 | 0.9910 | 0.9968

Table 9.1 Correlations of SIMD, health domain and education domain scores and ranks between original algorithm and methods using alternative shrinkage areas

Rurality Group | Total | Shrinkage Method: Original | None | National | IG | Rural Group
(units) | N | N (%) | N (%) | N (%) | N (%) | N (%)
Large Urban | 2432 | 673 (27.7) | 666 (27.4) | 664 (27.3) | 670 (27.5) | 678 (27.9)
Other Urban | 1892 | 228 (12.1) | 231 (12.2) | 230 (12.2) | 232 (12.3) | 230 (12.2)
Accessible Small Towns | 666 | 34 (5.1) | 37 (5.6) | 40 (6.0) | 33 (5.0) | 33 (5.0)
Remote Small Towns | 189 | 15 (7.9) | 15 (7.9) | 15 (7.9) | 15 (7.9) | 13 (6.9)
Accessible Rural | 930 | 23 (2.5) | 22 (2.4) | 23 (2.5) | 23 (2.5) | 19 (2.0)
Remote Rural | 396 | 2 (0.5) | 4 (1.0) | 3 (0.8) | 2 (0.5) | 2 (0.5)

Table 9.2 Number of data zones in each rurality group, and number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and alternative univariate shrinkage options

Table 9.2 shows, for groups of data zones defined by the SE Urban/Rural indicator, the number of data zones ranked within the most deprived 15% nationally under the original SIMD algorithm and under each univariate shrinkage alternative. Both the use of a single national shrinkage unit and no shrinkage at all result in slightly fewer data zones in large urban areas being classified as highly deprived, and more in other urban areas and accessible small towns. This could be a result of these methods not over-shrinking isolated pockets of deprivation. The use of IGs as shrinkage units appears to give very similar results to the original algorithm, with some trade-off between the two urban groups. The use of rurality groups themselves as shrinkage units has the effect of classifying more urban data zones as deprived at the expense of data zones in small towns and rural areas. This could be a result of over-shrinkage as previously discussed, since the trend for lower prevalence of extreme deprivation in more rural areas will lead to greater shrinkage in those rural areas where deprivation is observed to be extreme.
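The classification summarised in Table 9.2 can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, rank 1 is assumed to denote the most deprived data zone, and the threshold size is assumed to be rounded down. With the 6,505 data zones implied by the group totals in Table 9.2, rounding down gives 975 data zones, which matches the sum of the "Original" column.

```python
from collections import Counter

def threshold_size(n_zones, percent=15):
    """Number of data zones inside the most deprived `percent` (rounded down)."""
    return (n_zones * percent) // 100

def most_deprived_counts(ranks, groups, percent=15):
    """Count, per group, the data zones ranked within the national threshold.

    Assumes rank 1 is the most deprived data zone.
    """
    cutoff = threshold_size(len(ranks), percent)
    return Counter(g for r, g in zip(ranks, groups) if r <= cutoff)

# Hypothetical example: 20 data zones in two groups (not SIMD data).
ranks = list(range(1, 21))
groups = ["urban", "urban", "rural"] + ["urban"] * 10 + ["rural"] * 7
counts = most_deprived_counts(ranks, groups)  # 3 zones fall inside the threshold
```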

The same summaries are shown for each LA in Table A.1 (Appendix A). Glasgow City has by far the highest proportion of such data zones, at 54%, though under each alternative method, this proportion is reduced by 1-2%. Similarly, the proportion of very deprived data zones in North Lanarkshire is reduced under every alternative method. The opposite is seen in Aberdeen City, East Renfrewshire, Angus, Perth & Kinross and Aberdeenshire Council areas, with an increase in the number of deprived data zones under each univariate shrinkage alternative.

The differences between the numbers of data zones classified as highly deprived using the various different univariate approaches to shrinkage are, in general, small. This does not imply that these differences are insignificant, since there is a tendency for the alternative approaches to result in more data zones being classified as highly deprived amongst the otherwise less deprived LAs.

Ranking LAs as in Table A.1, by the proportion of data zones classified as highly deprived under the original algorithm, those LAs in the lower half of the distribution (Stirling - Shetland) contain 53 deprived data zones under the original method, and 64, 65, 63 or 59 under the alternatives of no, national, IG or rural group shrinkage respectively. Across all four alternative methods in these 16 LAs, only once is an LA determined to have fewer deprived data zones (West Lothian, under the no shrinkage option). For comparison, in the more deprived half of the distribution (Glasgow City - Aberdeen City), amongst the 64 LA-by-shrinkage combinations, 21 result in more deprived data zones, 21 in fewer deprived data zones and 22 in no difference, compared to the original method.

Figure 9.1 Median and 5-95% range of SIMD ranks for each data zone, ordered by original SIMD rank

9.1.2. Simulated Indicator Datasets

Figure 9.1 shows the median, 5th and 95th percentiles of simulated SIMD ranks using the original algorithm, plotted against the observed SIMD rank for each data zone. Note that the median simulated SIMD rank for each data zone is not in general the same as the observed SIMD rank, though these are very highly correlated (99.98%). The variation in ranks is less than is often seen in the literature regarding rank uncertainty estimates. Those applications generally deal with smaller numbers of units of analysis, ranked according to fewer indicators than are used here. The apparent stability of SIMD ranks may reflect the use of a large number of individual indicator variables, so that the variance of the composite multiple deprivation index is small. Also, since a large number of data zones are being characterised, and these data zones are very heterogeneous with respect to the deprivation indicators being used, it is relatively easy to discriminate between data zones with high and low levels of deprivation.

The fact that the variance of rankings near to the extremes of the deprivation scale converges towards zero indicates that the algorithm as a whole can reliably identify these extreme data zones. The way the SIMD is constructed requires that a data zone must show high levels of deprivation on a number of domains in order to be labelled as extremely deprived. Furthermore, each domain is constructed from a number of indicator variables which are positively correlated nationally, so such data zones will tend to have high levels of deprivation on many of these indicators. It is therefore unlikely, even allowing for random variation of the indicators, that the most deprived data zones will be assigned SIMD ranks far from their observed values.

Figure 9.2 presents the standard deviation (SD) of simulated SIMD rankings for each data zone, plotted against the original SIMD rank. Rank variability is least at the extremes of the distribution, but in general, is seen to be less in the more deprived half of the distribution. This is in agreement with the impression given by Figure 9.1, and reflects a favourable quality of the SIMD algorithm, since a major purpose of the SIMD is to identify the most deprived data zones, rather than the least deprived.

Figure 9.2 Standard deviation of simulated SIMD ranks for each data zone, ordered by original SIMD rank, with estimated association between rank variation and original SIMD rank

In order to evaluate alternatives to the original SIMD method, we can calculate, in each data zone, the SD of the SIMD ranks across the simulations. Expressing these SDs relative to those obtained using the original algorithm gives a measure of variation of the SIMD ranks provided by each method, with values >1 indicating greater variability compared to the original algorithm. In general, the different methods for calculating SIMD ranks do not result in uniformly greater or lesser variability compared to the original algorithm.
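The SD ratio described above can be sketched as follows. This is a minimal illustration with hypothetical simulated ranks; it assumes the sample standard deviation, although the choice of divisor cancels in the ratio provided both methods use the same number of simulations.

```python
from statistics import stdev

def sd_ratio(alt_sim_ranks, orig_sim_ranks):
    """Per-data-zone ratio of rank SDs (alternative method / original method).

    Each argument is a list with one entry per data zone; each entry is the
    list of ranks that data zone received across the simulations.
    A data zone whose original ranks never vary would give a zero divisor,
    so such zones would need separate handling.
    """
    return [stdev(a) / stdev(o) for a, o in zip(alt_sim_ranks, orig_sim_ranks)]

# Hypothetical ranks for two data zones over three simulations (not SIMD data).
original_ranks = [[10, 12, 14], [100, 104, 108]]
alternative_ranks = [[8, 12, 16], [101, 103, 105]]

ratios = sd_ratio(alternative_ranks, original_ranks)
```

Here the first data zone has a ratio above 1 (more variable under the alternative) and the second a ratio below 1, illustrating that a method need not be uniformly more or less variable.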

Plotting these ratios against the original observed SIMD ranks shows the variation of SIMD ranks produced by each shrinkage alternative, relative to the original method, over the full range of multiple deprivation. Figure 9.3 shows these estimates for the alternative shrinkage methods of no shrinkage, shrinkage to national average values, shrinkage to IG averages and shrinkage to average values in groups of data zones defined by the SE Urban/Rural Indicator.

No shrinkage and shrinkage to IG averages produce SIMD ranks that show greater variability than the original algorithm over the whole deprivation range. Ratios are least within the third quartile of ranks and increase sharply at the least deprived end of the distribution. Towards the more deprived end of the distribution, SD ratios are relatively constant, with less than 4% additional variation compared to the original SIMD ranks. Shrinkage to national and rural group averages produce similar patterns, though levels of variability in the third quartile are similar to or less than with the original algorithm.

Previously, we reported the numbers of data zones classified as severely deprived (defined as lying within the most deprived 15% of data zones nationally) in subgroups of data zones defined by the SE Urban/Rural Indicator and by LAs. Using the simulation results, it is possible to estimate the probability that each data zone lies within the most deprived 15%, as the number of simulations in which this is the case, divided by the total number of simulations (here, 1,000). Some data zones will have an estimated probability of 1 of being severely deprived, and for some the probability will be zero, though for a number of data zones around the 15% threshold, the probability will be strictly between 0 and 1.
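The probability estimate described above can be sketched as follows, assuming rank 1 denotes the most deprived data zone and a threshold size rounded down. The function name and the small simulated dataset are hypothetical.

```python
def prob_most_deprived(simulated_ranks, percent=15):
    """Estimate, for each data zone, the probability of lying within the
    most deprived `percent` as the share of simulations in which it does.

    `simulated_ranks` is a list of simulations; each simulation is a list
    giving the rank of every data zone (rank 1 = most deprived).
    """
    n_sims = len(simulated_ranks)
    n_zones = len(simulated_ranks[0])
    cutoff = (n_zones * percent) // 100
    hits = [0] * n_zones
    for sim in simulated_ranks:
        for zone, rank in enumerate(sim):
            if rank <= cutoff:
                hits[zone] += 1
    return [h / n_sims for h in hits]

# Hypothetical example: 20 data zones, 4 simulations (not SIMD data).
base = list(range(1, 21))
swapped = base[:]
swapped[2], swapped[3] = swapped[3], swapped[2]  # zones 2 and 3 trade places once
probs = prob_most_deprived([base, base, base, swapped])
```

Data zones well inside or outside the threshold receive probabilities of 1 or 0, while the two data zones on either side of the cut-off receive intermediate values.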

Figure 9.3 Alternative shrinkage methods: estimated association (with 95% CI) between standard deviation (SD) of simulated SIMD ranks (relative to SD of ranks using original algorithm) and original SIMD rank

The sum of these probabilities over subgroups of data zones relative to the total number of data zones in the subgroup, gives a summary measure akin to the percentage of data zones in a subgroup that are severely deprived. Those data zones certainly within the 15% threshold are counted fully, and those certainly outwith the threshold are not counted at all, but those data zones near to the cut-off are included with weights equal to the likelihood that they truly are severely deprived. Data zones that are, on average, within the 15% most deprived are given weights greater than ½, and those occasionally found to have severe deprivation are given weights less than ½.

If the SIMD ranks were calculated with estimates of uncertainty attached, it would be possible to consider the share of the most deprived areas or allocate funding based on these probabilities, rather than the strict inclusion method generally used. LAs with data zones just outside the chosen threshold would have these areas included in their allocation calculation, and those data zones just within the threshold would be given less weight than the most deprived areas.
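A minimal sketch of the probability-weighted summary follows: the probabilities are summed within each subgroup and expressed relative to the subgroup size. The function name, group labels and probabilities are hypothetical.

```python
from collections import defaultdict

def weighted_counts(probabilities, groups):
    """Return {group: (probability-weighted count, percentage of the group)}."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for p, g in zip(probabilities, groups):
        totals[g] += p   # each data zone contributes its probability as a weight
        sizes[g] += 1
    return {g: (totals[g], 100.0 * totals[g] / sizes[g]) for g in sizes}

# Hypothetical probabilities for four data zones in two subgroups.
summary = weighted_counts([1.0, 0.75, 0.25, 0.0], ["A", "A", "B", "B"])
```

A data zone certain to be within the threshold adds 1 to its subgroup total, one certain to be outside adds nothing, and borderline data zones add a fraction.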

Rurality Group | Total | Shrinkage Method: Original | None | National | IG | Rural Group
(units) | N | N (%) | N (%) | N (%) | N (%) | N (%)
Large Urban | 2432 | 671.8 (27.6) | 664.6 (27.3) | 664.4 (27.3) | 666.0 (27.4) | 674.9 (27.8)
Other Urban | 1892 | 226.7 (12.0) | 230.3 (12.2) | 230.2 (12.2) | 232.9 (12.3) | 228.0 (12.0)
Accessible Small Towns | 666 | 34.7 (5.2) | 35.9 (5.4) | 36.5 (5.5) | 33.3 (5.0) | 34.0 (5.1)
Remote Small Towns | 189 | 14.6 (7.7) | 16.2 (8.6) | 15.7 (8.3) | 16.8 (8.9) | 14.7 (7.8)
Accessible Rural | 930 | 24.3 (2.6) | 24.5 (2.6) | 24.6 (2.6) | 23.2 (2.5) | 21.2 (2.3)
Remote Rural | 396 | 2.9 (0.7) | 3.5 (0.9) | 3.6 (0.9) | 2.8 (0.7) | 2.2 (0.6)

Table 9.3 Number of data zones in each rurality group, and probability-weighted number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and alternative univariate shrinkage options

Table 9.3 shows the probability-weighted numbers of data zones classified as severely deprived in subgroups of data zones defined by the SE Urban/Rural Indicator. Overall, patterns of deprivation are very similar to those found previously (Table 9.2). No shrinkage and shrinkage towards a single national average both result in fewer severely deprived data zones from large urban areas, and slightly more in all other types of area, supporting the view that these methods do not mask isolated areas of extreme deprivation. Shrinkage to rural group averages has the opposite effect, again supporting the idea that this approach accentuates over-shrinkage. Using IG averages for shrinkage has no clear pattern of effects on the distribution of data zones classified as suffering high levels of multiple deprivation. The same summaries for LAs are shown in Table B.1 (Appendix B).

Whereas using a strict threshold to define data zones amongst the most deprived 15% nationally identifies 975 such areas, the number of data zones with a non-zero probability of lying within the most deprived 15% (as defined by the simulated indicator datasets method used here) is generally much larger. In this way, for example, a larger number of data zones contributes towards decisions based on the locations of the most deprived data zones. For the shrinkage alternatives considered here, the numbers of data zones with non-zero probabilities of being severely deprived are 1,354 for the original method, 1,355 for both the no shrinkage and rurality group options, 1,360 for the IG shrinkage method and 1,362 when using a single, national shrinkage area.

9.2. Multivariate Shrinkage Methods

9.2.1. Multivariate Analogue to Univariate Method

The method of Longford [13], a matrix-based analogue of the univariate shrinkage method:

scientific formula

is appealing for the apparent simplicity of the above formula. However, in the original SIMD algorithm, there are a number of data issues that mean that the basic univariate shrinkage formula cannot be applied directly.

A number of raw indicator variables have zero denominators in some data zones. For example, the data used to construct the CMF and CIF indicators include the numbers of males and females within 5-year age bands up to age 89, or over 90 years of age. At the extremes of these age ranges, some data zones will include no individuals. Consequently, neither the observed rate, nor the variance of its logit can be calculated, and the data zone cannot be included in the shrinkage formula.

Indicator | Correlation: Univariate shrinkage | Multivariate shrinkage
Alcohol-related discharges | 0.9998 | 0.9968
Drug-related discharges | 0.9990 | 0.9921
Emergency admissions | 0.9999 | 0.9989
Prescriptions for anxiety, depression or psychosis | 0.9893 | 0.9776
Live singleton births of low birthweight | 0.7795 | 0.4981

Table 9.4 Correlations between shrunken indicators from the Health domain derived using random effects models and those derived using the original shrinkage method

In the univariate case this is not a major problem. Such data zones are removed prior to shrinkage of rates, after which a "shrunken" value of zero is inserted. This decision is unlikely to adversely affect the results in any way, since with a denominator of zero, the particular age-sex group can contribute nothing to the expected numbers of events used to calculate age- and sex-standardised indicators. For single Binomial indicator variables, this process of shrinkage followed by insertion of zeros at data zones with no relevant population has little impact, since such data zones, having no "at risk" individuals, are observed to have no deprivation according to that indicator.
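The univariate workaround described above (exclude data zones with a zero denominator, shrink the remainder, then insert zeros) can be sketched as follows. The shrinkage estimator itself is not reproduced here: `shrink` is a hypothetical callable supplied by the caller, and `toy_shrink` is only a stand-in that moves each rate halfway towards the pooled rate, not the SIMD shrinkage formula.

```python
def shrink_with_zero_denominators(counts, denominators, shrink):
    """Apply `shrink` only to data zones with a non-zero denominator,
    then insert a value of 0.0 for the excluded data zones.

    `shrink` is a hypothetical callable taking (counts, denominators) for
    the retained zones and returning one shrunken rate per retained zone.
    """
    keep = [i for i, d in enumerate(denominators) if d > 0]
    shrunk = shrink([counts[i] for i in keep], [denominators[i] for i in keep])
    result = [0.0] * len(counts)          # zero-denominator zones stay at 0.0
    for i, value in zip(keep, shrunk):
        result[i] = value
    return result

def toy_shrink(counts, denominators):
    """Stand-in estimator (an assumption): halfway between each zone's
    observed rate and the pooled rate over the retained zones."""
    pooled = sum(counts) / sum(denominators)
    return [0.5 * (c / d) + 0.5 * pooled for c, d in zip(counts, denominators)]

# Hypothetical data: the middle data zone has no population at risk.
rates = shrink_with_zero_denominators([2, 0, 6], [10, 0, 10], toy_shrink)
```

In the matrix-based multivariate case there is no equivalent of the `keep` step for a single indicator, because dropping one indicator's value for a data zone removes that data zone's entire row.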

In the matrix-based multivariate case, such a procedure is not possible, since the removal of data for a single indicator results in the loss of all data for that data zone. There is no immediately apparent analogue to the procedure used in the univariate setting that could be applied to modify the multivariate approach, and this method was not pursued further.

9.2.2. Random Effects Models

Table 9.4 shows the correlations between the shrunken estimates of the five indicator variables of alcohol-related discharges, drug-related discharges, emergency admissions, prescriptions for anxiety, depression or psychosis and low birthweight singleton births obtained using the original method and those obtained using random effects models. Shrunken estimates were obtained from random effects models for each indicator variable individually, and from a multivariate model of all five indicators simultaneously.

Correlations are higher when each indicator is shrunk independently, as might be expected since under these models each indicator value is being shrunk towards its LA average, whereas under the multivariate model, the movement of each indicator is affected by the values of every other indicator in the same data zone. Correlations under both models for the hospital episode data are very high, in excess of 99%; for the prescriptions indicator this is slightly reduced, possibly reflecting a slight deviation of the data from the assumed distribution, bearing in mind that the values of this indicator are themselves estimated.

The low birthweight indicator shows much lower correlation between the original method and both random effects models. It is probable that this indicates a more severe departure from the Poisson distribution (and from the Binomial distribution). However, since this is a single indicator amongst seven used in the health domain, the effects on the final domain score and SIMD are likely to be small. Nevertheless, a more detailed consideration of this indicator would be required if such a method were to be implemented in practice.

Figure 9.4 Relationships between shrunken values of prescriptions for anxiety, depression or psychosis and live singleton births of low birthweight derived using multivariate random effects model and those derived using the original shrinkage method

The relationships between the shrunken estimates for the hospital episode indicators obtained by the original method and those obtained using random effects models are almost perfectly linear. For prescriptions for anxiety, depression or psychosis and live singleton births of low birthweight, the relationships between the original method and the multivariate shrinkage method are shown in Figure 9.4. The poor association observed for the low birthweight indicator is clear.

Whilst this method can be applied to small groups of indicator variables, it is not feasible to fit corresponding models to larger groups of variables. For example, the mortality data used to construct the SMF indicator consist of the numbers of deaths in 38 age-sex groups for each data zone over a four-year period. Fitting an analogous model to that used for the other health domain indicator variables would involve 741 variance-covariance terms at both the between-LA and between-data zone levels.
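The figure of 741 is consistent with an unstructured (symmetric) variance-covariance matrix for p = 38 outcomes, which has p variances and p(p-1)/2 distinct covariances; the unstructured form is an assumption made here to reproduce the count:

```latex
\[
\underbrace{p}_{\text{variances}} + \underbrace{\frac{p(p-1)}{2}}_{\text{covariances}}
  = \frac{p(p+1)}{2}
  = \frac{38 \times 39}{2}
  = 741, \qquad p = 38 .
\]
```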

An alternative parameterisation was used in an attempt to obtain useful shrunken estimates; it was assumed that there were sex-specific quadratic associations between mortality rates and age, with the terms for age and age² in males and females being random effects over LAs and data zones. In practice, however, this model failed to converge, and was not explored further.

9.3. Spatial Shrinkage Methods

Table 9.5 shows the correlations between the shrunken estimates of the five indicator variables of alcohol-related discharges, drug-related discharges, emergency admissions, prescriptions for anxiety, depression or psychosis and low birthweight singleton births obtained using the original method and those obtained using spatial correlation models. A similar pattern is found as with the random effects models, with high correlations for the hospital episode indicators, but lesser correlations for other indicators, particularly the low birthweight variable.

Indicator | Correlation
Alcohol-related discharges | 0.9992
Drug-related discharges | 0.9961
Emergency admissions | 0.9996
Prescriptions for anxiety, depression or psychosis | 0.9619
Live singleton births of low birthweight | 0.7116

Table 9.5 Correlations between shrunken indicators from the Health domain derived using spatial correlation models and those derived using the original shrinkage method

Figure 9.5 shows the relationships between the shrunken estimates obtained by spatial shrinkage and those obtained by the original method for the two indicators that exhibit less than 99% correlation with the original estimates: prescriptions for anxiety, depression or psychosis, and live singleton births of low birthweight.

On closer inspection of the results of fitting the spatial shrinkage models to these indicators, it is apparent that whilst the models for alcohol-related discharges, drug-related discharges and emergency admissions have converged reasonably well, the convergence of the model for the prescription data is not yet complete, and for the birthweight data is clearly inadequate. Nevertheless, all of the results shown could be improved if the model fitting procedures were allowed to run for longer. Convergence of the prescriptions model could be improved by making better choices for the starting points for the MCMC algorithm, possibly using the results of the fitting procedure thus far and repeating the process. The reasons for non-convergence of the birthweight indicator are less clear and would require further investigation.

9.4. Factor Analysis

9.4.1. Observed Indicator Data

Table 9.6 shows the correlations between the original SIMD 2004 methodology and alternatives based on using the same factor analysis (FA) method, but with different transformations of the component indicator variables. The original method used ranking followed by a standard Normal transformation. The alternatives shown are no transformation, ranking only, and ranking followed by an Exponential transformation.
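The transformations compared here can be illustrated with the sketch below. The exact formulae used in the SIMD are not given in this section, so two assumptions are made and should be read as such: ranks are converted to fractional ranks (rank - 0.5) / n, and the Normal and Exponential transformations are taken to be the standard Normal and unit-rate Exponential quantile functions applied to those fractional ranks.

```python
from math import log
from statistics import NormalDist

def fractional_ranks(values):
    """Map values to (rank - 0.5) / n in (0, 1); ties broken by input order.
    The plotting position (rank - 0.5) / n is an assumption."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for position, i in enumerate(order, start=1):
        out[i] = (position - 0.5) / n
    return out

def rank_normal(values):
    """Ranking followed by a standard Normal transformation."""
    return [NormalDist().inv_cdf(u) for u in fractional_ranks(values)]

def rank_exponential(values):
    """Ranking followed by a unit-rate Exponential transformation (assumed form)."""
    return [-log(1.0 - u) for u in fractional_ranks(values)]

# Hypothetical indicator values for three data zones.
indicator = [3.0, 1.0, 2.0]
z = rank_normal(indicator)
e = rank_exponential(indicator)
```

The Normal transformation is symmetric about the median, whereas the Exponential transformation stretches the upper end of the distribution, which is one reason the two can lead to different factor loadings.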

Figure 9.5 Relationships between shrunken values of prescriptions for anxiety, depression or psychosis and live singleton births of low birthweight derived using spatial correlation models and those derived using the original shrinkage method

Measure | Transformation: None | Rank | Rank + Exponential
SIMD Score | 0.9991 | 0.9996 | 0.9994
SIMD Rank | 0.9985 | 0.9997 | 0.9987
Health Domain Score | 0.9371 | 0.9889 | 0.9292
Health Domain Rank | 0.9844 | 0.9973 | 0.9868
Education Domain Score | 0.9961 | 0.9867 | 0.9265
Education Domain Rank | 0.9980 | 0.9983 | 0.9913
Access Domain Score | 0.8200 | 0.9841 | 0.9217
Access Domain Rank | 0.9715 | 0.9962 | 0.9762

Table 9.6 Correlations of SIMD, health domain, education domain and access domain scores and ranks between original algorithm and methods using Factor Analysis applied to alternative transformations of component indicator variables

The correlations for SIMD scores and ranks are high, all in excess of 99%, partly reflecting the fact that the domains affected contribute a combined weight of only 37% towards the overall score. Domain score and rank correlations are slightly lower, though using a rank transformation alone is seen to have less effect than other options. In general, alternative transformations have less impact on domain ranks than domain scores, an additional explanation for the lack of effect seen on the SIMD scores and ranks, which are constructed using ranked domain scores. Notably, the education domain score with no transformation shows a particularly high correlation with the original, suggesting the component indicator variables for this domain, or at least those with the largest score loadings as defined by FA, are well approximated by a Normal distribution.

Rurality Group | Total | Transformation used prior to Factor Analysis: Rank + Normal | None | Rank | Rank + Exponential
(units) | N | N (%) | N (%) | N (%) | N (%)
Large Urban | 2432 | 673 (27.7) | 676 (27.8) | 671 (27.6) | 677 (27.8)
Other Urban | 1892 | 228 (12.1) | 227 (12.0) | 229 (12.1) | 224 (11.8)
Accessible Small Towns | 666 | 34 (5.1) | 33 (5.0) | 36 (5.4) | 35 (5.3)
Remote Small Towns | 189 | 15 (7.9) | 14 (7.4) | 14 (7.4) | 15 (7.9)
Accessible Rural | 930 | 23 (2.5) | 23 (2.5) | 23 (2.5) | 22 (2.4)
Remote Rural | 396 | 2 (0.5) | 2 (0.5) | 2 (0.5) | 2 (0.5)

Table 9.7 Number of data zones in each rurality group, and number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method (Rank + Normal) and alternative transformations used prior to factor analysis

Measure | Transformation: None | Rank | Rank + Normal | Rank + Exponential
SIMD Score | 0.9896 | 0.9988 | 0.9993 | 0.9988
SIMD Rank | 0.9844 | 0.9991 | 0.9992 | 0.9976
Health Domain Score | 0.8232 | 0.9833 | 0.9928 | 0.9337
Health Domain Rank | 0.8363 | 0.9907 | 0.9922 | 0.9765
Education Domain Score | 0.8699 | 0.9751 | 0.9863 | 0.9117
Education Domain Rank | 0.8773 | 0.9855 | 0.9867 | 0.9690
Access Domain Score | 0.6247 | 0.9830 | 0.9980 | 0.9191
Access Domain Rank | 0.8823 | 0.9945 | 0.9971 | 0.9701

Table 9.8 Correlations of SIMD, health domain, education domain and access domain scores and ranks between original algorithm and methods using Principal Components Analysis applied to alternative transformations of component indicator variables

Rurality Group | Total | Original | Transformation used prior to Principal Components Analysis: Rank + Normal | None | Rank | Rank + Exponential
(units) | N | N (%) | N (%) | N (%) | N (%) | N (%)
Large Urban | 2432 | 673 (27.7) | 684 (28.1) | 682 (28.0) | 681 (28.0) | 683 (28.1)
Other Urban | 1892 | 228 (12.1) | 222 (11.7) | 216 (11.4) | 225 (11.9) | 222 (11.7)
Accessible Small Towns | 666 | 34 (5.1) | 33 (5.0) | 34 (5.1) | 33 (5.0) | 33 (5.0)
Remote Small Towns | 189 | 15 (7.9) | 13 (6.9) | 13 (6.9) | 13 (6.9) | 15 (7.9)
Accessible Rural | 930 | 23 (2.5) | 21 (2.3) | 28 (3.0) | 22 (2.4) | 20 (2.2)
Remote Rural | 396 | 2 (0.5) | 2 (0.5) | 2 (0.5) | 1 (0.3) | 2 (0.5)

Table 9.9 Number of data zones in each rurality group, and number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method, using factor analysis applied to ranked and Normally transformed indicators, and methods using principal components analysis applied following alternative transformations of indicator variables

Table 9.7 shows the number and percentage of data zones within the most deprived 15% of data zones nationally, in groups of data zones classified by the SE Urban/Rural indicator. The alternative transformations used in conjunction with factor analysis appear to have only a minor impact on the distribution of these data zones. This conclusion is confirmed in Table A.2 (Appendix A), which shows minor differences in the distribution of the most deprived data zones across LAs under these alternatives.

Table 9.8 shows the correlations between the original SIMD 2004 methodology and alternatives based on replacing FA with principal components analysis (PCA), also with different transformations of the component indicator variables. The alternatives shown are no transformation, ranking only, ranking followed by a standard Normal transformation, and ranking followed by an Exponential transformation.

Again, correlations for the overall scores and ranks are high. Using PCA with ranked and Normally transformed indicator variables has a minimal effect, though the combination of using PCA and an alternative transformation produces domain scores and ranks that are less well correlated with those originally produced; the exception to this being the rank transformation alone, for which correlations remain high.

Figure 9.6 Algorithms using factor analysis in combination with alternative prior transformations: estimated association (with 95% CI) between standard deviation (SD) of simulated SIMD ranks (relative to SD of ranks using original algorithm) and original SIMD rank

Table 9.9 shows the distribution of the top 15% most deprived data zones under the PCA alternatives, in comparison with the original SIMD data. All lead to greater numbers of data zones classified as extremely deprived in large urban areas; this is mainly at the expense of other urban areas, though there are also slight falls in the numbers of deprived data zones in

Rurality Group | Total | Transformation used prior to Factor Analysis: Rank + Normal | None | Rank | Rank + Exponential
(units) | N | N (%) | N (%) | N (%) | N (%)
Large Urban | 2432 | 671.8 (27.6) | 669.9 (27.5) | 670.3 (27.6) | 672.7 (27.7)
Other Urban | 1892 | 226.7 (12.0) | 226.7 (12.0) | 228.4 (12.1) | 224.8 (11.9)
Accessible Small Towns | 666 | 34.7 (5.2) | 34.6 (5.2) | 35.1 (5.3) | 34.1 (5.1)
Remote Small Towns | 189 | 14.6 (7.7) | 15.3 (8.1) | 14.1 (7.5) | 15.3 (8.1)
Accessible Rural | 930 | 24.3 (2.6) | 25.0 (2.7) | 24.6 (2.6) | 24.7 (2.7)
Remote Rural | 396 | 2.9 (0.7) | 3.4 (0.9) | 2.5 (0.6) | 3.3 (0.8)

Table 9.10 Number of data zones in each rurality group, and probability-weighted number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and algorithms using factor analysis in combination with alternative prior transformations

The impact of these alternative methods on the distribution of very deprived data zones across LAs is shown in Table A.3 (Appendix A). Glasgow City has slightly larger numbers of these data zones, but the proportions in Dundee City increase by approximately 10-12% under each alternative. PCA applied without transformation of indicator variables causes some large changes, with relative increases of 64% in Falkirk and 11% in North Lanarkshire, and a reduction of 31% in South Ayrshire.

9.4.2. Simulated Indicator Datasets

Figure 9.6 shows the estimated association between variation in SIMD ranks and original SIMD ranks under algorithms using FA with alternative prior transformations, with variation expressed as the SD of rankings applied to each data zone over 1,000 simulated indicator datasets, relative to the SD obtained from the original algorithm. Applying FA to the raw indicator variables results in SIMD ranks that are less variable than under the original algorithm across the whole range of deprivation. Using ranked indicators as inputs to FA results in extra variability at either end of the deprivation distribution, but particularly amongst the most deprived data zones. Applying an exponential transformation to ranked indicators yields less variation for the most deprived data zones, but more variation elsewhere, with an increase of more than 10% in the least deprived areas.
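As an illustration only, the relative variability measure described above can be sketched as follows. The function names and the toy data are hypothetical, and the two noise levels merely stand in for two competing algorithms; the sketch computes, for each data zone, the SD of its rank across simulated datasets and expresses it relative to the SD under the original algorithm.

```python
import random
import statistics

def ranks(scores):
    """Rank scores so the highest score receives rank 1 (most deprived)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for rank, zone in enumerate(order, start=1):
        out[zone] = rank
    return out

def rank_sd_per_zone(simulated_scores):
    """SD of each zone's rank across simulations (one score list per simulation)."""
    rank_sets = [ranks(sim) for sim in simulated_scores]
    n_zones = len(simulated_scores[0])
    return [statistics.stdev([r[z] for r in rank_sets]) for z in range(n_zones)]

# Toy data (hypothetical): 30 zones, 200 simulations per algorithm.
rng = random.Random(0)
true_scores = [float(z) for z in range(30)]
original = [[s + rng.gauss(0, 1) for s in true_scores] for _ in range(200)]
alternative = [[s + rng.gauss(0, 3) for s in true_scores] for _ in range(200)]

sd_original = rank_sd_per_zone(original)
sd_alternative = rank_sd_per_zone(alternative)

# Relative SD per zone: values above 1 mean the alternative is more variable.
relative_sd = [a / o for a, o in zip(sd_alternative, sd_original)]
```

Plotting `relative_sd` against each zone's original rank would give the kind of association summarised in the figures, although the smoothing and confidence intervals used there are not reproduced in this sketch.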

Calculating the probability of each data zone lying in the 15% most deprived nationally, the probability-weighted distribution of the most deprived areas can be estimated. Table 9.10 shows this distribution with respect to groups of data zones defined by the SE Urban/Rural Indicator. Applying FA to the raw indicator variables results in a redistribution of the most deprived data zones away from large urban areas and into small towns and rural areas. The patterns of redistribution using other options for transformation of indicator variables are less clear. The same data are shown by LA in Table B.2 (Appendix B).
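A minimal sketch of the probability-weighted count, under the assumption that each data zone's probability is estimated as the proportion of simulated datasets in which its rank falls within the cut-off. The function names and the four-zone example are hypothetical.

```python
from collections import defaultdict

def top_share_probability(simulated_ranks, cutoff):
    """Proportion of simulations in which each zone's rank is <= cutoff
    (rank 1 = most deprived)."""
    n_sims = len(simulated_ranks)
    n_zones = len(simulated_ranks[0])
    return [
        sum(sim[z] <= cutoff for sim in simulated_ranks) / n_sims
        for z in range(n_zones)
    ]

def probability_weighted_counts(probabilities, groups):
    """Sum the per-zone probabilities within each group (e.g. rurality group)."""
    totals = defaultdict(float)
    for p, g in zip(probabilities, groups):
        totals[g] += p
    return dict(totals)

# Toy example: 4 zones, 4 simulations, cut-off at rank 1.
simulated_ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
    [1, 2, 4, 3],
]
groups = ["urban", "rural", "urban", "rural"]

probs = top_share_probability(simulated_ranks, cutoff=1)
counts = probability_weighted_counts(probs, groups)
```

Because the counts are sums of probabilities, they need not be whole numbers, which is why the probability-weighted tables report values such as 671.8 rather than integers.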

Figure 9.7. Algorithm using principal components analysis with or without prior transformation: estimated association (with 95% CI) between standard deviation (SD) of simulated SIMD ranks (relative to SD of ranks using original algorithm) and original SIMD rank

Using PCA in place of FA for the calculation of domain scores, the patterns of variation in SIMD ranks in relation to the original SIMD ranks are similar to those obtained using FA with each alternative prior transformation of indicator variables. Overall, variation in SIMD ranks is greater when using PCA. This is particularly apparent with no prior transformation: whereas FA applied to untransformed indicators produced SIMD ranks that were less variable than under the original algorithm (Figure 9.6), with PCA the SDs of ranks are approximately 30% greater over much of the deprivation distribution, rising to twice the original in the least deprived data zones (Figure 9.7).

(Columns 4 to 7: transformation used prior to Principal Components Analysis)

| Rurality Group | Total, N | Original, N (%) | Rank + Normal, N (%) | None, N (%) | Rank, N (%) | Rank + Exponential, N (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Large Urban | 2432 | 671.8 (27.6) | 679.3 (27.9) | 677.6 (27.9) | 677.6 (27.9) | 679.4 (27.9) |
| Other Urban | 1892 | 226.7 (12.0) | 224.0 (11.8) | 216.5 (11.4) | 226.0 (11.9) | 221.9 (11.7) |
| Accessible Small Towns | 666 | 34.7 (5.2) | 32.2 (4.8) | 36.8 (5.5) | 32.1 (4.8) | 32.5 (4.9) |
| Remote Small Towns | 189 | 14.6 (7.7) | 14.0 (7.4) | 13.1 (6.9) | 13.7 (7.2) | 14.8 (7.8) |
| Accessible Rural | 930 | 24.3 (2.6) | 23.3 (2.5) | 26.7 (2.9) | 23.7 (2.5) | 23.7 (2.5) |
| Remote Rural | 396 | 2.9 (0.7) | 2.2 (0.6) | 4.2 (1.1) | 2.0 (0.5) | 2.6 (0.7) |

Table 9.11. Number of data zones in each rurality group, and probability-weighted number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method (factor analysis applied to ranked and Normally transformed indicators), and methods using principal components analysis applied following alternative transformations of indicator variables

| Measure | Correlation |
| --- | --- |
| SIMD Score | 0.9978 |
| SIMD Rank | 0.9965 |
| Health Domain Score | 0.9192 |
| Health Domain Rank | 0.9159 |
| Education Domain Score | 0.9957 |
| Education Domain Rank | 0.9962 |
| Access Domain Score | 0.9842 |
| Access Domain Rank | 0.9964 |

Table 9.12. Correlations of SIMD, health domain, education domain and access domain scores and ranks between original algorithm and the method applying Generalized Factor Analysis to the health, education and access domains

The implications of taking account of uncertainty through the use of simulated indicator datasets are shown in Table 9.11. All PCA-based methods result in a greater probability of data zones in large urban areas being classified as severely deprived, mainly associated with reductions in the numbers of such data zones in other urban areas and small towns. The effect of these changes on the LA distribution of extremely deprived data zones is shown in Table B.3 (Appendix B).

9.4.3. Generalized Factor Analysis

Table 9.12 shows the correlations between the original SIMD 2004 methodology and an alternative where generalized factor analysis was applied to the health, education and access domains rather than applying conventional factor analysis to the ranked, normalized domain scores.

Throughout, the correlations for ranks and scores were similar. The overall SIMD scores and ranks were highly correlated, again partly reflecting the fact that the domains affected contribute a combined weight of only 37% towards the overall score. The health domain scores and ranks were least highly correlated. This may have been due to five indicators in this domain being processed in their original form as Binomial variables, rather than ranking and transformation to a Normal distribution as in the original SIMD. For the access and education domains, rank and score correlations were exceedingly high. In the case of access, this reflects the Normal distribution of the log-transformed mean access times. For education, achieving convergence of the WinBUGS model-fitting proved difficult. A solution was achieved by using a ranked and Normally distributed transformation of the absence indicator data. It appeared that this indicator did not follow a Binomial distribution as had originally been anticipated.

Table 9.13 shows the number and percentage of data zones within the most deprived 15% of data zones nationally, in groups of data zones classified by the SE Urban/Rural indicator. Applying generalized rather than conventional factor analysis had very little impact on the distribution of these data zones. This conclusion is confirmed in Table A.4 (Appendix A), showing minor differences in the distribution of the most deprived data zones across LAs under these alternatives.

(Columns 3 and 4: transformation used prior to Factor Analysis)

| Rurality Group | Total, N | Original: Rank + Normal, N (%) | Generalized FA in health, education, access domains, N (%) |
| --- | --- | --- | --- |
| Large Urban | 2432 | 673 (27.7) | 673 (27.7) |
| Other Urban | 1892 | 228 (12.1) | 227 (12.0) |
| Accessible Small Towns | 666 | 34 (5.1) | 33 (5.0) |
| Remote Small Towns | 189 | 15 (7.9) | 15 (7.9) |
| Accessible Rural | 930 | 23 (2.5) | 24 (2.6) |
| Remote Rural | 396 | 2 (0.5) | 3 (0.8) |

Table 9.13. Number of data zones in each rurality group, and number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method (Rank + Normal) and an alternative where generalized factor analysis was applied in the health, education and access domains.

9.5. Exponential Transformation

The transformation of domain scores prior to the construction of the SIMD score is in fact a combination of two transformations: a rank transformation followed by an exponential transformation. Whilst the exponential transformation has the same effect on each domain rank, the rank transformation has a different effect on each domain score, so the combined effect of both transformations depends on the distribution of the underlying domain scores.

Figure 9.8 shows the distribution of the Income, Employment and Housing domain scores, as well as the effects on each score of being ranked and exponentially transformed. Scores on these domains are positively skewed, and the transformations combine to an almost linear transformation, except at the very highest score values. For these domains, this step of the algorithm therefore has the effect of "pulling in" some extreme data zones, thereby protecting against outlying values.

Figure 9.9 shows the same graphs for the Health, Education and Access domains. These are approximately Normally distributed by design, and the transformed scores are approximately sigmoidal in shape.

The process of combining domain scores to form an index of multiple deprivation implies that with all other domains being equal, a decrease on one domain can be compensated by an increase on another. The SIMD is constructed as a linear combination of ranked and exponentially transformed domain scores. On the scale of the transformed scores, the rate of substitution between pairs of domains is determined by their relative weights in the SIMD.

Figure 9.8. Histograms showing distributions of domain scores, and plots of ranked domain scores and ranked & exponentially transformed domain scores against raw scores; Current Income, Employment and Housing domains

For example, Figure 9.10 shows the Current Income and Employment domain scores. On the ranked and transformed scale, a decrease on one domain is compensated for by an equal increase on the other, since they have equal weight in the SIMD. This is shown by the contour lines, linking points of equal value for the evenly weighted sum of the two ranked and transformed domain scores. On the scale of the ranked domain scores, the contour lines are "bowed out", illustrating the desired property that high ranks on one domain should not be cancelled out by low ranks on the other. Alternatively, it is apparent that if a data zone has different ranks on the income and employment domains, it will have a higher combined score than another data zone with ranks on both domains equal to the average of the ranks of the first data zone.
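The "bowed out" property can be checked numerically. This section does not state the exponential transformation itself, so the sketch below assumes the form commonly documented for the English and Scottish indices, E = -23 ln(1 - R(1 - exp(-100/23))), where R is the rank scaled to lie between 0 (least deprived) and 1 (most deprived). That formula, the function names and the example ranks are assumptions made for illustration.

```python
import math

def exp_transform(scaled_rank):
    """ASSUMED exponential transformation of a rank scaled to [0, 1].
    Maps 0 to 0 and 1 to 100, stretching out the most deprived end."""
    k = 1.0 - math.exp(-100.0 / 23.0)
    return -23.0 * math.log(1.0 - scaled_rank * k)

def combined(rank_a, rank_b, w_a=0.5, w_b=0.5):
    """Weighted sum of two ranked and exponentially transformed domain scores."""
    return w_a * exp_transform(rank_a) + w_b * exp_transform(rank_b)

# A zone that is highly deprived on one domain and not on the other...
uneven = combined(0.9, 0.1)
# ...compared with a zone whose rank on both domains equals the average.
even = combined(0.5, 0.5)
```

Under this assumed transformation `uneven` exceeds `even`, because the transformation is convex in the scaled rank; this is the numerical counterpart of the contour lines bowing outwards on the rank scale.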

Figure 9.9. Histograms showing distributions of domain scores, and plots of ranked domain scores and ranked & exponentially transformed domain scores against raw scores; Health, Education, Skills & Training and Geographic Access & Telecommunications domains

However, on the scale of the original domain scores, the rate of substitution between the two domains is approximately constant over the region covered by the observed data, a result of the largely linear association between the original and the ranked and transformed domain scores. Also, at the highest levels of deprivation on both scores, the contour lines are slightly "bowed in", as a result of the contraction of the distributions at the highest values.

Two alternative methods of standardizing domain scores prior to their weighted combination will be considered. Firstly, a simple standardisation technique of subtracting the mean and dividing by the standard deviation will be used. Secondly, functions will be sought that can be used to transform domain scores that produce a similar effect to the current two-stage transformation process, but do not include ranking of data zones.

Figure 9.10. Associations between Current Income and Employment domains on the following scales: ranked & transformed, ranked only, untransformed; with contours showing lines of equal combined ranked & transformed scores (equal weights)

9.5.2. Simple Standardisation

As an alternative to the process of ranking and exponential transformation prior to weighted combination of domain scores to produce the SIMD score, a simple standardisation method was used. In other words, each domain score was transformed by subtracting the national mean and dividing by the standard deviation. The resultant SIMD scores and ranks had correlations with those obtained by the original algorithm of 0.9900 and 0.9964 respectively.
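A minimal sketch of simple standardisation followed by weighted combination. The domain values and weights below are hypothetical, and the population standard deviation is used as an assumption; using the sample standard deviation instead would rescale each domain slightly but would not change the principle.

```python
import statistics

def standardise(scores):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: population SD
    return [(s - mean) / sd for s in scores]

def weighted_index(domains, weights):
    """Weighted sum of standardised domain scores, one value per data zone."""
    z = {name: standardise(values) for name, values in domains.items()}
    n_zones = len(next(iter(domains.values())))
    return [
        sum(weights[name] * z[name][i] for name in domains)
        for i in range(n_zones)
    ]

# Hypothetical domain scores for four data zones, with hypothetical weights.
domains = {
    "income": [10.0, 20.0, 30.0, 40.0],
    "health": [-1.0, 0.5, 0.0, 0.5],
}
weights = {"income": 0.6, "health": 0.4}

index = weighted_index(domains, weights)
z_income = standardise(domains["income"])
```

After standardisation every domain has zero mean and unit variance, so the weights alone determine the rate of substitution between domains on the scale of the original scores.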

Table 9.14 shows the distribution of the 15% most deprived data zones across subgroups defined by the SE Urban/Rural Indicator under the original algorithm and the method using simple standardisation of domain scores prior to their weighted combination to form the SIMD score. The differences between the methods are minor, with a small reduction in the number of data zones in large urban areas being classified as highly deprived, with the distribution moving slightly towards accessible small towns and rural areas. The effects at LA level are shown in Table A.5 (Appendix A).

(Columns 3 and 4: transformation used prior to weighted combination)

| Rurality Group | Total, N | Original: Rank + Exponential, N (%) | Simple Standardisation, N (%) |
| --- | --- | --- | --- |
| Large Urban | 2432 | 673 (27.7) | 668 (27.5) |
| Other Urban | 1892 | 228 (12.1) | 228 (12.1) |
| Accessible Small Towns | 666 | 34 (5.1) | 37 (5.6) |
| Remote Small Towns | 189 | 15 (7.9) | 15 (7.9) |
| Accessible Rural | 930 | 23 (2.5) | 24 (2.6) |
| Remote Rural | 396 | 2 (0.5) | 3 (0.8) |

Table 9.14. Number of data zones in each rurality group, and number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and an alternative where domain scores were standardised by subtracting the mean and dividing by the standard deviation prior to weighted combination

Figure 9.11 shows the patterns of variation in SIMD ranks, in relation to the original SIMD ranks, when this alternative method is applied to the simulated indicator datasets. Simple standardisation produces SIMD ranks with almost the same variation as the original algorithm, except in the least deprived data zones, where variability rises quite sharply, to approximately 15% above that under the original algorithm.

Figure 9.11. Algorithm using simple standardisation of domain scores prior to weighted combination to form SIMD scores: estimated association (with 95% CI) between standard deviation (SD) of simulated SIMD ranks (relative to SD of ranks using original algorithm) and original SIMD rank

The implications of taking account of uncertainty through the use of simulated indicator datasets are shown in Table 9.15. The highly deprived data zones are redistributed slightly away from large urban areas into other areas. The effect of these changes on the LA distribution of extremely deprived data zones is shown in Table B.4 (Appendix B).

(Columns 3 and 4: transformation used prior to weighted combination)

| Rurality Group | Total, N | Original: Rank + Exponential, N (%) | Simple Standardisation, N (%) |
| --- | --- | --- | --- |
| Large Urban | 2432 | 671.8 (27.6) | 665.9 (27.4) |
| Other Urban | 1892 | 226.7 (12.0) | 229.1 (12.1) |
| Accessible Small Towns | 666 | 34.7 (5.2) | 36.1 (5.4) |
| Remote Small Towns | 189 | 14.6 (7.7) | 14.6 (7.7) |
| Accessible Rural | 930 | 24.3 (2.6) | 25.9 (2.8) |
| Remote Rural | 396 | 2.9 (0.7) | 3.4 (0.9) |

Table 9.15. Number of data zones in each rurality group, and probability-weighted number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and an alternative where domain scores were standardised by subtracting the mean and dividing by the standard deviation prior to weighted combination

9.5.3. Alternative Distributions

Another alternative that was considered was to identify distributions that resembled the two-stage transformation. For the income, employment and housing domains, the overall effect of the current transformation is largely linear, so an identity transformation was chosen. For the domains created using Factor Analysis, the domain scores are, by definition, almost Normally distributed. A suitable transformation might therefore be:

scientific formula

where F is the Normal cumulative distribution function, and s_x is the standard deviation of x*.

* Standardisation of x is required only because of the way in which the domain scores are defined. Each is constructed as the weighted combination of indicator variables that have been ranked and transformed to standard Normal variates, with weights that sum to unity. The resulting domain score therefore has zero mean, but variance less than one; this step could be avoided by defining domain scores to have unit variance.

Figure 9.12 shows the distribution of the Health, Education and Access domains after this transformation has been applied. On this transformed scale, these domains appear to have similar distributions to the Income, Employment and Housing domains, and have the same property that ranking and transformation to an exponential distribution has a net result that is largely a linear transformation, with some high values being tempered.

Figure 9.12. Histograms showing distributions of f(x), i.e. domain scores after transformation by the function f, and plots of ranked domain scores and ranked & exponentially transformed domain scores against f(x); Health, Education, Skills & Training and Geographic Access & Telecommunications domains

An alternative to the original method of ranking and transforming to an exponential distribution could therefore be to leave the income, employment and housing domain scores untransformed and to apply the above transformation to the health, education and access domains, followed by simple standardisation of each domain to give the same mean and variance. To determine the extent to which these alternative transformations achieve the desired property of avoiding "cancelling out" of opposing levels of deprivation on different domains, we can plot pairs of ranked domain scores and draw contour lines joining points of equal combined deprivation derived using the proposed transformations.

Figure 9.13. Associations between ranked Current Income and Employment domains and between ranked Health and Education, Skills & Training domains, with contours showing lines of equal combined scores using the proposed alternative transformations (equal weights)

Figure 9.13 shows these plots for two pairs of domains: Current Income and Employment, and Health and Education, Skills & Training. In both cases, the contours of equal combined deprivation demonstrate the "bowed out" pattern, indicating that the chosen transformations of each domain score have the desired property of avoiding "cancelling out" of opposing levels of deprivation. These figures are shown for other combinations of domains in Appendix D.

(Columns 3 and 4: transformation used prior to weighted combination)

| Rurality Group | Total, N | Original: Rank + Exponential, N (%) | Transformation of Health, Education & Access domains, followed by simple standardisation, N (%) |
| --- | --- | --- | --- |
| Large Urban | 2432 | 673 (27.7) | 678 (27.9) |
| Other Urban | 1892 | 228 (12.1) | 225 (11.9) |
| Accessible Small Towns | 666 | 34 (5.1) | 34 (5.1) |
| Remote Small Towns | 189 | 15 (7.9) | 15 (7.9) |
| Accessible Rural | 930 | 23 (2.5) | 21 (2.3) |
| Remote Rural | 396 | 2 (0.5) | 2 (0.5) |

Table 9.16. Number of data zones in each rurality group, and number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and an alternative where the health, education and access domain scores were transformed by f(x) and then all domains were standardised by subtracting the mean and dividing by the standard deviation prior to weighted combination

Figure 9.14. Algorithm with transformation of the health, education and access domain scores followed by simple standardisation of all domain scores prior to weighted combination to form SIMD scores: estimated association (with 95% CI) between standard deviation (SD) of simulated SIMD ranks (relative to SD of ranks using original algorithm) and original SIMD rank

Table 9.16 shows the distribution of the 15% most deprived data zones across subgroups defined by the SE Urban/Rural Indicator under the original algorithm and the method transforming the health, education and access domain scores using f(x) followed by simple standardisation of all domain scores prior to their weighted combination to form the SIMD score. The differences between the methods are very small, with five additional data zones in large urban areas being classified as highly deprived, at the expense of three data zones in other urban and two from accessible rural areas. The effects at LA level are shown in Table A.6 (Appendix A). These alternative SIMD scores and ranks had correlations with those obtained by the original algorithm of 0.9983 and 0.9988 respectively.

Applying this method to the simulated indicator datasets, Figure 9.14 shows the patterns of variation in SIMD ranks found in relation to the original SIMD ranks. Over the central half of the original deprivation scale, there is little difference in variability of SIMD ranks and in the most deprived quarter of the distribution there is some evidence of reduced variability. However, in the least deprived quarter of the distribution, the variability in SIMD ranks is markedly greater than under the original method, with the standard deviation of the ranks being more than 40% greater.

The implications of taking account of uncertainty through the use of simulated indicator datasets are shown in Table 9.17. There is little difference between the original method and the alternative considered here. The effect of these changes on the LA distribution of extremely deprived data zones is shown in Table B.5 (Appendix B).

9.6. Weighting

To evaluate the sensitivity of the SIMD to changes in the weights applied to each domain, each domain weight in turn was increased and decreased by 10, 25, 50 and 100%. A decrease of 100% corresponds to removing the domain completely from the calculation of the SIMD.
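The sensitivity analysis can be sketched as below. Two points are assumptions rather than details taken from the report: that the weights are rescaled to sum to one after a single weight is changed, and that the number of data zones "changing classification" is counted as the number entering the most deprived group (which, for a group of fixed size, equals the number leaving it). All names, weights and scores are hypothetical.

```python
def perturb_weights(weights, domain, pct_change):
    """Change one domain weight by pct_change percent, then rescale all
    weights to sum to 1 (rescaling is an assumption of this sketch)."""
    new = dict(weights)
    new[domain] = weights[domain] * (1.0 + pct_change / 100.0)
    total = sum(new.values())
    return {d: w / total for d, w in new.items()}

def top_set(scores, share=0.15):
    """Indices of the zones with the highest scores (most deprived share)."""
    n_top = max(1, round(len(scores) * share))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return set(order[:n_top])

def zones_entering(scores_original, scores_modified, share=0.15):
    """Zones in the top share under the modified scores but not the original."""
    return len(top_set(scores_modified, share) - top_set(scores_original, share))

# Hypothetical three-domain example: removing domain "c" (a 100% decrease).
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
without_c = perturb_weights(weights, "c", -100)

# Hypothetical scores for 20 zones; zones 16 and 17 swap places.
scores_original = [float(i) for i in range(20)]
scores_modified = list(scores_original)
scores_modified[16], scores_modified[17] = 17.0, 16.0
n_changed = zones_entering(scores_original, scores_modified)
```

In a full analysis the modified scores would come from recombining the transformed domain scores with each perturbed set of weights, and the correlations with the original scores and ranks would be computed alongside the count.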

(Columns 3 and 4: transformation used prior to weighted combination)

| Rurality Group | Total, N | Original: Rank + Exponential, N (%) | Transformation of Health, Education & Access domains, followed by simple standardisation, N (%) |
| --- | --- | --- | --- |
| Large Urban | 2432 | 671.8 (27.6) | 675.2 (27.8) |
| Other Urban | 1892 | 226.7 (12.0) | 224.7 (11.9) |
| Accessible Small Towns | 666 | 34.7 (5.2) | 34.4 (5.2) |
| Remote Small Towns | 189 | 14.6 (7.7) | 14.8 (7.8) |
| Accessible Rural | 930 | 24.3 (2.6) | 23.4 (2.5) |
| Remote Rural | 396 | 2.9 (0.7) | 2.6 (0.7) |

Table 9.17. Number of data zones in each rurality group, and probability-weighted number (%) of data zones amongst the 15% most deprived data zones nationally, according to the SIMD calculated using the original method and an alternative where the health, education and access domain scores were transformed by f(x) and then all domains were standardised by subtracting the mean and dividing by the standard deviation prior to weighted combination

The results of this process are shown in Table 9.18. Correlations of SIMD scores and ranks are shown between the original algorithm and each modification using alternative sets of weights. Also shown are the numbers of data zones that change classification into and out of the top 15% most deprived data zones.

For adjustments to individual weights of ±10%, the correlations between original and modified SIMD scores and ranks are almost perfect, and no more than a handful of data zones move into or out of the top 15% most deprived nationally. Over all adjustments tried, in all but two cases the correlations between original and modified SIMD scores and ranks are greater than 99%. The exceptions are for extreme changes in the Access domain weights; the lowest correlation (98.4%) is between the original SIMD ranks and the ranks produced when excluding the Access domain. In general, changes to the Access domain weight have the greatest impact, due to the negative correlation between this and other domains. Nonetheless, the number of data zones that change their extreme deprivation classification tends to be greater for those domains with the largest weights under the original algorithm.

Conceptually, the application of weights to the combination of domain scores is intended to reflect the relative importance of deprivation on each of the domains. The current SIMD methodology quotes the weights chosen for each domain with little justification. For the sake of transparency, it might be suggested that the processes leading to the choice of weights should be explicitly defined.

(Each cell: correlation of SIMD scores / correlation of SIMD ranks / number of data zones changing classification; columns give the domain whose weight was changed)

| Change in Domain Weight | Income | Employment | Health | Education | Access | Housing |
| --- | --- | --- | --- | --- | --- | --- |
| Increase 10% | 1.0000 / 1.0000 / 4 | 1.0000 / 1.0000 / 2 | 1.0000 / 1.0000 / 3 | 1.0000 / 1.0000 / 4 | 0.9999 / 0.9999 / 3 | 1.0000 / 1.0000 / 1 |
| Increase 25% | 0.9998 / 0.9998 / 10 | 0.9998 / 0.9998 / 6 | 0.9999 / 0.9999 / 8 | 0.9999 / 0.9998 / 11 | 0.9997 / 0.9991 / 5 | 0.9999 / 0.9999 / 6 |
| Increase 50% | 0.9994 / 0.9993 / 17 | 0.9993 / 0.9992 / 13 | 0.9996 / 0.9995 / 12 | 0.9995 / 0.9994 / 21 | 0.9986 / 0.9966 / 7 | 0.9998 / 0.9996 / 10 |
| Increase 100% | 0.9983 / 0.9978 / 29 | 0.9979 / 0.9975 / 31 | 0.9987 / 0.9984 / 27 | 0.9982 / 0.9979 / 32 | 0.9943 / 0.9872 / 23 | 0.9991 / 0.9985 / 20 |
| Decrease 10% | 1.0000 / 1.0000 / 3 | 1.0000 / 0.9999 / 6 | 1.0000 / 1.0000 / 4 | 1.0000 / 1.0000 / 2 | 0.9999 / 0.9999 / 3 | 1.0000 / 1.0000 / 4 |
| Decrease 25% | 0.9998 / 0.9997 / 9 | 0.9997 / 0.9997 / 14 | 0.9999 / 0.9998 / 8 | 0.9998 / 0.9998 / 5 | 0.9997 / 0.9991 / 7 | 0.9999 / 0.9999 / 5 |
| Decrease 50% | 0.9989 / 0.9986 / 25 | 0.9986 / 0.9984 / 28 | 0.9995 / 0.9994 / 18 | 0.9993 / 0.9992 / 13 | 0.9987 / 0.9962 / 13 | 0.9998 / 0.9996 / 9 |
| Decrease 100% | 0.9931 / 0.9918 / 60 | 0.9916 / 0.9902 / 71 | 0.9976 / 0.9969 / 41 | 0.9966 / 0.9960 / 36 | 0.9948 / 0.9842 / 20 | 0.9990 / 0.9981 / 16 |

Table 9.18. Correlations between original algorithm and modified algorithms with adjustments to individual domain weights (NB: decrease of 100% corresponds to removal of domain from SIMD calculation), for SIMD Scores and SIMD Ranks, with numbers of data zones changing classification into and out of the 15% most deprived nationally

To this end, the relative weight chosen for each domain can be thought of as capturing two important aspects of deprivation:

  • the proportion of people experiencing this deprivation (prevalence), and
  • the burden of this deprivation (severity),

and the process of defining domain weights can address these two issues separately.

The first is primarily an empirical question, once a definition of prevalence has been achieved. Under the current methodology, a proportionate reduction in the level of income deprivation, with all other deprivation domains held constant, would not affect the resultant SIMD score for each data zone, since domain scores are ranked prior to their combination. If the weight applied to the income domain were to depend upon the prevalence of income deprivation nationally, the choice of weight would be less open to criticism.

The second aspect, of the severity of each form of deprivation, is a value judgement. Methods for estimating these aspects, including Discrete Choice Experiments, could be applied to population samples.

The relative weight applied to each domain in the calculation of the SIMD could then be defined as

$$w_j = \frac{s_j \, p_j}{\sum_k s_k \, p_k}$$

where w_j is the weight for the jth domain, s_j is the severity associated with that domain and p_j is the prevalence of that domain of deprivation.

| Domain | Prevalent group | Prevalence |
| --- | --- | --- |
| Current Income | As defined by Current Income domain | 15.0% |
| Employment | As defined by Employment domain | 13.8% |
| Health | Persons with limiting long-term illness | 19.3% |
| Education, Skills & Training | Working age adults without qualifications | 30.7% |
| Geographic Access & Telecommunications | Persons reporting that post offices, doctors and grocery/food shops are not "very convenient" or "fairly convenient" | 11.0% |
| Housing | Living in overcrowded household | 14.1% |

Table 9.19. National prevalence estimates for each deprivation domain

For example, Table 9.19 provides estimates of the prevalence of each deprivation domain in the population. The Income and Employment domains act as their own prevalence measures. For the Health, Education and Housing domains, the most prevalent component indicator variables have been chosen to estimate prevalence. For the Access domain, there is no indicator variable within the SIMD indicator dataset that can be used to estimate prevalence; consequently data from the Scottish Household Survey 2003 have been used to estimate the prevalence of this deprivation domain.

For the sake of argument, we can take the current weights used in the calculation of the SIMD to be estimates of the severity of each domain. Combining the prevalence and severity estimates provides the following weights for construction of the SIMD:

  • Income, 0.254;
  • Employment, 0.235;
  • Health, 0.159;
  • Education, 0.252;
  • Access, 0.058;
  • Housing, 0.041.

The resultant correlations with the original algorithm are high for both SIMD scores (0.997) and ranks (0.996), with 33 data zones being reclassified into or out of the severely deprived group.
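As a rough check, the weights listed above can be approximately reproduced by taking each weight to be proportional to the product of severity and prevalence, rescaled so that the weights sum to one. The prevalence values are those in Table 9.19. The severity values are an assumption: they are approximate current SIMD domain weights, which are not listed in this section, so small rounding differences from the published figures are to be expected.

```python
# Prevalence estimates from Table 9.19 (as proportions).
prevalence = {
    "Income": 0.150,
    "Employment": 0.138,
    "Health": 0.193,
    "Education": 0.307,
    "Access": 0.110,
    "Housing": 0.141,
}

# ASSUMPTION: approximate current domain weights, used as severity estimates.
severity = {
    "Income": 0.29,
    "Employment": 0.29,
    "Health": 0.14,
    "Education": 0.14,
    "Access": 0.09,
    "Housing": 0.05,
}

def combined_weights(severity, prevalence):
    """Weight proportional to severity x prevalence, rescaled to sum to 1."""
    products = {d: severity[d] * prevalence[d] for d in severity}
    total = sum(products.values())
    return {d: v / total for d, v in products.items()}

weights = combined_weights(severity, prevalence)

for domain, w in weights.items():
    print(f"{domain}: {w:.3f}")
```

Under these assumed severities the Education weight rises to roughly the level of the Income weight, reflecting the high prevalence of adults without qualifications, while the Access and Housing weights fall.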

These figures are presented for illustration only, and we do not propose that these weights should be adopted in practice. The objective of making the method of deriving domain weights more transparent is, we feel, of merit.

Page updated: Tuesday, October 18, 2005