11. Recommendations
During the initial discussions and later plans for the implementation of this project, the authors consulted in some detail with members of an Expert Advisory Group (EAG), some of whom also provided additional analyses that were incorporated into this report. In particular, the methods described in Sections 3.4 and 3.5, and the results presented in Sections 9.5 and 9.6, are based largely on work carried out by Hugh Gravelle and Matt Sutton, with modifications made in the light of observations made by Chris Dibben. The computer program used to fit the spatial shrinkage model applied in Section 9.3 was written by Alistair Leyland.
The interpretation of the findings of this evaluation, and the recommendations made in this section, are the views of the authors, and do not necessarily reflect the opinions of the EAG. On submission of this report to the funding organisation, the members of the EAG were given the opportunity to submit written comments on the final draft, which are included in this report in Appendix F.
11.1. Outline
The purpose of this project was to evaluate the statistical methods used in the construction of the SIMD 2004, informally described as a "Health Check". Other than a minor error in the original program code, the SIMD methodology could be said to have passed this checkup. We have found little evidence that any of the methods used is invalid for the purpose of creating the SIMD or its constituent domain scores.
The one aspect of the methodology that could be questioned is the use of shrinkage for indicators in the Health and Education, Skills & Training domains. Shrinkage reduces variability in a dataset at the cost of introducing bias. It would be expected that the extent of this bias (in terms of the likelihood of misclassifying a truly deprived data zone as not being amongst the most deprived nationally) will be greatest for deprived data zones within otherwise less deprived shrinkage areas, resulting in a small tendency for the current method to underestimate levels of deprivation in the least deprived LAs. Avoiding such an effect was the stated aim of moving the SIMD 2004 to a smaller geography, so that pockets of deprivation could be detected. We found some evidence of this effect when we compared the current figures with those obtained using either a single national shrinkage area for all data zones, or no shrinkage at all.
It is not possible to say with certainty which methods are best, since there is no Gold Standard measure of deprivation with which to compare results. However, the methods used to produce any measure of deprivation should be as straightforward and simple to comprehend as possible. We feel that the methodology could be simplified with little change to the resultant SIMD scores and ranks, and would recommend that future editions of the SIMD adopt these simplifications.
We would also recommend the publication of uncertainty estimates alongside the deprivation measures currently produced. Under the current methodology, it would be overly complex to produce these estimates, but with a simplified algorithm, it should be feasible.
11.2. Specific Recommendations
The following recommendations are the views of the authors of this report. Recommendations 11.2.1 and 11.2.6 would simplify the methodology for the calculation of small area deprivation measures in Scotland, and would have little impact on the actual values of these measures. Recommendation 11.2.2(i) is more computationally difficult, but is conceptually more appealing than the current method. Recommendation 11.2.2(ii) would standardise the methodology in the sense that indicator variables in each domain would undergo the same process to produce a domain score. Recommendation 11.2.3 would remove the need to rank domain scores before combining them into the SIMD. Recommendation 11.2.4 does not apply to the statistical methodology in particular, but would add transparency to the rationale for this step of the algorithm.
Recommendation 11.2.5 would perhaps have the greatest benefits in terms of improving the interpretation and application of the Scottish Indices of Deprivation. Its implementation would require simplification of the methodology, but if such an approach were to be adopted, it could incorporate a more general factor analysis method with little additional computation.
11.2.1. Shrinkage
We recommend that the shrinkage step of the algorithm be removed. It has little effect on the resultant indices and, by shrinking towards LA averages, introduces a small bias that penalises data zones within otherwise less deprived areas. The application of shrinkage within some domains but not others does not constitute a consistent approach, and the use of Factor Analysis implicitly results in a degree of shrinkage.
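The direction of this bias can be illustrated with a minimal sketch. It assumes the standard precision-weighted form of a shrinkage estimator, in which a data zone's observed rate is pulled towards the mean of its shrinkage area with a weight that depends on the zone's standard error and the between-zone variance within the area. The function name and all figures below are hypothetical and are not taken from the SIMD data or program code.

```python
def shrink(zone_rate, zone_se, area_mean, area_var):
    """Precision-weighted shrinkage of a zone rate towards its area mean.

    Assumed form: weight = area_var / (area_var + zone_se**2), so an
    imprecisely measured zone is pulled further towards the area mean.
    """
    weight = area_var / (area_var + zone_se ** 2)
    return weight * zone_rate + (1.0 - weight) * area_mean


# Hypothetical deprived data zone: observed rate 0.40, standard error 0.08.
zone_rate, zone_se = 0.40, 0.08

# Shrinking towards a less deprived LA (mean 0.10, between-zone variance 0.0025).
local = shrink(zone_rate, zone_se, area_mean=0.10, area_var=0.0025)

# Shrinking towards a single national area (mean 0.15, variance 0.01).
national = shrink(zone_rate, zone_se, area_mean=0.15, area_var=0.01)

print(f"observed {zone_rate:.3f}, LA shrinkage {local:.3f}, "
      f"national shrinkage {national:.3f}")
```

In this illustration the deprived zone loses more of its observed rate when shrunk towards a homogeneous, less deprived LA than when shrunk towards a single national area, which is the pattern of bias described above.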
11.2.2. Factor Analysis
(i) We recommend that a Generalised Factor Analysis method be adopted. Though less simple to implement than Maximum Likelihood Factor Analysis, it removes the need to rank and transform indicator variables first, recognises the natural distribution of each variable, and preserves the degree of separation between data zones on the original scale of each indicator.
(ii) We recommend that Generalised Factor Analysis be considered for the Current Income, Employment and Housing domains, as well as the three domains that currently undergo FA. This would result in a consistent methodology being applied to each domain. The result for the Current Income and Employment domains would be equivalent to shrinkage towards a single higher level average, and would have little impact on the ranking of data zones; for the Housing domain, being constructed from two indicators, the likely effect on the relative positions of each data zone would also be small. All domains could be expressed as standard Normal deviates; the Current Income and Employment domains could also be expressed on the scale of the original measurements.
11.2.3. Exponential Transformation
We recommend that domain scores, expressed as standard Normal deviates, are transformed by the function [formula missing in source], or similar, prior to their weighted combination to form the SIMD (if all domains are defined to have zero mean and unit variance, further standardisation is not necessary). This would have similar effects to the current method in terms of avoiding "cancelling out" of opposing levels of deprivation on different domains, but would not require ranking of data zones, thereby preserving the degree of separation between areas on each domain.
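For comparison, the way a rank-based exponential transformation of the kind used in the current method avoids this effect can be sketched as follows. The sketch assumes the commonly published form of the Indices of Deprivation transformation, with ranks scaled to the interval from 0 (least deprived) to 1 (most deprived) and a scale constant of 23; both the constant and the equal domain weights are assumptions made for illustration, and the sketch does not attempt the alternative function recommended here.

```python
import math

SCALE = 23.0  # assumed constant from the published form of the transformation


def exponential_transform(scaled_rank):
    """Map a scaled rank in [0, 1] to an exponentially distributed score in [0, 100]."""
    if not 0.0 <= scaled_rank <= 1.0:
        raise ValueError("scaled_rank must lie in [0, 1]")
    return -SCALE * math.log(1.0 - scaled_rank * (1.0 - math.exp(-100.0 / SCALE)))


def mean(values):
    """Equal-weight combination (an assumption; the SIMD uses domain weights)."""
    return sum(values) / len(values)


# Hypothetical zones: A is highly deprived on one domain and not on the other,
# B is average on both. Their mean scaled ranks are identical.
zone_a = [0.95, 0.05]
zone_b = [0.50, 0.50]

mean_rank_a, mean_rank_b = mean(zone_a), mean(zone_b)
score_a = mean([exponential_transform(r) for r in zone_a])
score_b = mean([exponential_transform(r) for r in zone_b])

print(f"mean ranks: A={mean_rank_a:.2f}, B={mean_rank_b:.2f}")
print(f"combined transformed scores: A={score_a:.1f}, B={score_b:.1f}")
```

Averaging the ranks directly would treat the two zones as equally deprived; after the transformation, the high deprivation of zone A on one domain is no longer offset by its low deprivation on the other.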
11.2.4. Weighting
We recommend that the method by which domain weights are derived be made more explicit, reflecting the importance of each domain in terms of its severity and its prevalence amongst those living in Scotland.
11.2.5. Uncertainty
We recommend that measures of deprivation be estimated using Markov Chain Monte Carlo methods and the posterior distributions of these measures be used to produce uncertainty intervals for the deprivation score and rank of each data zone. This would be feasible only if a simplified methodology were adopted for the calculation of deprivation measures.
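The step from posterior draws to uncertainty intervals for ranks can be sketched as below. The sketch assumes that draws of the deprivation score for every data zone are already available from an MCMC run; here they are simulated as independent Normal draws around hypothetical point estimates purely as a stand-in. The function names, the four example zones and the simple percentile rule are all illustrative assumptions.

```python
import random


def rank_draw(scores):
    """Rank zones within one draw (1 = highest score, i.e. most deprived)."""
    order = sorted(range(len(scores)), key=lambda zone: -scores[zone])
    ranks = [0] * len(scores)
    for position, zone in enumerate(order, start=1):
        ranks[zone] = position
    return ranks


def percentile(sorted_values, p):
    """Simple percentile: the value at the rounded position p * (n - 1)."""
    return sorted_values[round(p * (len(sorted_values) - 1))]


def rank_intervals(draws, lower=0.025, upper=0.975):
    """Return a (lower, upper) rank interval for each zone across all draws."""
    per_zone = [[] for _ in draws[0]]
    for scores in draws:
        for zone, rank in enumerate(rank_draw(scores)):
            per_zone[zone].append(rank)
    return [(percentile(sorted(r), lower), percentile(sorted(r), upper))
            for r in per_zone]


# Stand-in for MCMC output: hypothetical posterior means and standard deviations.
random.seed(1)
post_means = [3.0, 1.0, 0.9, -1.5]
post_sds = [0.2, 0.3, 0.3, 0.2]
draws = [[random.gauss(m, s) for m, s in zip(post_means, post_sds)]
         for _ in range(2000)]

intervals = rank_intervals(draws)
for zone, (low, high) in enumerate(intervals):
    print(f"zone {zone}: 95% rank interval {low} to {high}")
```

Zones whose scores are well separated from the rest receive narrow rank intervals, while zones with similar scores receive overlapping intervals, which is the information that a single published rank conceals.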
11.2.6. Other Recommendations
We recommend that the CMR, CIF and Adults without Qualifications indicators be replaced by standardised ratios of the observed numbers of events to the expected numbers in each data zone, given the national age-sex distribution of events. This would be simpler to model statistically and aid the calculation of estimates of uncertainty, which would automatically take account of any loss of stability with this method.
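A standardised ratio of this kind can be sketched as follows, assuming indirect standardisation: national age-sex specific event rates are applied to the population of the data zone to obtain the expected number of events, and the observed number is divided by that figure. The strata, rates and population counts below are hypothetical, as are the function names.

```python
def expected_events(national_rates, zone_population):
    """Expected events in a zone if national age-sex specific rates applied."""
    return sum(rate * zone_population.get(stratum, 0)
               for stratum, rate in national_rates.items())


def standardised_ratio(observed, national_rates, zone_population):
    """Observed events divided by expected events (1.0 = national average)."""
    expected = expected_events(national_rates, zone_population)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed / expected


# Hypothetical national event rates per person by (sex, age band).
national_rates = {
    ("M", "0-64"): 0.002,
    ("M", "65+"): 0.05,
    ("F", "0-64"): 0.001,
    ("F", "65+"): 0.04,
}

# Hypothetical data zone population in the same strata.
zone_population = {
    ("M", "0-64"): 300,
    ("M", "65+"): 50,
    ("F", "0-64"): 320,
    ("F", "65+"): 60,
}

observed = 8
expected = expected_events(national_rates, zone_population)
ratio = standardised_ratio(observed, national_rates, zone_population)
print(f"expected {expected:.2f}, observed {observed}, ratio {ratio:.2f}")
```

Because the observed count can be modelled directly against the expected count, a ratio of this form lends itself to the estimation of uncertainty discussed in Section 11.2.5.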
11.3. Concluding Remarks
We have found that a number of simplifications to the current methodology could be made with little impact on the resulting deprivation indices. If anything, the current methods are slightly biased in favour of large urban areas, and a simplified algorithm could be viewed as more equitable. If one of the objectives of the current indices is to adopt simple and transparent methods where possible, then these simplifications would assist in this aim.
However, the current method of Factor Analysis (FA) imposes artificial distributions on the component indicator variables, and a more general method could be used that would recognise the natural distribution of each indicator and preserve the relative differences between data zones on each variable. Such a method would involve statistical techniques that are more complicated to apply, but no more difficult to understand.
One interpretation of the similarity between the current SIMD and the alternative produced without shrinkage is that an element of shrinkage is taking place during the application of FA. The observation that no shrinkage produces similar results to shrinkage towards a single higher level average implies that using a single shrinkage area in general preserves the ranking of data zones. Extending this argument, Generalised FA could be applied to all six domains without prior shrinkage of indicator variables, and would yield a similar ranking of data zones on the resulting SIMD. Such an approach is appealing in that each domain is treated in the same way, with the observed data assumed to represent an observation from a distribution whose expected value is linearly related to some underlying deprivation factor.
The transformation of domain scores to an exponential distribution as a way to avoid "cancelling out" is a feature particular to the current methodology. We have provided an alternative transformation for the Normally distributed domain scores that does not involve ranking and recognises the distance between areas on the original indicators. We have also noted that transformation of the remaining domain scores is not required in order to achieve the stated aim; however, should FA be applied to these domains as well, the resultant "shrunken" domain scores could be represented on a scale that is by definition Normally distributed, and the aforementioned transformation could again be applied.
The area in which the current indices could be improved the most is in the calculation of measures of uncertainty in the ranking of each data zone. This would allow the presentation of deprivation indices and summaries at composite area levels with associated confidence intervals, recognising the nature of such indices as estimates rather than truths. This uncertainty could be incorporated into resource allocation algorithms that are currently based on thresholds of deprivation levels, so that data zones close to the threshold would contribute towards the allocation attributed to the higher-level area. The necessary methods could be applied if the current algorithm were to be simplified, and should incorporate the proposed factor analysis methods with little difficulty.
To adopt the recommendations regarding simplification and standardisation of the algorithm and estimation of uncertainty of the resultant indices and ranks would require an investment of time to develop the methodology; we estimate that three months' full-time work by a statistician would be required, plus some additional input from expert(s) in the associated techniques. Subsequent applications of the developed method would be less time consuming.
To develop the proposals regarding the weights used to combine domain scores would involve the construction of measures of prevalence and severity associated with each deprivation domain. The precise techniques that could be used have not been explored in this report, but should be the subject of further research.