Scottish Indices of Deprivation 2003
Appendix 2: Factor Analysis
In the domains where individuals can be identified as being deprived or not in terms of the domain definition, the number of deprived people can simply be summed and divided by a suitable denominator to create an area rate. In other domains, deprivations tend to exist in different spatial and temporal forms so, for example, an area will be education deprived if the adults in the area have no qualifications or if the children do not obtain any qualifications. These two situations co-exist in an area but relate to different individuals at any given point in time. It is hypothesised that an underlying factor exists at an ecological level that makes these different states likely to exist together in a local area. This underlying factor cannot be measured directly but can be identified through its effect on individuals (e.g. failure to obtain qualifications and failure to enter higher education). These variables need to be combined at an ecological level to create an area score. Fundamentally this score should measure, as accurately as possible, the underlying factor.
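For the simpler case described at the start of this appendix, where each individual is either deprived or not, the area rate is a plain ratio. The following is a minimal sketch only: the ward names, counts and the choice of total population as the denominator are invented for illustration and are not taken from the Indices.

```python
# Hypothetical ward counts, invented purely for illustration.
wards = {
    "Ward A": {"deprived": 120, "population": 2400},
    "Ward B": {"deprived": 45, "population": 1500},
}


def area_rate(deprived, denominator):
    """Return the share of the denominator population counted as deprived."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return deprived / denominator


# One rate per ward: deprived people divided by the chosen denominator.
rates = {
    name: area_rate(w["deprived"], w["population"]) for name, w in wards.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```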
There are a number of problems in achieving this goal. The variables: [1] are measured on different scales, [2] have different levels of statistical accuracy, [3] have different distributions, [4] may or may not apply to the same individual and [5] measure the underlying factor imperfectly and to different degrees. Maximum Likelihood (ML) factor analysis was used with a view to overcoming these problems. Other methods deal with only some of them. One example is a linear-scaling model (i.e. adding together a large number of items that purport to measure the same construct in order to increase the reliability of a scale, assuming error elements to be non-additive and random). Alternative statistical methods, such as Principal Components Analysis (PCA), likewise do not address all of these problems. PCA, for example, ignores both measurement error (error variance) and the variables' imperfect measurement of the underlying construct (specific variance). This is because it does not attempt to separate common variance (i.e. variance shared between three or more variables) from specific variance and error variance. Where specific and error variance are suspected (i.e. problems 2 and 5), the appropriate technique is a form of common factor analysis, of which ML factor analysis is one type.
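The distinction between common, specific and error variance can be written out for a single indicator. The notation below is an assumption made for this sketch and does not appear in the source: x_j is a standardised indicator, f is the common factor with unit variance, s_j is the part specific to indicator j, e_j is measurement error, and the three components are taken to be mutually uncorrelated.

```latex
% Notation is assumed for illustration only.
\begin{align}
x_j &= \lambda_j f + s_j + e_j, \\
\operatorname{Var}(x_j)
  &= \underbrace{\lambda_j^{2}}_{\text{common}}
   + \underbrace{\operatorname{Var}(s_j)}_{\text{specific}}
   + \underbrace{\operatorname{Var}(e_j)}_{\text{error}}
   = 1 .
\end{align}
```

Under these assumptions, common factor analysis treats only the first term as shared with the other indicators and groups the remaining two into a unique variance for each indicator, whereas PCA works with the whole unit variance of each indicator without separating the terms.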
The premise behind a simple one-common-factor model is that the underlying factor is imperfectly measured by each of the variables in the dataset but that the variables that are most highly correlated with the underlying factor will also be highly correlated with the other variables. By analysing the correlation between variables it is therefore possible to make inferences about the common factor and indeed to estimate a factor score for each case (i.e. ward). This, of course, assumes that the variables themselves are all related to the underlying factor to some extent and are in most cases fairly strongly related to it.
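The link between correlations and the common factor can be illustrated with a small sketch. It assumes standardised variables and an exact one-common-factor model, under which the correlation between two variables is the product of their loadings. The loadings are hypothetical, and the recovery step uses the classical method of triads rather than the ML estimation used for the Indices.

```python
import math


def implied_correlations(loadings):
    """Correlation matrix implied by an exact one-common-factor model."""
    n = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


def triad_loading(corr, i, j, k):
    """Recover the loading of variable i from its correlations with j and k.

    Under the model r_ij * r_ik / r_jk equals the squared loading of i.
    Assumes all loadings are positive.
    """
    return math.sqrt(corr[i][j] * corr[i][k] / corr[j][k])


# Hypothetical loadings for three indicators (not taken from the Indices).
true_loadings = [0.8, 0.7, 0.6]
corr = implied_correlations(true_loadings)

recovered = [
    triad_loading(corr, 0, 1, 2),
    triad_loading(corr, 1, 0, 2),
    triad_loading(corr, 2, 0, 1),
]
print(recovered)
```

With real data the correlations contain sampling error, so the triads would disagree with one another, which is one reason a fitted method such as ML estimation is preferred in practice.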
It is not the aim of this analysis to reduce a large number of variables into a number of theoretically significant factors, as is usual in much social science use of factor analysis (i.e. exploratory factor analysis). The variables will be chosen because they are believed to measure a single area deprivation factor. The analysis therefore involves testing a one-common-factor model against the possibility of there being more than one factor. If a meaningful second common factor is found, it would suggest the need for a new domain or the removal of variables. Decisions over whether a meaningful second common factor exists are aided by standard tests and criteria.
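The source does not name the tests and criteria it relied on. As one illustration of what a check on the one-common-factor model can look like, the sketch below uses Spearman's tetrad differences, which all vanish when a single factor accounts for the correlations among four variables. The loadings are hypothetical and the factors are assumed to be uncorrelated; this is not presented as the procedure used for the Indices.

```python
from itertools import combinations


def implied_correlations(loading_rows):
    """Correlations implied by uncorrelated common factors.

    loading_rows[i] holds the loadings of variable i on each factor.
    """
    n = len(loading_rows)
    return [
        [
            1.0
            if i == j
            else sum(a * b for a, b in zip(loading_rows[i], loading_rows[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def max_tetrad_difference(corr):
    """Largest absolute tetrad difference over all sets of four variables."""
    worst = 0.0
    for i, j, k, l in combinations(range(len(corr)), 4):
        a = corr[i][j] * corr[k][l]
        b = corr[i][k] * corr[j][l]
        c = corr[i][l] * corr[j][k]
        worst = max(worst, abs(a - b), abs(b - c), abs(a - c))
    return worst


# Hypothetical loadings: four indicators driven by a single factor.
one_factor = implied_correlations([[0.8], [0.7], [0.6], [0.5]])

# Hypothetical loadings: the last two indicators also share a second factor.
two_factor = implied_correlations(
    [[0.7, 0.0], [0.6, 0.0], [0.5, 0.6], [0.5, 0.5]]
)

print(max_tetrad_difference(one_factor))  # effectively zero
print(max_tetrad_difference(two_factor))  # clearly non-zero
```

With sample data the differences would never be exactly zero, so a formal test or fit criterion is still needed to judge whether a second factor is meaningful.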
Once a satisfactory solution is achieved, a factor score can be estimated for each ward. That is, the indicators are combined using weights generated by the factor analysis process, and the combined score is then used as the domain score. Thomson's method for estimating factor scores was used.
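Thomson's method is commonly known as the regression method: the score is a weighted sum of the standardised indicators, with weights given by the inverse of the model-implied correlation matrix applied to the loadings. For a single factor these weights have a closed form, which the sketch below uses. The loadings, the uniquenesses and the ward's indicator values are hypothetical, and the sketch assumes standardised indicators and a factor with unit variance.

```python
def thomson_weights(loadings, uniquenesses):
    """Regression (Thomson) score weights for a one-common-factor model.

    For a single factor, inverse(Sigma) * loadings simplifies to
    (loading / uniqueness) / (1 + sum(loading^2 / uniqueness)),
    where Sigma = loadings * loadings' + diag(uniquenesses).
    """
    scaled = [l / u for l, u in zip(loadings, uniquenesses)]
    denom = 1.0 + sum(l * l / u for l, u in zip(loadings, uniquenesses))
    return [s / denom for s in scaled]


def factor_score(z_values, weights):
    """Estimated factor score: weighted sum of standardised indicators."""
    return sum(w * z for w, z in zip(weights, z_values))


# Hypothetical loadings for two standardised indicators.
loadings = [0.8, 0.6]
# With standardised indicators the unique variance is 1 - loading^2.
uniquenesses = [1.0 - l ** 2 for l in loadings]

weights = thomson_weights(loadings, uniquenesses)

# Hypothetical standardised indicator values for one ward.
ward_z = [1.2, 0.4]
score = factor_score(ward_z, weights)
print(weights, score)
```

Indicators with higher loadings and lower unique variance receive more weight, which is how the method reflects the differing accuracy with which each variable measures the underlying factor.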