6. Factor Analysis
In the current SIMD methodology, indicator variables in the health, education and access domains are combined using maximum likelihood factor analysis (FA) to form the associated domain scores. Prior to FA, each indicator variable is ranked across data zones and transformed to a standard Normal distribution.
6.1. Transformation
The original algorithm ranks each indicator variable and transforms it to a standard Normal distribution. This imposes the same distribution on every indicator; the Normal distribution was chosen in an attempt to comply with the assumption, implicit in the use of FA, that the variables follow a multivariate Normal distribution.
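The rank-and-transform step can be sketched as follows. This is a minimal illustration, not the SIMD implementation: the plotting position (r - 0.5) / n, the averaging of tied ranks, the function names and the example values are all assumptions made here.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def normal_scores(values):
    """Rank the values, then map the ranks to standard Normal quantiles."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position (r - 0.5) / n keeps probabilities inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Hypothetical indicator values for five data zones (two of them tied).
rates = [12.0, 3.5, 40.2, 7.1, 7.1]
scores = normal_scores(rates)
```

Whatever the shape of the observed indicator, the resulting scores depend only on the ordering of the data zones, which is what makes every indicator share the same distribution.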
Some simple alternatives to this transformation procedure are:
- No transformation: Using the observed indicators without transformation will preserve the relative distances between data zones with respect to the scale of measurement of each indicator. Extreme levels of deprivation on single indicators that might be masked by standardisation are thus more likely to be preserved in the final domain score.
- Uniform distribution: Transforming indicator variables to uniform distributions, or equivalently applying FA to the ranked indicators, assumes an equal importance for the difference between every consecutive pair of data zones.
- Exponential distribution: Transforming indicators to exponential distributions will accentuate differences between the most deprived data zones, resulting in a combined domain score that identifies data zones with extreme levels of deprivation.
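The three alternatives can be compared on a toy indicator. The sketch below is illustrative only: it assumes that larger values mean greater deprivation, uses the plotting position (rank - 0.5) / n, and takes the unit-rate exponential as the target distribution; none of these choices, nor the function names, is specified in the report.

```python
import math


def fractional_ranks(values):
    """Map each value to (rank - 0.5) / n; tied values share their mean rank."""
    n = len(values)
    out = []
    for v in values:
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)
        mean_rank = below + (tied + 1) / 2
        out.append((mean_rank - 0.5) / n)
    return out


def transform(values, target):
    """Apply one of the alternative transformations to a single indicator."""
    if target == "none":
        return list(values)  # keep the original scale of measurement
    p = fractional_ranks(values)
    if target == "uniform":
        return p  # equal spacing between consecutive data zones
    if target == "exponential":
        # Inverse CDF of a unit-rate exponential (an assumed choice of rate):
        # gaps widen towards the top of the ranking.
        return [-math.log(1.0 - q) for q in p]
    raise ValueError(f"unknown target distribution: {target}")


# Hypothetical indicator for five data zones, least to most deprived.
zones = [1.0, 2.0, 3.0, 4.0, 5.0]
uniform = transform(zones, "uniform")
exponential = transform(zones, "exponential")
```

With these inputs the uniform scores are equally spaced, while the gap between consecutive exponential scores grows steadily towards the most deprived data zone.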
The current method and the alternatives outlined above are applied to the observed indicator data and the simulated indicator datasets for the purposes of evaluation.
6.2. Principal Components Analysis
FA assumes that there is a single latent, or unobserved, variable to which the expected value of each (transformed) indicator is linearly related. Principal components analysis (PCA) identifies the linear combination of the indicator variables with the largest variance, subject to identifiability constraints.
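In symbols, the two approaches might be written as below. The notation is introduced here for illustration and does not come from the report: x_{ij} is the (transformed) value of indicator j in data zone i, z_i the latent variable, S the sample covariance matrix of the indicators, and the unit-length constraint on w is assumed as the usual identifiability constraint.

```latex
% One-factor model assumed by FA
x_{ij} = \mu_j + \lambda_j z_i + \varepsilon_{ij}, \qquad
z_i \sim N(0, 1), \quad \varepsilon_{ij} \sim N(0, \psi_j)
\quad\Longrightarrow\quad
\mathrm{E}[x_{ij} \mid z_i] = \mu_j + \lambda_j z_i

% First principal component: the unit-length weight vector of maximum variance
\hat{w} = \arg\max_{w \,:\, w^\top w = 1} \; w^\top S w
```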
PCA, in combination with the alternative transformation options given above, is applied to the observed indicator data and the simulated indicator datasets for the purposes of evaluation.
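A first principal component score can be sketched with the standard library alone. This is an illustration under assumptions rather than the procedure used in the project: power iteration is substituted for a full eigendecomposition, the weight vector is constrained to unit length, and the function names and data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of `rows` (one row per data zone)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]


def first_component(cov, iterations=500):
    """Leading unit-length eigenvector of `cov`, found by power iteration.

    Starts from equal positive weights, which suits indicators that are
    positively correlated with one another.
    """
    p = len(cov)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        v = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    return w


def component_scores(rows, w):
    """Project the mean-centred rows onto the weight vector `w`."""
    n, p = len(rows), len(w)
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [sum((r[j] - means[j]) * w[j] for j in range(p)) for r in rows]


# Hypothetical transformed indicators: five data zones, three indicators.
indicators = [
    [-1.2, -0.9, -1.0],
    [-0.4, -0.6, -0.1],
    [0.0, 0.2, -0.3],
    [0.5, 0.3, 0.6],
    [1.1, 1.0, 0.8],
]
cov = covariance_matrix(indicators)
weights = first_component(cov)
domain_scores = component_scores(indicators, weights)
```

The variance of `domain_scores` is at least as large as the variance of any single indicator, which is the sense in which the first component is the maximum-variance combination.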
6.3. Generalised Factor Analysis
As an alternative to the procedure in which indicators are ranked, transformed to a standard normal distribution and entered into a factor analysis, a more direct approach was explored.
Bartholomew [14] established a general framework for factor analysis which relaxes the requirement for normally distributed variables to allow any variable with a distribution from the exponential family, including among others the Normal, binomial and Poisson distributions. These ideas have been further developed in generalised linear latent variable models (GLLVMs). The primary challenge with such models is in the numerical approximation methods required for estimation, since the underlying latent variable is unobservable and must be integrated out of the likelihood function. Numerical approximations which have been applied include Gauss-Hermite quadrature [17], adaptive Gaussian quadrature [18] and, more recently, the Laplace approximation of the likelihood function [19].
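The integral in question can be written out for a single latent variable. The notation below is assumed for illustration and is not taken from the cited papers: g_j is the link function for indicator j, f_j its exponential-family density, \phi the standard Normal density, and (t_k, \omega_k) the nodes and weights of a K-point Gauss-Hermite rule for the weight function e^{-t^2}.

```latex
% Linear predictor for indicator j in data zone i
g_j\big(\mathrm{E}[x_{ij} \mid z_i]\big) = \alpha_j + \lambda_j z_i, \qquad z_i \sim N(0, 1)

% Marginal likelihood contribution of data zone i and its Gauss-Hermite approximation
L_i = \int_{-\infty}^{\infty} \prod_{j=1}^{p} f_j(x_{ij} \mid z)\, \phi(z)\, dz
\;\approx\; \frac{1}{\sqrt{\pi}} \sum_{k=1}^{K} \omega_k \prod_{j=1}^{p} f_j\big(x_{ij} \mid \sqrt{2}\, t_k\big)
```

The factor 1/\sqrt{\pi} and the rescaled nodes \sqrt{2}\, t_k follow from the change of variable z = \sqrt{2}\, t.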
The adaptive Gaussian quadrature method has been implemented in the statistical software package Stata, but this software was not available to us. We addressed the numerical approximation problem by fitting GLLVMs by Markov chain Monte Carlo (MCMC) methods using the software WinBUGS [11].
We implemented this method for the observed indicator data only; the process is very time-consuming and could not be applied to the simulated data. The method was applicable to the health, education and access domains. In the health domain, the CMF and CIF indicator variables do not have natural distributions, and for the purposes of this project were ranked and transformed to standard Normal distributions as in the original method; the other five health indicators were treated as Binomial variables within the GLLVM. For the education domain, the mean SQA score was treated as Normally distributed, with other indicators included as Binomial. For the access domain, all five indicators were treated as Normal on a log scale.
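One way to write down the three kinds of indicator within a single-factor GLLVM is given below. It is a sketch only: the logit link, the denominators n_{ij} and the symbols \alpha_j, \lambda_j, \sigma_j^2 and z_i (the latent variable for data zone i) are assumptions introduced here, since the report states only which distribution was used for each indicator.

```latex
% Binomial indicators (logit link assumed)
x_{ij} \sim \mathrm{Binomial}(n_{ij}, \pi_{ij}), \qquad
\operatorname{logit}(\pi_{ij}) = \alpha_j + \lambda_j z_i

% Normal indicators (mean SQA score; rank-transformed CMF and CIF)
x_{ij} \sim N\big(\alpha_j + \lambda_j z_i,\ \sigma_j^2\big)

% Access indicators, Normal on a log scale
\log x_{ij} \sim N\big(\alpha_j + \lambda_j z_i,\ \sigma_j^2\big)
```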