SHS Lite - User Guide: A guide to using the Scottish Household Survey simplified dataset
2. What's in the Box
2.1 SHS Lite Dataset
The SHS Lite dataset is a simplified version of the full survey data collected by the Scottish Household Survey. The full survey data is both larger and more complex, containing around 30,000 cases for each two-year sweep of the survey with each case having approximately 2,000 variables.
The Scottish Executive therefore commissioned a simplified data file that allows users to undertake most forms of analysis using a substantially smaller file.
The main features of the SHS Lite data are:
- The number of variables has been reduced from 1,825 to 573.
- Complex data loops have been removed and the original variables have been summarised in new variables.
- The variables have been organised into 'sets' of related variables. These sets can be used to further simplify accessing variables through SPSS dialog boxes.
Some aspects of the data have not changed. For example:
- The number of cases remains over 30,000. However, with fewer variables, analyses will run faster on most computers.
- The data still contain questions relating to both sections of the questionnaire: the household, and an adult randomly selected within that household.
- The data still need to be weighted before the results can be considered representative of the household or adult populations.
- The variable names are still linked to the Computer Aided Personal Interviewing (CAPI) script used to collect the data. The questionnaire will remain an important reference source for identifying and understanding the variables in the data.
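To show why weighting matters, here is a minimal sketch of how a weighted estimate differs from an unweighted one. The responses and weights are invented for illustration and do not come from the SHS Lite file, and the function names are hypothetical. The actual weight variable to use for each SHS Lite variable is given in the SHS Lite Variable Listing.

```python
# Hypothetical illustration only: the responses and weights below are
# invented and are NOT taken from the SHS Lite data file.

def unweighted_proportion(values):
    """Share of cases coded 1, counting every case equally."""
    return sum(1 for v in values if v == 1) / len(values)

def weighted_proportion(values, weights):
    """Share of cases coded 1, counting each case by its weight."""
    total = sum(weights)
    return sum(w for v, w in zip(values, weights) if v == 1) / total

# 1 = case has the characteristic of interest, 0 = it does not
responses = [1, 0, 1, 1, 0, 0]
# Assumed weights: larger values stand for groups that are under-represented
weights = [0.8, 1.5, 0.8, 0.8, 1.5, 1.2]

print(f"Unweighted: {unweighted_proportion(responses):.3f}")
print(f"Weighted:   {weighted_proportion(responses, weights):.3f}")
```

Here the unweighted share is 0.500 and the weighted share is about 0.364, because the cases coded 0 carry larger weights. In SPSS the same effect is normally obtained by switching weighting on with the appropriate weight variable rather than by hand.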
Note: If you have opened the SHS dataset file from the CD, you will not be able to save any changes you have made back to the CD. You can, however, save a copy of this file to a suitable location on your own computer. This means you can save your own changes and, should you make a mistake, copy the original file again from the CD.
2.2 Documents
Alongside the SHS Lite data file are a number of documents that provide important information about the survey, how the data are collected and what individual variables represent. These are provided on the CD. The main documents are:
2.2.1 Short SHS Questionnaire.pdf
This version of the questionnaire contains all the questions asked in the survey (except for the travel diary sections, which are not included in the SHS Lite datasets). This allows the simplified data file to be seen in the context of the full survey and indicates where questions are asked of only a subset of the sample. Abbreviated questionnaires are included for 1999/2000 and 2001/2002.
To improve usability, the routing in the questionnaire has been somewhat simplified from the CAPI programme used to conduct the survey.
2.2.2 SHS Lite Variable Listing.pdf
This file contains a list of all the variables in the SHS Lite file. It shows:
- which analysis 'set' each variable has been assigned to
- the name of each variable
- a descriptive label for each variable
- whether the variable is original - if it refers directly to a question in the questionnaire - or if it has been derived from other questions
- who the variable relates to - the household, the random adult or the random schoolchild
- which weight to use when analysing that variable
Analysis sets and weighting are discussed further in Chapter Four.
2.2.3 Other SHS documents
For further reference, a number of additional documents have been included on the CD.
- The 1999/2000 and 2001/2002 SHS Annual Reports. These provide analysis of the SHS results as well as background information and a glossary of terminology used in the survey.
- The 1999/2000 and 2001/2002 SHS Technical Reports. These contain detailed methodological information on the SHS as well as information about response rates and comparisons of SHS data and other data sources.
2.3 Variable database
This database contains more detailed information on the variables and covers both SHS Lite data files. Users can search it by keyword to display a list of related variables.
2.4 Limitations of the data
There are a number of important issues that users should be aware of when using the SHS Lite data.
Like all sample surveys, the SHS can only produce estimates and these estimates are limited by a number of factors such as:
- Sampling variability - any sample can differ from the population by chance. This is often referred to as sampling error (see Section 5.1).
- The number of cases on which the analysis is based - estimates based on large samples are more accurate than those based on small samples.
- Bias in the sample - if a sample under-represents sections of the population, or if a large proportion of people do not answer some questions, the estimates may differ substantially from the population values for reasons that are not a result of chance. For example, in 1999/2000, 54% of adults interviewed were female, but the true figure in the population is only 51%. This is an example of bias caused by some groups, young men in particular, being difficult to contact or refusing to take part in the survey.
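The 1999/2000 example above can be used to illustrate how a simple weight offsets this kind of bias. The sketch below computes a cell weight (population share divided by sample share) for each sex, treating the remaining shares (46% in the sample, 49% in the population) as male. This is a generic illustration under those assumptions; it does not describe how the SHS weights are actually constructed.

```python
# Shares from the 1999/2000 example; male shares assumed to be the remainder.
sample_share = {"female": 0.54, "male": 0.46}
population_share = {"female": 0.51, "male": 0.49}

# Illustrative cell weight: population share / sample share.
# Not the SHS weighting method, just a demonstration of the idea.
weights = {sex: population_share[sex] / sample_share[sex] for sex in sample_share}

for sex, w in weights.items():
    print(f"{sex}: weight {w:.3f}")

# Check: after weighting, the sample reproduces the population shares.
total = sum(sample_share[s] * weights[s] for s in sample_share)
weighted_female = sample_share["female"] * weights["female"] / total
print(f"Weighted female share: {weighted_female:.2f}")
```

Women receive a weight of about 0.944 and men about 1.065, so the weighted sample is 51% female. Weighting can only correct for characteristics it takes into account; bias on other characteristics may remain.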
The SHS is limited in the amount of detail it can collect about some topics and often cannot collect data to the standards of official statistics. This applies to measures such as:
- Economic status - the variables containing the economic status of the highest income householder, the random adult etc. are based on self-reported questions and do not conform to official definitions of employment/unemployment. While these variables can be used to look at how responses vary between people with different economic classifications, the SHS cannot provide estimates of unemployment that are comparable to official estimates.
- Household income - the SHS collects income data from or about the highest income householder and, where there is one, their spouse. This is not the same basis as that used for estimates from the Family Resources Survey, so the SHS does not provide comparable estimates of household income.
Although the SHS has a large sample that covers the whole of Scotland, it has some geographical limitations, because sample sizes are small in smaller local authorities and because the survey is designed to be representative only at national and local authority level. This means:
- Users need to be mindful of sampling errors in all analysis, especially analysis based on breakdowns within a single local authority.
- It is not appropriate to undertake geographical analysis below local authority level since the sampling techniques used in some local authorities cannot guarantee representativeness.
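The effect of sample size on sampling error can be sketched with the textbook formula for the 95% margin of error of a proportion, 1.96 × sqrt(p(1 − p) / n). This assumes simple random sampling and ignores the survey design and weighting, which will usually make the real sampling error larger; see Section 5.1 for the SHS approach. The sample sizes below are illustrative only, and the function name is hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from
    n cases, assuming simple random sampling (no design effect)."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sizes: a full two-year sweep down to a small breakdown,
# such as a subgroup within one local authority.
for n in (30000, 3000, 300):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>5}: 50% estimate is accurate to about +/- {moe * 100:.1f} points")
```

For an estimate of 50%, the margin is about 0.6 percentage points with 30,000 cases, 1.8 points with 3,000 and 5.7 points with 300: cutting the sample by a factor of ten roughly triples the margin of error.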