Scottish Household Survey: Methodology 2003/2004

2. Sampling

The requirements of the sample for the survey are as follows:

  • it should provide an achieved national sample of 31,000 interviews over two years
  • interviews should be spread evenly across the 24 months of interviewing
  • the sample should be fully national in character (i.e. covering the whole of mainland Scotland and the Islands) and each quarter should produce nationally representative results
  • results as reliable as those of a simple random sample of 500 should be available for the larger local authorities on an annual basis and for all local authorities (regardless of size) after 2 years
  • the sample should be capable of producing data which are representative both of Scottish households and the adult (aged 16+) population resident in private households.

These objectives are met by:

  • selecting the survey sample from the Postcode Address File
  • distributing interview targets by local authority area to achieve the stated accuracy requirements
  • minimising design effects by using random sampling in the more densely populated areas and clustered sampling in other areas
  • stratifying the clustered sample within local authorities to ensure coverage and representativeness
  • using computer-assisted interviewing to control the selection of individuals for interview within households.

2.1 Sampling from the Postcode Address File

Since the mid-1980s, the Small User File of the Postcode Address File (PAF) has emerged as the most widely used sampling frame for general population surveys of this kind. This development has been the result of increasing concern about the accuracy of the main alternative to the PAF, the Electoral Register, particularly in the wake of the Community Charge. The principal advantages of the PAF, relative to the Electoral Register, are completeness (it is estimated to miss the addresses of only 2% of the adult population and is updated every three months) and lack of bias (those addresses which are missing from the PAF are not as likely to be concentrated among particular types of people). The PAF was, therefore, selected as the sampling frame for the SHS. There are, however, a number of issues arising from its use.

Deadwood

The Small User File of the PAF, which forms the basis of the sample of addresses, is known to contain a number of addresses that are not residential (usually small shops and offices) or which have been demolished or are unoccupied. The extent of 'deadwood' in the PAF varies by area, but is usually estimated at between 10% and 13% in national samples of this kind. This is accounted for by drawing slightly more addresses than the target of a 70% response rate would suggest. Thus, for every 100 interviews to be achieved, 160 addresses are issued to interviewers rather than the 140 suggested by a response rate of 70%.

In practice, the number of additional addresses selected to allow for deadwood varies by local authority based on the contractors' experience of SHS fieldwork carried out in 1999/2000, the first two years of the SHS. These figures are published in the Fieldwork outcomes documents.
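
The arithmetic behind this inflation of the issued sample can be sketched as follows. This is an illustrative calculation only: the function name and the single deadwood rate of 11% are assumptions (the survey applies allowances that vary by local authority), and the response rate is assumed to apply to the eligible addresses left after deadwood is removed.

```python
def addresses_to_issue(target_interviews, response_rate, deadwood_rate=0.0):
    """Addresses needed to achieve a target number of interviews.

    Assumes (hypothetically) that deadwood is removed first and that the
    response rate applies to the eligible addresses that remain.
    """
    eligible_share = 1.0 - deadwood_rate
    return target_interviews / (response_rate * eligible_share)


# Response rate alone: about 143 addresses per 100 interviews.
print(round(addresses_to_issue(100, 0.70)))
# With an assumed 11% deadwood: about 161, close to the 160 quoted above.
print(round(addresses_to_issue(100, 0.70, 0.11)))
```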

2.2 Accuracy and completeness

In local authorities where clustered sampling is used, Enumeration Districts (EDs) are used as the Primary Sampling Units (PSUs), as is described in a later section. In some cases, particularly in areas subject to sizeable population change, entire EDs have been demolished since the PAF was last updated. To accommodate this, for each ED found to be unusable, the MORI Sampling Unit arranges for a substitute PSU to be drawn from the remaining pool of EDs within the same local authority area and with the same MOSAIC type (see Appendix 1).

In areas where random sampling is used, the full sample for the survey is drawn in advance for each two-year fieldwork period and so may exclude households in newly-built housing entering the PAF during the period of the survey. However, data suggest that new housing accounts for only around 1% of the housing stock in any year. 1 Moreover, the impact of this is further reduced by the fact that new properties are often entered onto the PAF some time before they are actually completed. This should not be a problem in areas of clustered sampling because, although the PSUs are selected for two years at a time, the actual address lists are not drawn until nearer the time of the fieldwork.

One further point relating to the accuracy of the PAF is that some postcodes straddle the border with England and it is possible for 'Scottish' addresses actually to be in England (and, correspondingly, for 'English' addresses to belong in Scotland). To avoid this problem, Ordnance Survey maps of the Scottish/English border are manually inspected. Addresses that are actually in England are excluded, while those in 'English' EDs that are in Scotland are appended to the adjoining 'Scottish' ED.

Exclusions

Special EDs - It is customary in general population sampling of this kind to exclude 'special' EDs, which include prisons, hospitals and military bases. While prisons and hospitals do not generally have significant numbers of private households, the same may not be true of military bases. On the basis of Scottish MOSAIC classifications, however, such EDs account for just 0.5% of the population. They are, therefore, excluded from the sampling frame, since interviewing on military bases would pose fieldwork problems relating to access and security.

Specific accommodation types - The following types of accommodation are excluded from the survey if they are not listed on the Small User file of the PAF (since it is a survey of private households):

  • nurses' homes
  • student halls of residence
  • other communal establishments (e.g. hostels for the homeless and old people's homes)
  • mobile homes
  • sites for travelling people.

Households in such accommodation are included in the survey if they are listed on the Small User file of the PAF and the accommodation represents the sole or main residence of the individuals concerned.

People living in bed and breakfast accommodation are similarly included if the accommodation is listed on PAF and represents the sole or main residence of those living there.

Students' term-time addresses are taken as their main residence (in order that they are counted by where they spend most of the year). Since halls of residence are excluded, however, there will be some under-representation of students.

2.3 Multiple dwellings

There are potential problems associated with the fact that a single entry on the PAF may actually represent multiple dwellings or that a dwelling may contain multiple households. For example, an address listed as 14 Milton Street may consist of a tenement block containing 8 separate flats. Often, the existence of these additional addresses is indicated in the PAF in a field known as the Multiple Occupancy Indicator (MOI). To ensure that such households have an equal chance of inclusion, it is necessary to weight the address by its MOI when drawing the sample. Thus 14 Milton Street would appear 8 times. In the address listings issued to interviewers, such addresses appear as '14 Milton Street - 3 of 8' etc., with interviewers given clear counting procedures for identifying the relevant selected dwelling.
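
A minimal sketch of this expansion is shown below. The function name, the input layout and the second address are hypothetical, and treating a missing MOI as a single dwelling is an assumption of the sketch, not a documented rule of the survey.

```python
def expand_by_moi(paf_entries):
    """Expand PAF entries so that each dwelling appears once in the frame.

    `paf_entries` is a list of (address, moi) pairs; a missing MOI (None)
    is treated as a single dwelling, which is an assumption of this sketch.
    """
    frame = []
    for address, moi in paf_entries:
        dwellings = moi if moi and moi > 1 else 1
        if dwellings == 1:
            frame.append(address)
        else:
            for k in range(1, dwellings + 1):
                frame.append(f"{address} - {k} of {dwellings}")
    return frame


frame = expand_by_moi([("14 Milton Street", 8), ("2 Hypothetical Road", None)])
print(len(frame))   # 9 entries: 8 flats plus 1 single dwelling
print(frame[2])     # 14 Milton Street - 3 of 8
```

Because every dwelling occupies one row of the expanded frame, an equal-probability draw from the rows gives each dwelling the same chance of selection, provided the MOI is correct.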

Where the MOI is correct, this procedure is unproblematic. Sometimes, however, the MOI is incorrect or missing and the true number of dwellings at an address is only discovered once the survey is in the field.

Where an interviewer finds that the MOI is different from the actual number of dwellings observed in the field, he or she uses a Kish grid to select one dwelling at random for interview. This procedure is subsequently checked in the office to ensure the interviewer has carried out a proper random selection. Where it is evident that the interviewer has not followed the selection procedure correctly, the address is re-issued to him/her to go over the process again.

Cases in which the MOI is found to be incorrect should, in principle, be given an additional weight to take account of the implications of this for probabilities of selection. In fact, this is not done, for reasons outlined in the discussion on weighting in Fieldwork outcomes.

2.4 Overall sample structure

Scotland has 32 local authorities and the sample structure of the survey is intended to yield results as reliable as those of a simple random sample of 500 for the larger local authorities (defined as those with at least 750 achieved interviews per year) on an annual basis and for all local authorities (regardless of size) after 2 years.

The overall aim of the sample design is to pursue a systematic random sample where fieldwork conditions allow it - in areas of high population density - and to cluster interviews in the remaining areas, in order to achieve the best combination of sample efficiency and cost effectiveness. The distinction is made on the basis of population density per square kilometre in each local authority. In areas with a population density of 500 or more persons per square kilometre, a systematic random approach is adopted. In those with a lower population density, interviews are clustered.

Nine authorities fall into the former (systematic random) category:

  • Aberdeen City
  • Glasgow City
  • Dundee City
  • Inverclyde
  • East Dunbartonshire
  • Renfrewshire
  • East Renfrewshire
  • West Dunbartonshire
  • Edinburgh, City of

In these areas, the sample is stratified by Scottish MOSAIC and a systematic random sample of addresses is drawn within each of the resulting strata (the stratification by Scottish MOSAIC is described in sub-section 2.8). Addresses within these areas are selected in full at the beginning of each two-year interviewing cycle. They are then grouped into batches, on the basis of their postcodes, for allocation to interviewers.
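
The idea of a systematic random draw within strata can be illustrated with the sketch below. It is not the contractors' sampling program: the function names, stratum labels, addresses and sample sizes are all hypothetical, and the frame within each stratum is assumed to be already sorted in the desired order.

```python
import random


def systematic_sample(frame, n, rng):
    """Draw n items from an ordered frame using a random start and fixed interval."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    interval = len(frame) / n
    start = rng.random() * interval          # random start in [0, interval)
    positions = (int(start + i * interval) for i in range(n))
    return [frame[min(p, len(frame) - 1)] for p in positions]


def sample_within_strata(frame_by_stratum, n_by_stratum, seed=None):
    """Apply the systematic draw separately within each stratum."""
    rng = random.Random(seed)
    return {
        stratum: systematic_sample(frame_by_stratum[stratum], n, rng)
        for stratum, n in n_by_stratum.items()
    }


# Hypothetical strata and addresses, for illustration only.
frames = {"group_a": [f"A{i}" for i in range(100)],
          "group_b": [f"B{i}" for i in range(40)]}
selected = sample_within_strata(frames, {"group_a": 10, "group_b": 4}, seed=1)
```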

The remainder of this sub-section concentrates on procedures for multi-stage sampling within the remaining 23 local authorities (which are listed in Table 2-1).

2.5 Primary sampling unit and cluster size

Enumeration Districts (EDs) are used as primary sampling units (PSUs) for those local authorities which fall into the category of lower population density. EDs were chosen over the main alternative, postcode sectors, for the following reasons. Firstly, the use of postcode sectors would significantly increase the cost of fieldwork in these areas since they are much larger (covering an average of 2,300 households, compared with an average of 150 per ED). Secondly, in smaller local authorities such as the Orkney Islands and Clackmannanshire there would be too few postcode sectors to sample effectively without selecting a large number of addresses within each chosen PSU. Thirdly, EDs have certain advantages in terms of data linkage since they are directly compatible with Census Output Areas and can be easily linked with geo-demographic systems.

The main disadvantage of using EDs is that they are relatively small, averaging 150 households. This means that there is a potential for larger design factors, reducing the overall efficiency of the sample. The calculation of design factors involves an examination of the survey measure across the PSUs. The greater the variation between PSUs, the higher the design factor (since which PSUs are chosen is then likely to have a greater effect on the results). If a small PSU is used, the variation between them is likely to be increased since the variation within PSUs is likely to be less (households in a small PSU will usually be more similar than those in a large PSU). However, the effects of the survey design on sampling errors can be considerably moderated by:

  • sampling a large number of PSUs
  • interviewing as few respondents as practical in each PSU
  • stratifying the PSU selection by status measures because within a stratified survey the variation between PSUs is examined separately for each stratum - affluent areas are compared with similar areas and poorer areas are compared with others - and design effects are reduced. 2

The approach is, therefore, to aim for an average of 11 achieved interviews per PSU in order to have a minimum of about 50 PSUs within each local authority. The use of stratification by Scottish MOSAIC also has the effect of reducing the extent of variability within each stratum and thus limiting the size of the design effect. Although it was impossible to predict design factors accurately without knowing the exact topic coverage and the variability of response, it was envisaged that, for most variables, the design factors would be in the range 1.1-1.2 for the survey as a whole.

2.6 Procedures for dealing with very small enumeration districts

There is a further issue relating to enumeration districts (EDs) that are too small to sample from. It would, for example, have been undesirable and impractical to try to obtain 11 or 12 interviews from an ED containing only 30 households because of the impact on variance between households within the PSU, the possibility of potential respondents discussing the survey and the practical difficulty of obtaining sufficient numbers of interviews. Two questions, therefore, arise: what should be the minimum size for an ED and how should smaller EDs be dealt with?

In relation to the first of these questions, it was decided that 61 households (from the 2001 Census count) should be considered the minimum for inclusion as a separate PSU. This implied interviewing at most about 20% of households in the smallest PSUs, which was felt to be acceptable, given that these EDs lay in areas with lower density of population.

Typically, 11% of EDs within the areas covered by clustering contained 60 or fewer households. However, this does not mean that 11% of PSUs for the survey also do so, since EDs are sampled with probability proportionate to the number of addresses (weighted by the MOI). These EDs contain approximately 3% of the total number of households in the local authorities where clustered sampling is used.

To resolve the problem of these small EDs, each ED with 60 or fewer households is paired with a neighbouring (or adjoining) ED to create a number of pseudo-EDs, which are, in fact, comprised of two or more real EDs. This has no bearing on probabilities of selection, since the 'pairing' takes place before the PSUs are selected and thus the new pseudo-ED has a probability of selection proportionate to its aggregated number of addresses (weighted by the MOI). EDs are merged until they cross the 61 household threshold.
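
The merging rule, followed by selection with probability proportionate to size, might be sketched as below. Several simplifications are assumed: list order stands in for geographic adjacency, household counts are used as the size measure in both steps (the survey uses MOI-weighted address counts for selection), and the function names and ED figures are invented for illustration.

```python
import bisect
import random


def merge_small_eds(eds, min_households=61):
    """Merge consecutive EDs until each group reaches the minimum size.

    `eds` is a list of (ed_id, households) pairs ordered so that list
    neighbours stand in for geographic neighbours (an assumption here).
    """
    groups, current_ids, current_total = [], [], 0
    for ed_id, households in eds:
        current_ids.append(ed_id)
        current_total += households
        if current_total >= min_households:
            groups.append((current_ids, current_total))
            current_ids, current_total = [], 0
    if current_ids:                      # trailing EDs still below threshold
        if groups:
            last_ids, last_total = groups.pop()
            groups.append((last_ids + current_ids, last_total + current_total))
        else:
            groups.append((current_ids, current_total))
    return groups


def pps_systematic(units, n, rng):
    """Select n units with probability proportionate to size.

    `units` is a list of (label, size) pairs. A unit larger than the
    sampling interval can be selected more than once.
    """
    cumulative, running = [], 0
    for _, size in units:
        running += size
        cumulative.append(running)
    interval = running / n
    start = rng.random() * interval
    picks = []
    for i in range(n):
        idx = bisect.bisect_right(cumulative, start + i * interval)
        picks.append(units[min(idx, len(units) - 1)][0])
    return picks


eds = [("A", 150), ("B", 30), ("C", 45), ("D", 200), ("E", 40)]
pseudo_eds = merge_small_eds(eds)
# -> [(['A'], 150), (['B', 'C'], 75), (['D', 'E'], 240)]
psus = pps_systematic(pseudo_eds, 2, random.Random(0))
```

Because merging happens before selection, each pseudo-ED enters the draw with its combined size, which is the property the paragraph above relies on.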

2.7 Stratification by local authority area

Table 2-1 shows the expected distribution of sample by local authority at the end of each two-year sampling period. The underlying principle here is that the allocation of interviews by local authority area should be broadly proportionate to the number of households, except where the resulting sub-sample in any particular area would fall below a pre-determined accuracy threshold. The allocation was carried out in the following way.

    1. A minimum accuracy threshold of 4.4% at the 95% confidence limit was set. This is the accuracy associated with an estimate of 50% from a simple random sample of 500 from an infinite population.

    2. Taking account of the Finite Population Correction Factor and assuming a design factor of 1.1 in those areas with a clustered design, the minimum number of interviews required to meet the above benchmark is established for each local authority area. This gives a figure of around 490 for the high population density areas and 560-590 for the areas with a clustered design.

    3. For each area, this figure is compared with the number of interviews associated with a strictly proportionate allocation of 31,000 interviews across local authorities. Where the proportionate allocation of 31,000 interviews would result in a local authority having fewer interviews than the minimum identified at paragraph 2, the number of interviews is set to that minimum, or to 550 if the minimum is below 550.

    4. The remaining interviews (i.e. those left after the process of allocation in paragraph 3) are simply allocated to the remaining local authorities in proportion to household population.

    5. The number of addresses required is then calculated using information on likely deadwood and response rate assumptions for each area. This calculation is rounded up to the next multiple of 18 (the number of addresses in an interviewer work allocation) and the interview target recalculated using the actual number of addresses to be issued and the assumptions about deadwood and response rates. Finally, the 95% confidence interval for the revised interview target is then calculated.
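
Steps 1 and 2 can be checked with a short calculation. The sketch below is an approximation under stated assumptions: it uses the standard normal-approximation formula for a proportion, applies the finite population correction in the simple form 1 - n/N, and treats the design factor as a multiplier on the standard error. The exact formulae used for the survey are not given here, and the function names are hypothetical. Household counts are taken from Table 2-1.

```python
import math

Z_95 = 1.96  # normal deviate for a 95% confidence level


def margin_of_error(n, p=0.5, deft=1.0, population=None):
    """Half-width of the 95% interval for a proportion p from n interviews."""
    variance = p * (1 - p) / n
    if population is not None:
        variance *= 1 - n / population   # finite population correction
    return Z_95 * deft * math.sqrt(variance)


def minimum_interviews(population, deft=1.0, srs_equivalent=500):
    """Interviews needed to match a simple random sample of `srs_equivalent`
    drawn from an infinite population, given a design factor and the FPC."""
    return 1 / (1 / (srs_equivalent * deft ** 2) + 1 / population)


print(round(100 * margin_of_error(500), 1))          # 4.4
print(round(minimum_interviews(34_950)))             # 493 (East Renfrewshire, deft 1.0)
print(round(minimum_interviews(8_342, deft=1.1)))    # 564 (Orkney Islands, deft 1.1)
```

Under these assumptions the results are in line with the figures quoted in step 2: roughly 490 for a high-density authority and about 560 to 595 for the clustered authorities.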

    As can be seen from the final column in the table, the projected accuracy of the sub-samples in the different areas (over two years) ranges from +/-1.6% in the largest authority (Glasgow City) to +/-4.4% in the smaller authorities which are over-sampled to bring them up to the accuracy threshold. In terms of the projected number of interviews, the range was from 3,634 to 552. This degree of variation is felt to be appropriate, given the need for finer-grained analysis within the larger local authorities.
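
The allocation rule in steps 3 to 5 can be sketched in the same spirit. The figures below are toy values, not survey data, and the function names, the single-pass handling of boosted authorities and the way deadwood and response rates are combined are assumptions made for illustration.

```python
import math


def allocate_interviews(households, minimums, total=31_000, floor=550):
    """Proportionate allocation with a minimum for small authorities.

    `households` and `minimums` map each authority to its household count
    and to the minimum interviews needed for the accuracy threshold.
    """
    all_households = sum(households.values())
    boosted = {}
    for la, count in households.items():
        proportionate = total * count / all_households
        if proportionate < minimums[la]:
            boosted[la] = max(minimums[la], floor)
    remaining = total - sum(boosted.values())
    other_households = sum(c for la, c in households.items() if la not in boosted)
    allocation = dict(boosted)
    for la, count in households.items():
        if la not in boosted:
            allocation[la] = remaining * count / other_households
    return allocation


def round_to_assignments(target, response_rate, deadwood_rate, batch=18):
    """Round issued addresses up to whole interviewer assignments and
    return (addresses, revised interview target)."""
    yield_per_address = response_rate * (1 - deadwood_rate)
    addresses = math.ceil(target / yield_per_address / batch) * batch
    return addresses, addresses * yield_per_address


# Toy figures, not survey data.
alloc = allocate_interviews({"Large": 9_000, "Small": 1_000},
                            {"Large": 490, "Small": 560}, total=2_000)
addresses, revised = round_to_assignments(560, 0.70, 0.10)
```

Rounding addresses up to whole assignments of 18 pushes the revised targets slightly above the initial allocation, which is consistent with the two-year total in Table 2-1 exceeding 31,000.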

    Table 2-1: Projected two-year achieved sample size by local authority

    Local authority | 2001 Census household population | Wholly proportionate allocation | Rounded two-year total with projected achieved minimum sample size | Width of 95% confidence interval (%)

    Authorities with systematic random sampling
    Aberdeen City | 97,013 | 1,400 | 1,307 | 2.7
    Dundee City | 66,908 | 968 | 863 | 3.3
    East Dunbartonshire | 42,206 | 599 | 561 | 4.1
    East Renfrewshire | 34,950 | 481 | 555 | 4.1
    Edinburgh, City of | 204,683 | 2,890 | 2,700 | 1.9
    Glasgow City | 271,596 | 3,911 | 3,634 | 1.6
    Inverclyde | 36,691 | 540 | 559 | 4.4
    Renfrewshire | 75,355 | 1,091 | 1,008 | 3.1
    West Dunbartonshire | 40,781 | 583 | 549 | 4.2

    Authorities with clustered sampling
    Aberdeenshire | 90,736 | 1,281 | 1,198 | 2.8
    Angus | 46,945 | 666 | 620 | 3.9
    Argyll and Bute | 38,969 | 545 | 596 | 4.0
    Clackmannanshire | 20,558 | 292 | 590 | 4.0
    Dumfries and Galloway | 63,807 | 902 | 862 | 3.4
    East Ayrshire | 50,346 | 722 | 665 | 3.8
    East Lothian | 38,157 | 531 | 597 | 4.0
    Eilean Siar | 11,275 | 169 | 586 | 3.9
    Falkirk | 62,598 | 860 | 816 | 3.4
    Fife | 150,274 | 2,109 | 1,967 | 2.2
    Highland | 89,533 | 1,257 | 1,177 | 2.8
    Midlothian | 32,922 | 448 | 596 | 4.0
    Moray | 35,803 | 505 | 588 | 4.0
    North Ayrshire | 58,726 | 841 | 789 | 3.5
    North Lanarkshire | 132,619 | 1,867 | 1,731 | 2.3
    Orkney Islands | 8,342 | 118 | 570 | 4.0
    Perth and Kinross | 58,323 | 802 | 748 | 3.6
    Scottish Borders | 47,371 | 652 | 608 | 4.0
    Shetland Islands | 9,111 | 129 | 573 | 4.0
    South Ayrshire | 48,749 | 689 | 646 | 3.8
    South Lanarkshire | 126,496 | 1,777 | 1,632 | 2.4
    Stirling | 35,508 | 483 | 698 | 4.0
    West Lothian | 64,896 | 891 | 859 | 3.3

    All Scotland | 2,192,247 | 31,000 | 31,448 |

    2.8 Stratification within local authorities

    As indicated at Section 2.4, within local authorities, the sample is stratified by the geo-demographic indicator, Scottish MOSAIC. The purpose of this is to ensure that the sample correctly reflects the population structure in terms of area or neighbourhood type. Given the likely relationship between such variables and the topic coverage of the survey, stratification should lead to an increase in survey precision. It cannot, in any case, result in a sample which is less effective than an unstratified one, since stratification does not imply any departure from randomness or from the principle of equal probabilities of selection within a local authority.

    Although the full Scottish MOSAIC classification runs to 47 types, for the purposes of stratification, it is sufficient to use the main summary groups. A full description of these is included in Appendix 1.

    An additional advantage of using Scottish MOSAIC for the purpose of stratification is that it can be applied not only at ED level but at unit postcode level.

    2.9 Allocating sample across the calendar year

    As the fieldwork for the survey runs throughout the calendar year, it is important to ensure an even distribution of PSUs (and, in the high population density local authorities, interviewer assignments) by geographic area and Scottish MOSAIC type over time. There are two main reasons for this: an uneven distribution would jeopardise the requirement for the sample to be representative of the national population on a quarterly basis and some of the variables measured by the survey are likely to exhibit seasonal patterns - e.g. rates of economic activity, modes of transport.

    The procedure for allocating PSUs to months of the year is derived from that developed by the Office for National Statistics (ONS) in managing the Family Expenditure Survey (FES) 3 and differs only in the need for the SHS sample to be spread evenly across 24 rather than 12 months.

    This approach operates in the following way. Firstly, a full listing is prepared of the PSUs drawn as part of the two-year sample. These are listed by local authority and then by MOSAIC type within local authority. Secondly, this is split into random yearly allocations. Thirdly, within each year, the listing of PSUs is then labelled with a random permutation of the numbers 1 to 12 representing the twelve months covered by the fieldwork. This permutation is generated with certain properties to avoid 'bunching' of interviews within particular quarters:

    • the first four months are from different quarters
    • every subsequent month is from the same quarter as the one four places before.

    The example given by ONS (and used to allocate the 1996 FES) is as follows:

    Table 2-2: Procedure for allocating PSUs by month of fieldwork

    Position in list | Month | Quarter
    1, 13, 25, etc. | 10 | 4
    2, 14, 26, etc. | 8 | 3
    3, 15, 27, etc. | 5 | 2
    4, 16, 28, etc. | 1 | 1
    5, 17, 29, etc. | 11 | 4
    6, 18, 30, etc. | 7 | 3
    7, 19, 31, etc. | 4 | 2
    8, 20, 32, etc. | 2 | 1
    9, 21, 33, etc. | 12 | 4
    10, 22, 34, etc. | 9 | 3
    11, 23, 35, etc. | 6 | 2
    12, 24, 36, etc. | 3 | 1

    As this sequence can be added automatically to the sampling procedures for the survey, no time is spent manually assigning PSUs to particular months. The same approach is applied to the sample for the SHS.
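
One way to generate a permutation with the two properties listed above, and to label a sorted PSU listing with it, is sketched below. The construction (shuffle the quarters, shuffle the months within each quarter, then interleave) is an assumption about how such a sequence could be produced, not a description of the ONS or SHS software, and the function names are hypothetical.

```python
import random

QUARTERS = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9], 4: [10, 11, 12]}


def quarter_of(month):
    """Calendar quarter (1-4) of a month number (1-12)."""
    return (month - 1) // 3 + 1


def month_permutation(rng):
    """Random ordering of months 1-12 that cycles through the four quarters."""
    quarter_order = rng.sample(list(QUARTERS), 4)
    shuffled = {q: rng.sample(months, 3) for q, months in QUARTERS.items()}
    return [shuffled[quarter_order[i % 4]][i // 4] for i in range(12)]


def month_for_position(position, permutation):
    """Month label for the PSU at a 1-based position in the sorted listing."""
    return permutation[(position - 1) % 12]


ONS_EXAMPLE = [10, 8, 5, 1, 11, 7, 4, 2, 12, 9, 6, 3]  # from Table 2-2
print(month_for_position(13, ONS_EXAMPLE))   # 10
perm = month_permutation(random.Random(2003))
```

Because the listing is sorted by local authority and MOSAIC type before labelling, cycling through the permutation spreads neighbouring PSUs across different quarters.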

    2.10 Respondent selection

    As the survey is intended to collect information both about the structure and characteristics of Scottish households and about the people who occupy those households, the interview has a two-part structure. The respondent for the first part of the interview is the highest income householder or their spouse or partner 4, with this information established at the start of the interview. For the second part of the interview, one adult (aged 16+) member of the household is selected at random by the CAPI script, and interviewed at a later date if necessary. 5

    Page updated: Tuesday, May 16, 2006