Scotland's People: Results from the 2001/2002 Scottish Household Survey (Volume 8: Technical Report)

2. Sampling

The original requirements of the sample for the survey were as follows:

  • that it should allow an achieved national sample of 31,000 interviews over two years
  • that those interviews should be spread evenly across the 24 months of interviewing
  • that the sample should be fully national in character (i.e. covering the whole of mainland Scotland and the Islands) and that each quarter should produce nationally representative results
  • that results as reliable as those of a simple random sample of 500 should be available for the larger local authorities on an annual basis and for all local authorities (regardless of size) after 2 years
  • that the sample should be capable of producing data which are representative both of Scottish households and the adult (aged 16+) population resident in private households.

The following sub-sections address issues relating to the sampling frame; the balance between systematic random and clustered sampling; the distribution of interviews by local authority area; the stratification of interviews within local authority areas; and the selection of individuals for interview within households.

2.1 Sampling frame

Since the mid-1980s, the Small User File of the Postcode Address File (PAF) has emerged as the most widely used sampling frame for general population surveys of this kind. This has been the result of increasing concern about the accuracy of the main alternative to the PAF, the Electoral Register, particularly in the wake of the Community Charge. The principal advantages of the PAF relative to the Electoral Register are completeness (it is estimated to miss the addresses of only 2% of the adult population, and it is updated every three months) and lack of bias (the addresses missing from the PAF are less likely than omissions from the Electoral Register to be concentrated among particular types of people). The PAF was, therefore, selected as the sampling frame for the SHS. Its use does, however, raise a number of issues.

Deadwood

The Small User File of the PAF, which forms the basis of the sample of addresses, is known to include a number of addresses that are not residential (usually small shops and offices), have been demolished or are unoccupied. The extent of this 'deadwood' varies by area, but is usually estimated at between 10% and 13% in national samples of this kind. It is allowed for by issuing more addresses than a target response rate of 70% alone would require: for every 100 interviews expected to be achieved, 160 addresses are issued to interviewers, rather than the 140 suggested by a projected response rate of 70%.
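The issued-sample arithmetic can be made explicit. This is a minimal sketch, assuming the 70% response rate applies to eligible (non-deadwood) addresses only; the function name is hypothetical.

```python
import math

def addresses_to_issue(target_interviews: int, response_rate: float, deadwood_rate: float) -> int:
    """Addresses to issue so that, after losing ineligible 'deadwood' addresses
    and non-responding eligible ones, about target_interviews remain."""
    eligible_needed = target_interviews / response_rate   # eligible addresses required
    return math.ceil(eligible_needed / (1.0 - deadwood_rate))

for deadwood in (0.10, 0.12, 0.13):
    print(deadwood, addresses_to_issue(100, 0.70, deadwood))
```

With deadwood rates of 10%, 12% and 13% this gives 159, 163 and 165 addresses per 100 interviews, so the figure of 160 corresponds to a deadwood rate of roughly 11%.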

2.2 Accuracy and completeness

In local authority areas where clustered sampling is used, Enumeration Districts (EDs) are used as the Primary Sampling Units (PSUs), as described in sub-section 2.5. In some cases, particularly in areas subject to sizeable population change, entire EDs have been demolished since the PAF was last updated. To accommodate this, for each ED found to be unusable, the MORI Sampling Unit arranges for a substitute PSU to be drawn from the remaining pool of EDs within the same local authority area and with the same MOSAIC type (see Appendix 1).

In areas where systematic random sampling is used, the full sample for each two-year fieldwork period is drawn in advance, and so may exclude households in newly built housing that enters the PAF during the period of the survey. However, estimates from the Scottish Abstract of Statistics¹ suggest that new housing accounts for only about 1% of the housing stock in any given year. Moreover, the impact is further reduced by the fact that new properties are often entered onto the PAF some time before they are actually completed. (This should not be a problem in areas of clustered sampling because, although the PSUs are selected for two years at a time, the actual address lists are not drawn until nearer the time of the fieldwork.)

One further point relating to the accuracy of the PAF may be worth noting: experience in the 1991 and 1996 Scottish House Condition Surveys showed that, with some postcodes straddling the border, it is possible for 'Scottish' addresses actually to be in England (and, correspondingly, for 'English' addresses to belong in Scotland). To avoid this problem, Ordnance Survey maps of the Scottish/English border are manually inspected. Addresses which are actually in England are excluded, while those in 'English' EDs which are actually in Scotland are appended to the adjoining 'Scottish' ED.

Exclusions

Special EDs - It is customary in general population sampling of this kind to exclude 'special' EDs, which include prisons, hospitals and military bases. While prisons and hospitals do not generally have significant numbers of private households, the same may not be true of military bases. On the basis of Scottish MOSAIC classifications, however, such EDs account for just 0.5% of the population. They are, therefore, excluded from the sampling frame, since interviewing on military bases would pose fieldwork problems relating to access and security.

Specific accommodation types - The following types of accommodation are excluded from the survey if they are not listed on the Small User File of the PAF (since it is a survey of private households):

  • nurses' homes
  • student halls of residence
  • other communal establishments (e.g. hostels for the homeless and old people's homes)
  • mobile homes
  • sites for travelling people.

Households in such accommodation are included in the survey if they are listed on the Small User File of the PAF and the accommodation represents the sole or main residence of the individuals concerned.

People living in bed and breakfast accommodation are similarly included if the accommodation is listed on the PAF and represents the sole or main residence of those living there.

Students' term-time addresses are taken as their main residence (so that they are counted where they spend most of the year). Since halls of residence are excluded, however, students are likely to be somewhat under-represented.

2.3 Multiple dwellings

There are potential problems associated with the fact that a single entry on the PAF may actually represent multiple dwellings, or that a dwelling may contain multiple households. For example, an address listed as 14 Milton Street may consist of a tenement block containing 8 separate flats. Often, the existence of these additional dwellings is indicated in a PAF field known as the Multiple Occupancy Indicator (MOI). To give each such dwelling an equal chance of inclusion, the relevant addresses are weighted by the MOI when the sample is drawn, so that 14 Milton Street appears 8 times. In the address listings issued to interviewers, such addresses appear as '14 Milton Street - 3 of 8' etc., and interviewers are given clear counting procedures for identifying the selected dwelling.
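MOI weighting amounts to expanding the frame before an equal-probability draw. The sketch below is illustrative only: the function names, the treatment of a missing MOI as a single dwelling and the systematic draw are assumptions, not the MORI Sampling Unit's code.

```python
import random

def expand_by_moi(paf_entries):
    """paf_entries: list of (address, moi). Repeat each delivery point once per
    dwelling indicated by its MOI, labelled '<address> - k of m'."""
    frame = []
    for address, moi in paf_entries:
        m = max(1, moi or 1)   # assumption: missing or zero MOI means one dwelling
        if m == 1:
            frame.append(address)
        else:
            frame.extend(f"{address} - {k} of {m}" for k in range(1, m + 1))
    return frame

def systematic_sample(frame, n, rng=random):
    """Equal-probability systematic sample of n units from an ordered frame."""
    interval = len(frame) / n
    start = rng.random() * interval   # random start in [0, interval)
    return [frame[min(int(start + i * interval), len(frame) - 1)] for i in range(n)]

# Hypothetical addresses for illustration.
frame = expand_by_moi([("14 Milton Street", 8), ("2 Kirk Road", 1), ("5 High Street", None)])
print(systematic_sample(frame, 5, random.Random(1)))
```

Each of the 8 flats at 14 Milton Street then has the same chance of selection as the single dwelling at each other address.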

Where the MOI is correct, this procedure is unproblematic. Sometimes, however, the MOI is incorrect or missing and the true number of dwellings at an address is only discovered once the survey is in the field.

In the SHS, of the 50,689 addresses issued in 2001/2002, the MOI was found to be incorrect in 2.6% of cases: in 2.0% of cases the actual number of dwellings was lower than shown by the MOI, and in 0.6% of cases it was higher.

Where an interviewer finds that the MOI is different from the actual number of dwellings observed in the field, he or she uses a Kish grid to select one dwelling at random for interview. This procedure is subsequently checked in the office to ensure the interviewer has carried out a proper random selection. Where it is evident that the interviewer has not followed the selection procedure correctly, the address is re-issued to the interviewer so that the selection can be repeated.
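The text does not describe the layout of the Kish grid used. In the classic design, each address sheet carries a pre-assigned table mapping the number of units found to the one to select. The sketch below generates such a table reproducibly from a hypothetical address serial number, which would also let the office check recompute the selection; it is an illustration, not the SHS grid.

```python
import random

def make_kish_grid(serial: int, max_units: int = 12) -> dict[int, int]:
    """Selection table for one address: for each possible number of dwellings
    found (1..max_units), the position of the dwelling to interview.
    Seeding by the (hypothetical) serial makes the table reproducible."""
    rng = random.Random(serial)
    return {n: rng.randint(1, n) for n in range(1, max_units + 1)}

def select_dwelling(listed_dwellings: list[str], grid: dict[int, int]) -> str:
    """Apply the grid to dwellings listed in the prescribed counting order."""
    n = len(listed_dwellings)
    if n not in grid:
        raise ValueError(f"grid does not cover {n} dwellings")
    return listed_dwellings[grid[n] - 1]

flats = [f"Flat {i}" for i in range(1, 11)]   # 10 flats found where the MOI said 8
grid = make_kish_grid(10423)
print(select_dwelling(flats, grid))
```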

Cases in which the MOI is found to be incorrect should, in principle, be given an additional weight to take account of the implications of this for probabilities of selection. In fact, this is not done, for reasons outlined in Section 4 in the discussion on weighting.
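The form of that additional weight is not given here. Under the simplifying assumptions below (hypothetical notation, with the sampling fraction small enough that an address rarely has more than one entry selected, and one dwelling chosen at random per selected entry), it would be the ratio of dwellings found to the MOI:

```latex
% f   = sampling fraction applied to frame entries (intended weight 1/f)
% m_a = number of dwellings recorded by the MOI for address a
% n_a = number of dwellings actually found at address a
\Pr(\text{dwelling } d \text{ at address } a \text{ selected})
  \approx m_a f \cdot \frac{1}{n_a},
\qquad
\frac{w_d}{1/f} = \frac{f}{m_a f / n_a} = \frac{n_a}{m_a}
```

For example, an address with an MOI of 1 at which 3 flats are found would carry a relative weight of 3, and one with an MOI of 8 at which only 6 dwellings exist would carry 0.75.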

2.4 Overall sample structure

Scotland has 32 local authorities and the sample structure of the survey is intended to yield results as reliable as those of a simple random sample of 500 for the larger local authorities (defined as those with at least 750 achieved interviews) on an annual basis and for all local authorities (regardless of size) after 2 years.

The overall aim of the sample design is to pursue a systematic random sample where fieldwork conditions allow it - namely, in areas of high population density - and to cluster interviews in the remaining areas, in order to achieve the best combination of sample efficiency and cost effectiveness. The distinction is made on the basis of population density per square kilometre by local authority area. In those areas with a population density of 500 or more persons per square kilometre, a systematic random approach is adopted. In those local authority areas with a lower population density, interviews are clustered.

Nine authorities fall into the former (systematic random) category:

  • Aberdeen City
  • Glasgow City
  • Dundee City
  • Inverclyde
  • East Dunbartonshire
  • Renfrewshire
  • East Renfrewshire
  • West Dunbartonshire
  • Edinburgh, City of

In these areas, the sample is stratified by the geo-demographic indicator, Scottish MOSAIC, and a systematic random sample of addresses is drawn within each of the resulting strata (the stratification by Scottish MOSAIC is described in sub-section 2.8). Addresses within these areas are selected in full at the beginning of each two-year interviewing cycle. They are then grouped into batches, on the basis of their postcodes, for allocation to interviewers.
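A minimal sketch of stratified systematic sampling for one high-density authority, assuming each address carries a MOSAIC group code and that the sample is shared across groups in proportion to their address counts (the allocation rule is not stated in the text). Names and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_systematic_sample(addresses, n, rng=random):
    """addresses: list of (address_id, mosaic_group). Allocates n proportionately
    across groups, then draws a systematic sample within each group."""
    strata = defaultdict(list)
    for addr, group in addresses:
        strata[group].append(addr)
    total = len(addresses)
    # Proportionate allocation, largest-remainder rounding so the sizes sum to n.
    quotas = {g: n * len(units) / total for g, units in strata.items()}
    alloc = {g: int(q) for g, q in quotas.items()}
    leftover = n - sum(alloc.values())
    for g in sorted(quotas, key=lambda g: quotas[g] - alloc[g], reverse=True)[:leftover]:
        alloc[g] += 1
    sample = []
    for g, units in strata.items():
        k = alloc[g]
        if k == 0:
            continue
        units = sorted(units)   # e.g. postcode order, to spread the draw geographically
        interval = len(units) / k
        start = rng.random() * interval
        sample.extend(units[min(int(start + i * interval), len(units) - 1)] for i in range(k))
    return sample
```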

The remainder of this section concentrates on procedures for multi-stage sampling within the remaining 23 local authorities (which are listed in Table 2-1).

2.5 Primary sampling unit and cluster size

Enumeration Districts (EDs) are used as primary sampling units (PSU) for those local authorities which fall into the category of lower population density. EDs were chosen over the main alternative, postcode sectors, for the following reasons. Firstly, the use of postcode sectors would significantly increase the cost of fieldwork in these areas since they are much larger in size (covering an average of 2,300 households, compared with an average of 150 per ED). Secondly, in some of the smaller local authorities - e.g. the Orkney Islands and Clackmannanshire - there would have been too few postcode sectors to allow us to sample effectively without selecting a large number of addresses within each chosen PSU. Thirdly, EDs have certain advantages in terms of data linkage since they are directly compatible with Census Output Areas and can be easily linked with geo-demographic systems.

The main disadvantage of using EDs is that they are relatively small, averaging 150 households. This means that there is a potential for larger design factors, reducing the overall efficiency of the sample. The calculation of design factors involves an examination of the survey measure across the PSUs. The greater the variation between PSUs, the higher the design factor (since which PSUs are chosen is then likely to have a greater effect on the results). If a small PSU is used, the variation between PSUs is likely to be increased since the variation within PSUs is likely to be less (households in a small PSU will usually be more similar than those in a large PSU). However, the effects of survey design on the size of the likely sampling errors can be considerably moderated by:

  • Sampling a large number of PSUs.
  • Interviewing as few respondents as practical in each PSU.
  • Stratifying the PSU selection by status measures - because, within a stratified survey, the variation between PSUs is examined separately for each survey stratum. Hence, affluent areas are compared with other similar areas, and poorer areas are compared with others - and design effects are commensurately reduced.

The approach is, therefore, to aim for an average of 11 achieved interviews per PSU in order to have a minimum of about 50 PSUs within each of the clustered local authorities. This is a smaller cluster size than that employed in the 1993, 1996 and 2000 Scottish Crime Surveys, which involved (on average) 15 completed interviews per ED. Stratification by Scottish MOSAIC also reduces the variability between PSUs within each stratum and thus limits the size of the design effect. Although it was impossible to predict design factors accurately without knowing the exact topic coverage and the variability of response, it was envisaged that, for most variables, the design factors would be in the range 1.1-1.2 for the survey as a whole. In 2001/2002 the average design factor calculated for survey variables was 1.15. The design factors for a range of survey variables for 2001/2002 are shown in Section 5.
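The link between cluster size and design factor can be illustrated with Kish's approximation for equal-sized clusters, deft = sqrt(1 + (b - 1) * rho), where b is the number of interviews per PSU and rho the intra-cluster correlation. This is a simplification (it ignores stratification and weighting, and the 1.15 average also covers the unclustered areas), so the figures are only indicative.

```python
import math

def design_factor(rho: float, cluster_size: float) -> float:
    """Kish approximation for equal-sized clusters: sqrt(1 + (b - 1) * rho)."""
    return math.sqrt(1.0 + (cluster_size - 1.0) * rho)

def implied_rho(deft: float, cluster_size: float) -> float:
    """Intra-cluster correlation implied by an observed design factor."""
    return (deft ** 2 - 1.0) / (cluster_size - 1.0)

def effective_sample_size(n: int, deft: float) -> float:
    """Size of a simple random sample with the same precision."""
    return n / deft ** 2

rho = implied_rho(1.15, 11)   # 11 interviews per PSU
print(round(rho, 4), round(design_factor(rho, 15), 3))
```

Backing out rho from a design factor of 1.15 with 11 interviews per PSU gives about 0.032; holding rho fixed, clusters of 15 interviews, as in the Scottish Crime Surveys, would give a design factor of roughly 1.2.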

2.6 Procedures for dealing with very small EDs

There is a further issue here relating to those EDs which are, in effect, too small to sample from. It would, for example, have been undesirable and impractical to seek to obtain 11 or 12 interviews from an ED containing only 30 households because of the impact on variance between households within the PSU, the possibility of potential respondents discussing the survey and the practical difficulty of obtaining sufficient numbers of interviews. Two questions, therefore, arise: firstly, what should be considered the minimum size for an ED and, secondly, how should smaller EDs be dealt with?

In relation to the first of these questions, it was decided that an ED size of 61 households (from the 1991 Census count) should be considered the minimum for inclusion as a separate PSU. This implied interviewing at most about 20% of households in the smallest PSUs, which was felt to be acceptable, given that these EDs lay in areas with lower density of population.

In 2001/2002, 11% of EDs within the areas covered by clustering contained 60 or fewer households. However, this does not mean that 11% of PSUs for the survey also do so, since EDs are sampled with probability proportionate to the number of addresses (weighted by the MOI). These EDs contain approximately 3% of the total number of households in the local authorities where clustered sampling is used.
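Selection with probability proportionate to size is commonly carried out systematically on cumulative size totals. A minimal sketch, assuming each ED (or pseudo-ED) is listed once with its MOI-weighted address count and that the list is already sorted in the desired stratification order; the text does not say which PPS method was used.

```python
import random

def pps_systematic(units, n_psus, rng=random):
    """units: list of (psu_id, size), size = MOI-weighted address count.
    Returns n_psus ids selected with probability proportional to size.
    A unit larger than the sampling interval could be hit more than once."""
    total = sum(size for _, size in units)
    interval = total / n_psus
    start = rng.random() * interval
    targets = [start + i * interval for i in range(n_psus)]
    selected, cumulative, t = [], 0.0, 0
    for psu_id, size in units:
        cumulative += size
        while t < n_psus and targets[t] < cumulative:   # target falls inside this unit
            selected.append(psu_id)
            t += 1
    return selected

units = [("ED1", 100), ("ED2", 300), ("ED3", 200), ("ED4", 400)]   # hypothetical EDs
print(pps_systematic(units, 2, random.Random(0)))
```

Here ED4, with 400 of the 1,000 addresses, is selected in about 80% of draws of two PSUs.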

To resolve the problem of these small EDs, each ED with 60 or fewer households is paired with a neighbouring (or adjoining) ED to create pseudo-EDs, each made up of two or more real EDs. EDs are merged until the combined household count reaches the 61-household threshold. This has no bearing on probabilities of selection, since the pairing takes place before the PSUs are selected, and each pseudo-ED therefore has a probability of selection proportionate to its aggregated number of addresses (weighted by the MOI). Table 2-1 shows the number of EDs in each local authority where the household count falls below the threshold.
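The text does not specify which neighbour is chosen when several are available. The sketch below assumes a hypothetical adjacency list and a greedy rule (absorb the smallest unassigned neighbour until the total reaches 61 households); a group that cannot reach the threshold is returned as it is, for manual review.

```python
def merge_small_eds(households, neighbours, threshold=61):
    """households: {ed_id: count}; neighbours: {ed_id: [adjacent ed_ids]}.
    Returns a list of (member_ids, total_households) pseudo-EDs."""
    assigned = {}
    groups = []
    # Small EDs first (smallest first), so they get first pick of neighbours.
    for ed in sorted(households, key=lambda e: (households[e] >= threshold, households[e], e)):
        if ed in assigned:
            continue
        members, total = [ed], households[ed]
        assigned[ed] = len(groups)
        while total < threshold:
            candidates = sorted(
                {nb for m in members for nb in neighbours.get(m, ()) if nb not in assigned},
                key=lambda e: (households[e], e),
            )
            if not candidates:
                break   # no unassigned neighbour left: flag for manual review
            nb = candidates[0]
            members.append(nb)
            total += households[nb]
            assigned[nb] = len(groups)
        groups.append((members, total))
    return groups

households = {"A": 30, "B": 40, "C": 150, "D": 20, "E": 200}   # hypothetical EDs
neighbours = {"A": ["B", "C"], "B": ["A", "E"], "C": ["A"], "D": ["E"], "E": ["B", "D"]}
print(merge_small_eds(households, neighbours))
```

The aggregated totals would then serve as the size measures for selection with probability proportionate to size.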

Table 2-1 Small EDs encountered in sampling by Local Authority area: SHS 2001/2002

('Low population density' local authorities only; number of small EDs prior to the merging process)

Local authority          Number of small EDs
Aberdeenshire            80
Angus                    38
Argyll and Bute          71
Clackmannanshire         6
Dumfries and Galloway    122
East Ayrshire            46
East Lothian             14
Eilean Siar              19
Falkirk                  22
Fife                     65
Highland                 113
Midlothian               23
Moray                    24
North Ayrshire           34
North Lanarkshire        51
Orkney                   11
Perth and Kinross        71
Scottish Borders         89
Shetland                 18
South Ayrshire           44
South Lanarkshire        52
Stirling                 35
West Lothian             39

2.7 Stratification by local authority area

Table 2-2 shows the distribution of the original projections of achieved sample by local authority area at the end of the two-year sampling period. The underlying principle here is that the allocation of interviews by local authority area should be broadly proportionate to the number of households, except where the resulting sub-sample in any particular area would fall below a pre-determined accuracy threshold. The allocation was carried out in the following way.

  • The first stage was to set a minimum accuracy threshold of +/-4.4% at the 95% confidence level - i.e. the level of accuracy associated with an estimate of 50% from a simple random sample of 500 from an infinite population.
  • Taking account of the Finite Population Correction Factor and assuming a design factor of 1.1 in those areas with a clustered design, the minimum number of interviews required to meet this benchmark is established for each local authority area. This gives a figure of around 490 for the high population density areas and 560-590 for the areas with a clustered design.
  • For each area, this figure is compared with the number of interviews associated with a strictly proportionate allocation of 31,000 interviews across local authorities by number of households. Where the proportionate allocation would give a local authority fewer than the minimum identified in the second step, its number of interviews is set to that minimum, or to 550 if the minimum is less than 550.
  • The remaining interviews (i.e. those left after the allocation in the third step) are allocated to the remaining local authorities in proportion to their number of households, and then rounded to the nearest multiple of 11 (or 12 in areas of higher population density) - the expected average number of interviews to be achieved per PSU (or per interviewer assignment in the high population density local authorities).
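The steps above can be sketched as a single pass. This is a hypothetical helper under stated assumptions: a design factor of 1.0 in the systematic random areas (the text states only the 1.1 for clustered areas), the threshold taken as exactly 4.4%, and no attempt to reproduce the rounding behind Table 2-2 exactly.

```python
import math

Z95 = 1.96          # two-sided 95% normal quantile
THRESHOLD = 0.044   # +/-4.4%: about the half-width for 50% from an SRS of 500

def min_interviews(households: int, deft: float, target: float = THRESHOLD) -> float:
    """Smallest n whose 95% half-width for an estimate of 50% meets target,
    inflated by the design factor and reduced by the finite population correction."""
    n0 = (Z95 * deft * 0.5 / target) ** 2   # requirement for an infinite population
    return n0 / (1 + n0 / households)       # finite population correction

def allocate(households: dict[str, int], clustered: set[str],
             total: int = 31_000, floor: int = 550) -> dict[str, int]:
    """One pass of the allocation rule described in the text."""
    grand_total = sum(households.values())
    alloc: dict[str, int] = {}
    free: dict[str, int] = {}
    for la, h in households.items():
        deft = 1.1 if la in clustered else 1.0   # assumed design factors
        minimum = min_interviews(h, deft)
        if total * h / grand_total < minimum:
            alloc[la] = max(math.ceil(minimum), floor)   # raise to minimum, at least 550
        else:
            free[la] = h
    remaining = total - sum(alloc.values())
    free_total = sum(free.values())
    for la, h in free.items():
        per_psu = 11 if la in clustered else 12   # interviews per PSU or assignment
        alloc[la] = per_psu * round(remaining * h / free_total / per_psu)
    return alloc

print(round(min_interviews(37_814, 1.0)), round(min_interviews(8_236, 1.1)))
```

With the household counts for Inverclyde and Orkney this gives minima of about 490 and 560, in line with the figures quoted above.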

As can be seen from the final column in the table, the projected accuracy of the sub-samples in the different areas (over two years) ranges from +/-1.6% in the largest authority (Glasgow City) to +/-4.4% in the smaller authorities which are over-sampled to bring them up to the accuracy threshold. In terms of the projected number of interviews, the range was from 3,612 to 552. This degree of variation is felt to be appropriate, given the need for finer-grained analysis within the larger local authorities.
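The projected confidence intervals can be checked with the usual formula for a proportion. A minimal sketch, assuming an estimate of 50%, a design factor of 1.0 in systematic random areas and 1.1 in clustered ones, and the finite population correction based on the number of households; it reproduces the two rows checked below but is not presented as the survey's own calculation.

```python
import math

def ci_half_width(n: int, households: int, deft: float = 1.0, p: float = 0.5) -> float:
    """Projected half-width of a 95% CI for a proportion p, allowing for the
    design factor and the finite population correction (1 - n/N)."""
    return 1.96 * deft * math.sqrt(p * (1 - p) / n) * math.sqrt(1 - n / households)

print(f"Glasgow City: +/-{ci_half_width(3_612, 273_793):.1%}")             # +/-1.6%
print(f"Orkney Islands: +/-{ci_half_width(562, 8_236, deft=1.1):.1%}")     # +/-4.4%
```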

Table 2-2 Projected two-year achieved sample size by local authority area: SHS 2001/2002

Columns: (a) total number of households (1991 Census estimates); (b) wholly proportionate distribution; (c) rounded two-year total with projected achieved minimum sample size; (d) width of 95% confidence interval (+/-).

Local authority                (a)       (b)       (c)       (d)

Authorities with systematic random sampling
Aberdeen City               98,029     1,400     1,296      2.7%
Dundee City                 67,791       968       900      3.2%
East Dunbartonshire         41,928       599       552      4.1%
East Renfrewshire           33,696       481       552      4.1%
Edinburgh, City of         202,304     2,890     2,640      1.9%
Glasgow City               273,793     3,911     3,612      1.6%
Inverclyde                  37,814       540       552      4.1%
Renfrewshire                76,403     1,091     1,008      3.1%
West Dunbartonshire         40,847       583       552      4.2%

Authorities with clustered sampling
Aberdeenshire               89,671     1,281     1,188      3.1%
Angus                       46,617       666       616      4.3%
Argyll and Bute             38,158       545       594      4.4%
Clackmannanshire            20,436       292       584      4.4%
Dumfries and Galloway       63,145       902       836      3.7%
East Ayrshire               50,529       722       672      4.1%
East Lothian                37,158       531       594      4.4%
Eilean Siar                 11,815       169       572      4.4%
Falkirk                     60,202       860       792      3.8%
Fife                       147,616     2,109     1,948      2.4%
Highland                    88,013     1,257     1,166      3.2%
Midlothian                  31,332       448       594      4.4%
Moray                       35,381       505       594      4.4%
North Ayrshire              58,884       841       782      3.9%
North Lanarkshire          130,726     1,867     1,728      2.6%
Orkney Islands               8,236       118       562      4.4%
Perth and Kinross           56,117       802       738      3.9%
Scottish Borders            45,644       652       606      4.4%
Shetland Islands             9,065       129       562      4.4%
South Ayrshire              48,268       689       638      4.2%
South Lanarkshire          124,393     1,777     1,640      2.6%
Stirling                    33,820       483       594      4.4%
West Lothian                62,411       891       826      3.8%

All Scotland             2,170,242    31,000    31,090      0.8%

2.8 Stratification within local authority areas

As indicated earlier, within local authority areas, the sample is stratified by the geo-demographic indicator, Scottish MOSAIC. The purpose of this is to ensure that the sample correctly reflects the population structure in terms of area or neighbourhood type. Given the likely relationship between such variables and the topic coverage of the survey, stratification should lead to an increase in survey precision. It cannot, in any case, result in a sample which is less effective than an unstratified one, since stratification does not imply any departure from randomness or from the principle of equal probabilities of selection within a local authority.

Although the full Scottish MOSAIC classification runs to 47 types, for the purposes of stratification, it is sufficient to use the 10 main summary groups. A full description of these is included in Appendix 1.

An additional advantage of using Scottish MOSAIC for the purpose of stratification is that it can be applied not only at ED level but at unit postcode level.

2.9 Procedures for allocating PSUs (and interviewer assignments) evenly throughout the calendar year

As the fieldwork for the survey runs throughout the calendar year, it is important to ensure an even distribution of PSUs (and, in the high population density local authorities, interviewer assignments) by geographic area and Scottish MOSAIC type over time. There are two main reasons for this. Firstly, an uneven distribution would jeopardise the requirement for the sample to be representative of the national population on a quarterly basis. Secondly, some of the variables measured by the survey are likely to exhibit seasonal patterns - e.g. rates of economic activity, modes of transport.

The procedure for allocating PSUs to months of the year is derived from that developed by the Office for National Statistics (ONS) in managing the Family Expenditure Survey² and differs only in the need for the SHS sample to be spread evenly across 24 rather than 12 months.

This approach operates in the following way. Firstly, a full listing is prepared of the PSUs drawn as part of the two-year sample, listed by local authority and then by MOSAIC type within local authority. Secondly, this listing is split into random yearly allocations on the basis of odd and even numbers. Thirdly, within each year, the listing of PSUs is labelled with a random permutation of the numbers 1 to 12, representing the twelve months covered by the fieldwork. This permutation is generated with certain properties to avoid 'bunching' of interviews within particular quarters:

  • the first four months are from different quarters
  • every subsequent month is from the same quarter as the one four places before.

The example given by ONS (and used to allocate the 1996 FES) is as follows:

Table 2-3 Procedure for allocating PSUs by month of fieldwork

Position in list      Month   Quarter
1, 13, 25, etc.       10      4
2, 14, 26, etc.       8       3
3, 15, 27, etc.       5       2
4, 16, 28, etc.       1       1
5, 17, 29, etc.       11      4
6, 18, 30, etc.       7       3
7, 19, 31, etc.       4       2
8, 20, 32, etc.       2       1
9, 21, 33, etc.       12      4
10, 22, 34, etc.      9       3
11, 23, 35, etc.      6       2
12, 24, 36, etc.      3       1

As this sequence can be added automatically to the sampling procedures for the survey, no time is spent manually assigning PSUs to particular months. The same approach is applied to the sample for the SHS.
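A permutation with these properties can be generated by shuffling the order of the quarters for positions 1-4 and then filling positions 5-12 from the same quarters in the same order. In the sketch below, the odd/even split is read as odd and even positions in the sorted PSU list, and the choice of which year receives the odd positions is randomised; both readings are assumptions, and the function names are hypothetical.

```python
import random

def quarter(month: int) -> int:
    return (month - 1) // 3 + 1

def month_permutation(rng=random) -> list[int]:
    """Months 1-12 in an order where positions 1-4 fall in different quarters and
    each later position shares the quarter of the one four places before."""
    quarters = [1, 2, 3, 4]
    rng.shuffle(quarters)   # quarter order for positions 1-4
    months_by_q = {q: rng.sample([3 * q - 2, 3 * q - 1, 3 * q], 3) for q in quarters}
    return [months_by_q[quarters[pos % 4]][pos // 4] for pos in range(12)]

def allocate_months(psus, rng=random):
    """psus: list sorted by local authority, then MOSAIC type.
    Returns {psu: (year, month)}, splitting odd and even positions between years."""
    first_year = rng.choice([1, 2])   # assumption: which year gets the odd positions
    result = {}
    for offset, year_psus in enumerate((psus[0::2], psus[1::2])):
        year = first_year if offset == 0 else 3 - first_year
        perm = month_permutation(rng)
        for i, psu in enumerate(year_psus):
            result[psu] = (year, perm[i % 12])   # positions 1, 13, 25... share a month
    return result
```

The ONS example in Table 2-3 (10, 8, 5, 1, 11, 7, 4, 2, 12, 9, 6, 3) is one such permutation.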

2.10 Respondent selection

As the survey is intended to collect information both about the structure and characteristics of Scottish households and about the people who occupy those households, the interview has a two-part structure. The respondent for the first part of the interview is the highest income householder or their spouse or partner³, with this information established at the very start of the interview. For the second part of the interview, one adult (aged 16+) member of the household is selected at random, and interviewed at a later date if necessary. Further detail about the two parts of the interview and the topics covered in each can be found in Section 3.
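The text does not say how the random adult is chosen. A minimal sketch, assuming a uniform pseudo-random choice among household members aged 16 or over; the returned count of eligible adults is the usual inverse-probability factor, since each adult's chance of selection is one in that number. The function name and data are hypothetical.

```python
import random

def select_random_adult(household, rng=random):
    """household: list of (name, age). Returns (selected member, number of
    eligible adults); each adult aged 16+ has a 1/k chance of selection."""
    adults = [person for person in household if person[1] >= 16]
    if not adults:
        raise ValueError("no household member aged 16 or over")
    return rng.choice(adults), len(adults)

household = [("A", 45), ("B", 43), ("C", 17), ("D", 12)]
print(select_random_adult(household, random.Random(2)))
```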

Page updated: Friday, March 31, 2006