Household Transport in 2006: Some Scottish Household Survey Results

A Notes and Definitions

A.1 Totals may appear to differ slightly from the apparent sums of their component parts, in cases where they have been calculated by adding up the "unrounded" values of the components and then rounding each figure independently. Similarly, percentages may appear not to sum to 100%.

A.2 In tables which analyse the results of questions for which multiple answers were allowed, the percentages may total more than 100%, because some interviewees gave more than one response.

A.3 The underlying sample numbers shown in different tables may not be the same. In some cases, this is because the tables relate to different populations (such as all households, all adults and all people). In addition, the SHS only collects certain kinds of information for particular sub-groups of the population (which are identified in the relevant tables' headings), and therefore some questions were not asked of all respondents because they only applied in certain circumstances (e.g. questions about children would not be asked in a household without any children). In some cases, the bases differ because some people were unable to, or did not want to, answer certain questions (e.g. some households did not wish to provide details of their income).

A.4 Highest Income Householder: the household reference person for the first part of the interview. This must be a person in whose name the accommodation is owned or rented, or who is otherwise responsible for the accommodation. In households with joint householders, the person with the highest income is taken as the household reference person. If householders have exactly the same income, the older is taken as the household reference person.

A.5 Adult: for the purposes of the SHS, an adult is someone who was aged 16 or over at the time of the interview; a child is someone who was aged 15 or under.

A.6 Household types
  • A single pensioner household consists of just one adult of pensionable age (60+ for women, and 65+ for men) and no children.
  • A single parent household contains an adult of any age and one or more children.
  • A single adult household consists of an adult of non-pensionable age and no children.
  • An older smaller household contains either (a) an adult of non-pensionable age and an adult of pensionable age and no children or (b) two adults of pensionable age and no children.
  • A large adult household has three or more adults and no children.
  • A small adult household contains two adults of non-pensionable age and no children.
  • A large family household consists of either (a) two adults and three or more children or (b) three or more adults and one or more children.
  • A small family household consists of two adults and one or two children.
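The household type definitions above can be expressed as a short decision procedure. The following is a minimal sketch only, not the SHS contractors' own derivation: the function names, the "F"/"M" sex codes and the representation of each adult as an (age, sex) pair are assumptions made for illustration.

```python
from typing import List, Tuple


def is_pensionable(age: int, sex: str) -> bool:
    """Pensionable age as defined in A.6: 60+ for women, 65+ for men."""
    return age >= (60 if sex == "F" else 65)


def household_type(adults: List[Tuple[int, str]], n_children: int) -> str:
    """Classify a household from its adults, given as (age, sex), and its number of children."""
    n_adults = len(adults)
    if n_adults == 0:
        raise ValueError("a household must contain at least one adult")
    n_pensionable = sum(is_pensionable(age, sex) for age, sex in adults)

    if n_children == 0:
        if n_adults == 1:
            return "single pensioner" if n_pensionable == 1 else "single adult"
        if n_adults == 2:
            # One or both adults of pensionable age makes an "older smaller" household.
            return "older smaller" if n_pensionable >= 1 else "small adult"
        return "large adult"

    # Households with children.
    if n_adults == 1:
        return "single parent"
    if n_adults == 2 and n_children <= 2:
        return "small family"
    # Two adults with three or more children, or three or more adults with any children.
    return "large family"
```

For example, household_type([(62, "F"), (40, "M")], 0) gives "older smaller", because the woman is of pensionable age, whereas the same call with a 62 year old man would give "small adult".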

A.7 Socio-economic classification: With effect from 2003, the SHS uses the National Statistics Socio-economic Classification (NS-SEC), which has been designed to group together, as far as possible, people with similar levels of occupational skills. The version of the classification used for this analysis has eight categories, although the final one is not used in the tables, as it refers only to those who have never worked or are long-term unemployed. The seven classes which appear in the tables are aggregations of thirteen groups, and were defined in detail in the "Notes and Definitions" section of "Household Transport in 2005". Because the SHS only collects occupational information for people in employment, and for people who are not in work but who have been in paid work in the five years prior to the survey, the socio-economic classification is not known in many cases (e.g. people who have been retired for many years). For the purposes of classifying households, the socio-economic classification of the Highest Income Householder is used.

A.8 Annual net household income: this is the total annual net income (i.e. after taxation and other deductions) from employment, benefits and other sources, which is brought into the household by the highest income householder and/or his/her spouse or partner. This includes any contribution to household finances made by other household members (e.g. for "digs"). Because of refusals or "don't knows", full information for the main components of household income was not collected from about a third of households. Subsequently, the SHS contractors imputed the missing components of income for almost all these households, using information that was obtained from other households that appeared similar. Depending upon the component of income, the contractors used either "hot deck" imputation (where the sample is divided into sub-groups based on relevant household characteristics, and the imputed values are obtained from randomly-chosen "donor" cases) or "predictive mean" imputation (where the data are used to construct a statistical model of the relationship between income and other household characteristics, which is then used to "predict" the income in cases where a value is to be imputed). The analyses by income given in this bulletin therefore cover all but a couple of percent of households.
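The general idea of "hot deck" imputation described above can be sketched as follows. This is an illustration of the technique only, and not the contractors' actual procedure: the function name, the record layout, the use of a single grouping variable and the fixed random seed are all assumptions.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, group_key, value_key, seed=0):
    """Fill missing values (None) with the value of a randomly-chosen donor from the same sub-group.

    Returns new records; the input list is left unchanged.
    A case stays missing if its sub-group contains no donors.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so that the sketch is repeatable

    # Collect the observed values ("donors") within each sub-group.
    donors = defaultdict(list)
    for record in records:
        if record[value_key] is not None:
            donors[record[group_key]].append(record[value_key])

    imputed = []
    for record in records:
        copy = dict(record)
        pool = donors.get(copy[group_key], [])
        if copy[value_key] is None and pool:
            copy[value_key] = rng.choice(pool)
        imputed.append(copy)
    return imputed


# Hypothetical data: income is missing for the second household.
households = [
    {"id": 1, "household_type": "single adult", "income": 14000},
    {"id": 2, "household_type": "single adult", "income": None},
    {"id": 3, "household_type": "small family", "income": 26000},
]
completed = hot_deck_impute(households, "household_type", "income")
```

In practice the sub-groups would be defined by several household characteristics at once, and "predictive mean" imputation would replace the random draw with a value derived from a fitted statistical model.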

A.9 Distance between home / work and home / school: the interviewer asks for the location of the place of work. If the respondent does not know the postcode, the contractors subsequently try to deduce it, using whatever information was obtained by the interviewer (e.g. the name and address of the employer): in some cases, this may be sufficient only to indicate (say) the postal district (e.g. "EH1"). The interviewer asks for the name of the random schoolchild's school, and this is later used to obtain the postcode of the school. This will sometimes be wrong - for example, if there are two schools with the same (or similar) names in the same council area, the postcode of the wrong one may be taken. The distances between home and work and between home and school are the estimated distances "as the crow flies", based upon the grid co-ordinates of the "centres" of the postcodes (or whatever types of area were recorded) of the home, place of work and school. Therefore, the distance would be zero in the case of a journey from home to school if exactly the same postcode (or other type of area) was recorded for both the home and school. For example, if it was known that the journey from home to school involved travel from (say) "EH10" to "EH10", the estimated distance would be zero. However, if it was known that the journey from home to school was from "EH10 6UD" to "EH10 6XE", the "crow flies" distance between the "centres" of the two postcodes would be calculated. Clearly, the percentage error in the estimation of distances will tend to be smaller for longer journeys - such as a journey from "EH1" to "G1". There will be cases where the "crow flies" distance will understate considerably the distance actually travelled (e.g. for someone who commutes between Kirkcaldy and Edinburgh). The results suggest that small percentages of people apparently walk or cycle very long distances to work or to school.
In some cases, this may be due to errors in the information which was recorded - for example, if the respondent provided only the name of a company or a school or a place, the postcode assigned at a later stage in the processing may be that of another location for that company, or of another school with the same name, or another place with the same name. In such a case, the estimated distance could well be far too high. Or, it might be that the interviewer recorded the wrong mode of travel in the interview. There may also be cases where the person lives far from work (or school), and stays away from home during the working week (or the school week), and so is able to walk from the "temporary" accommodation to work (or school) but is counted on the basis of the long distance between home and work (or school).
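The straight-line calculation described in A.9 can be sketched as follows. This is an illustration only: it assumes that each location is held as a pair of grid co-ordinates (easting, northing) in metres, and the function name is hypothetical.

```python
import math


def crow_flies_km(home, destination):
    """Straight-line distance in kilometres between two (easting, northing) points given in metres."""
    (easting_1, northing_1), (easting_2, northing_2) = home, destination
    return math.hypot(easting_2 - easting_1, northing_2 - northing_1) / 1000.0
```

If the same area "centre" is recorded for both ends of the journey, the two points coincide and the function returns zero, which is the situation described above for a journey recorded as being from "EH10" to "EH10".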

A.10 Cars and Motor vehicles: prior to April 2003, the interviewer asked for details of each of the motor vehicles that are normally available for the private use of one or more members of the household, including vans, motor cycles, mopeds, and any other motor vehicles, as well as cars. Details of each vehicle were recorded separately, allowing figures to be produced for the number of cars as well as for the total number of motor vehicles. From April 2003, the interviewer asks only for the number of cars, in order to make "room" for questions on other topics.

A.11 The Scottish Index of Multiple Deprivation (SIMD)

A.11.1 The Scottish Index of Multiple Deprivation (SIMD) is used to rank the "data zones" used for the production of Scottish Neighbourhood Statistics in order of deprivation. There are 6,505 data zones, with an average of about 750 residents in each, formed by aggregating Census output areas. The tables in this edition use the second (2006) version of the SIMD, which is based on 37 indicators in the seven individual "domains" of "Current Income", "Employment", "Housing", "Health", "Education", "Access to Services", and "Crime". More information can be found at the SIMD website (http://www.scotland.gov.uk/simd).

A.11.2 Households in the SHS sample have been allocated the SIMD value of the data zone which contains the postcode of the residence. In the small number of cases where a postcode is split between more than one data zone, the SIMD value used is that of the data zone into which the largest number of dwellings in that postcode falls. The SIMD values have further been assigned to one of 5 quintiles, with quintile 1 containing the most deprived 20% of data zones in Scotland, and quintile 5 the least deprived 20%. Because the SHS sample is not spread uniformly across Scotland, the quintiles do not necessarily each contain exactly 20% of the households in the SHS sample.
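The allocation of data zones to quintiles can be sketched as follows. This is an illustration only: it assumes that each data zone has a rank from 1 to 6,505, with rank 1 being the most deprived, and the function name is hypothetical.

```python
def simd_quintile(rank: int, n_zones: int = 6505) -> int:
    """Quintile (1 = most deprived 20% of data zones, 5 = least deprived 20%) for a given rank."""
    if not 1 <= rank <= n_zones:
        raise ValueError("rank must be between 1 and the number of data zones")
    # Integer arithmetic splits the ranks into five equal bands of data zones.
    return (rank - 1) * 5 // n_zones + 1
```

With 6,505 data zones, each quintile contains 1,301 data zones; as noted above, the quintiles need not each contain 20% of the households in the SHS sample.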

A.12 The SHS urban/rural classification

A.12.1 The urban/rural classification is based on settlement sizes, and (for the less-populated areas) the estimated time that would be taken to drive to a settlement with a population of 10,000 or more. The classification is based on postcodes. First, each postcode in Scotland was classed as either "urban" or "non-urban" on the basis of its "density" (measured in terms of the numbers of [a] residential and [b] non-residential addresses per hectare). Then, clumps of adjacent "urban" postcodes, which together contained more than a certain total number of addresses, were grouped together to form "settlements". (Any apparently "non-urban" postcodes which were entirely surrounded by "urban" postcodes, or by a combination of "urban" postcodes and coastline, were reclassified as "urban", and included in the relevant settlements.)

A.12.2 Six categories were then defined:

  • Large urban areas - settlements with populations of 125,000 or more. These are around - but are not the same as - Aberdeen, Dundee, Edinburgh and Glasgow. Because of the way in which settlements are defined, this category may (a) include some areas outwith the boundaries of these four cities, in cases where the settlements extend into neighbouring local authorities, and (b) exclude some "non-urban" areas within the boundaries of these four cities.
  • Other urban areas - other settlements with populations of 10,000 or more.
  • "Accessible" small towns - settlements of between 3,000 and 9,999 people, which are within 30 minutes' drive of a settlement of 10,000+ people.
  • "Remote" small towns - settlements of between 3,000 and 9,999 people, which are not within 30 minutes' drive of a settlement of 10,000+ people.
  • "Accessible" rural areas - settlements of fewer than 3,000 people, which are within 30 minutes' drive of a settlement of 10,000+ people.
  • "Remote" rural areas - settlements of fewer than 3,000 people, which are not within 30 minutes' drive of a settlement of 10,000+ people.
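The six categories above amount to a simple set of thresholds, which can be sketched as follows. This is an illustration only: the function name and its two inputs (the estimated settlement population, and whether the location is within 30 minutes' drive of a settlement of 10,000 or more people) are assumptions, and the sketch does not reproduce the way in which settlements themselves are built up from postcodes.

```python
def urban_rural_category(settlement_population: int, within_30_minutes_drive: bool) -> str:
    """Return the six-fold urban/rural category for a settlement of the given size."""
    if settlement_population >= 125_000:
        return "Large urban areas"
    if settlement_population >= 10_000:
        return "Other urban areas"
    # Below 10,000 people, drive time to a settlement of 10,000+ people decides accessibility.
    accessibility = "Accessible" if within_30_minutes_drive else "Remote"
    if settlement_population >= 3_000:
        return f"{accessibility} small towns"
    return f"{accessibility} rural areas"
```

For example, a settlement of 9,999 people that is not within 30 minutes' drive of a larger settlement is classed as "Remote small towns", but would move to "Other urban areas" if its estimated population rose to 10,000.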

A.12.3 The urban/rural classification used for the SHS data is based on the Settlement file maintained by the General Register Office for Scotland (GROS), which is revised from time to time (e.g. to take account of new information becoming available and changes in the method which GROS used to calculate the settlement sizes). Because the SHS is conducted in a series of "two-year sweeps", the urban/rural classifications used for 1999/2000, 2001/2002, 2003/2004, and 2005/2006 are all a little different, with each being based on the latest Settlement file that was available at the time of derivation. The classification used for 2003/2004 was the first to be based on a Settlement file which takes account of the results of the 2001 Census. The extent of Settlement boundaries is not expected to change significantly from year to year. However, from time to time, the estimated population of a settlement will cross one of the thresholds (e.g. increasing from under 10,000 to over 10,000) and this may have a noticeable effect on the percentage of households in some of the categories of the classification.

A.13 Possible sampling variability, and "95% confidence limits" for SHS estimates

A.13.1 Although the SHS's sample is chosen at random, the people who take part in the survey will not necessarily be a representative cross-section of the people of Scotland. For example, purely by chance, the sample could include disproportionate numbers of certain types of people, in which case the survey's results would be affected. In general, the smaller the sample from which an estimate is produced, the greater the likelihood that the estimate could be misleading. As an example, suppose that the percentage of people in a particular population sub-group (those aged 16-19, say) who travel to work in a particular way (e.g. by bicycle) is calculated from SHS data for a total of only (say) 100 or so commuters from that sub-group. Should the SHS sample contain, purely by chance, just two or three more 16-19 year olds who cycle to work, the resulting estimate would be two or three percentage points higher. Results produced from a small sample could therefore be greatly affected by sampling variability. The larger the sample, the less likely it is that the results will be affected greatly by sampling variability.

A.13.2 The likely extent of sampling variability can be quantified, by calculating the "standard error" associated with the estimate of a quantity produced from a random sample. Statistical sampling theory states that, on average:

  • only about one sample in three would produce an estimate that differed from the (unknown) true value of that quantity by more than one standard error;
  • only about one sample in twenty would produce an estimate that differed from the true value by more than two standard errors;
  • only about one sample in 400 would produce an estimate that differed from the true value by more than three standard errors.

By convention, the "95% confidence interval" for a quantity is defined as the estimate plus or minus about twice the standard error (from sampling theory, the interval is plus or minus 1.96 times the standard error), because there is only a 5% chance (on average) that a sample would produce an estimate that differs from the true value of that quantity by more than this amount.
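The "one in three", "one in twenty" and "one in 400" figures above can be checked against the Normal distribution, which is the usual approximation for the sampling distribution of an estimate from a large sample. The following is a minimal sketch under that assumption; the function name is hypothetical.

```python
import math


def chance_of_exceeding(n_standard_errors: float) -> float:
    """Two-sided probability that a Normally distributed estimate differs from
    the true value by more than the given number of standard errors."""
    return math.erfc(n_standard_errors / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    p = chance_of_exceeding(k)
    print(f"{k} standard errors: probability {p:.4f}, about 1 in {1 / p:.0f}")
```

The probabilities are roughly 0.32 for one standard error, 0.05 for 1.96 standard errors and 0.003 for three standard errors, which is the basis of the rounded figures quoted above.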

A.13.3 There is no simple "rule of thumb" for the size of standard errors: the standard error of the estimate of a percentage depends upon several things:

  • the value of the percentage itself;
  • the size of the sample (or sub-sample) from which it was calculated (i.e. the number of sample cases corresponding to 100%);
  • the sampling fraction (i.e. the fraction of the relevant population that is included in the sample); and
  • the "design effect" associated with the way in which the sample was selected (for example, a "clustered" random sample would be expected to have larger standard errors - but lower fieldwork costs - than a simple random sample of the same size).
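The way in which these factors combine can be sketched using the standard formula for the standard error of a percentage from a simple random sample, multiplied by a "design factor" (the square root of the design effect). This is an illustration only, and not necessarily the method used to produce Table 28: the function name is hypothetical, the sampling fraction is ignored (which is reasonable only when the sample is a small fraction of the population), and the design factor of 1.2 in the usage example is an assumption that is not stated in this bulletin.

```python
import math


def confidence_limit_95(percentage: float, sample_size: int, design_factor: float = 1.0) -> float:
    """Approximate 95% confidence limit, in percentage points, for an estimated percentage."""
    p = percentage / 100.0
    # Standard error for a simple random sample, inflated by the design factor.
    standard_error = design_factor * math.sqrt(p * (1.0 - p) / sample_size)
    return 1.96 * standard_error * 100.0


# Usage with an assumed design factor of 1.2.
print(round(confidence_limit_95(55, 800, design_factor=1.2), 1))
print(round(confidence_limit_95(55, 900, design_factor=1.2), 1))
```

With the assumed design factor of 1.2, the sketch gives about 4.1 and 3.9 percentage points for sub-samples of 800 and 900, in line with the figures quoted in A.13.5 and A.13.6. The formula also shows why the limits for x% and (100-x)% are the same, and why quadrupling the sample size only halves the limits.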

A.13.4 Table 28 shows the "95% confidence limits" for estimates of a range of percentages calculated from sub-samples of a range of sizes (NB: the confidence limits for estimates of x% and of (100-x)% are the same). The table was produced in the same way as the tables of estimated sampling error in the "Annual Report" volumes of Scotland's People (see section B4), but has a more detailed breakdown of the smaller sample sizes.

A.13.5 The interpretation of an entry in Table 28 is best explained by an example:

  • the value in the cell at the intersection of the "45% or 55%" column and the "800" row is 4.1;
  • this means that the "95% confidence limits" for an estimate of 55% which is produced from a sub-sample of 800 are +/- 4.1%-points;
  • so the "95% confidence interval" for the estimate is 55% +/- 4.1%-points (i.e. from about 50.9% to around 59.1%, assuming that the value of the estimate is 55.0%);
  • or, on average, only 1 in 20 sub-samples of size 800 would produce an estimate that differs from the (unknown) true value of this quantity (if it is around 55%) by more than 4.1%-points.

A.13.6 As an example of the use of this table, it will be seen from Table 1 that there were 847 single parent households in the survey in 2006, and that an estimated 55% of such households did not have any cars available to them. Because that estimate was produced from data for only 847 such households, sampling variability could (by chance) produce an error of several percentage points. The entry in the cell at the intersection of the "45% or 55%" column and the "800" row in Table 28 shows that the confidence limits for an estimate of 55% based on a sample of 800 will be about +/- 4.1%-points; similarly, Table 28 shows that an estimate of 55% based on a sample of 900 will have confidence limits of about +/- 3.9%-points - so an estimate of 55% based on a sample of 847 will have confidence limits of about +/- 4.0%-points. This means that there is a 1 in 20 chance that the estimate differs from the true value by more than about 4.0%-points. It follows that there is roughly a 1-in-3 chance that the estimate differs from the true value by more than about 2.0%-points (i.e. one standard error, which is about half of the 95% confidence limit). Clearly, estimates based on small samples have wider confidence limits.
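The step from the two table entries to the figure for a sample of 847 is a linear interpolation, which can be sketched as follows. The function name is hypothetical, and interpolating linearly between adjacent rows is an approximation, because confidence limits actually vary with the square root of the sample size.

```python
def interpolate_limit(sample_size, lower_entry, upper_entry):
    """Linearly interpolate a confidence limit between two (sample size, limit) table entries."""
    (size_0, limit_0), (size_1, limit_1) = lower_entry, upper_entry
    fraction = (sample_size - size_0) / (size_1 - size_0)
    return limit_0 + fraction * (limit_1 - limit_0)


# Table 28 entries quoted above: 800 -> 4.1 and 900 -> 3.9 (for an estimate of 55%).
limit = interpolate_limit(847, (800, 4.1), (900, 3.9))
print(round(limit, 1))  # 4.0
```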

A.13.7 Because the survey's estimates may be affected by sampling errors, apparent differences of a few percentage points between the figures for two sub-groups of the population may not be "significant": it could be that the true values for the two sub-groups are similar, but the random selection of households for the survey has, by chance, produced a sample which gives a high estimate for one sub-group and a low estimate for the other. A difference between two sub-groups is "significant" at the conventional "5%" level if it is so large that fewer than one random sample in twenty would be expected to produce a difference of that size (or greater) purely by chance, if the two sub-groups' true values were the same. One way of assessing significance at the 5% level involves comparing the difference with the 95% confidence limits for the two estimates. Suppose that these are +/- 3.0%-points and +/- 4.0%-points, respectively. Clearly:

  • a difference which is less than the magnitude of the greater of the limits (which, in this case, is 4.0%-points) is not significant; and
  • a difference which is greater than the sum of the magnitudes of the limits (in this case 3.0%-points + 4.0%-points = 7.0%-points) is significant.

Statistical sampling theory suggests that a difference whose magnitude is between these values is significant if it is greater than the square root of the sum of the squares of the magnitudes of the limits for the two estimates - in this case, the square root of (3.0² + 4.0²) - i.e. the square root of (9 + 16) - i.e. the square root of 25, which is 5.0. So, in this case, a difference of more than 5.0%-points would be considered significant. Similar calculations will indicate whether or not other pairs of estimates differ significantly.
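The calculation in the preceding paragraph can be sketched as follows, taking the 95% confidence limits of the two estimates as given. The function names are hypothetical, and the sketch assumes that the two estimates come from independent sub-samples.

```python
import math


def significance_threshold(limit_a: float, limit_b: float) -> float:
    """Square root of the sum of the squares of the two 95% confidence limits."""
    return math.hypot(limit_a, limit_b)


def is_significant(difference: float, limit_a: float, limit_b: float) -> bool:
    """True if the difference between two estimates is significant at the 5% level."""
    return abs(difference) > significance_threshold(limit_a, limit_b)


print(significance_threshold(3.0, 4.0))  # 5.0
print(is_significant(6.0, 3.0, 4.0))     # True
print(is_significant(4.5, 3.0, 4.0))     # False
```

This single test is consistent with the two simpler rules given above: the threshold is never smaller than the greater of the two limits, and never larger than their sum.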

A.13.8 The above information relates only to sampling variability. The survey's results could also be affected by non-contact / non-response bias: the characteristics of the people who should have been in the survey but who could not be contacted, or who refused to take part, could differ markedly from those of the people who were interviewed. If that is the case, the SHS's results will not be representative of the whole population. Without knowing the true values (for the population as a whole) of some quantities, one cannot be sure about the extent of any such biases in the SHS. However, comparison of SHS results with information from other sources suggests that they are broadly representative of the overall Scottish population, and therefore that any non-contact or non-response biases are not large overall. Such biases could, of course, still be more significant for some sub-groups of the population or in certain Council areas, particularly those which have the highest non-response rates. In addition, because it is a survey of private households, the SHS does not cover some sections of the population - for example, it does not collect information about many students in halls of residence (see paragraph B2.3). The "Methodology" and "Fieldwork Outcomes" volumes of Scotland's People (see section B4) provide more information on these matters.

A.14 Changes to the method of recording the answers to some questions

A.14.1 The SHS interview includes a number of questions which ask why a person does (or did) something - e.g. why he/she uses a particular means of travel to work. Some of these questions were originally "open-ended", with the interviewer typing a summary of the person's answer into the computer. If there were a number of reasons, it could take a long time for the interviewer to type them all in. Therefore, once a few months' answers had been obtained, the SHS contractors scrutinised them, and identified the reasons that were given often. They then changed some of the questions to use pre-coded lists of reasons, so that the interviewer could simply "tick" each one that was given by the respondent, which is much quicker than typing them in. The option of typing in something that the person said was retained for use on those occasions on which some of the answer could not be recorded using the entries in the pre-coded list. In such cases, the contractors subsequently examine the typed-in answers, and decide how to code them and whether there is a need to add new entries to the pre-coded lists.

A.14.2 Comparison of the results of the two methods of recording the answers indicated that, on average, more reasons were recorded per respondent after the pre-coded lists of answers were introduced. Clearly, data which were collected using the "open-ended" forms of questions are not on the same basis as the data which were collected using the "pre-coded list" forms of questions. Therefore, in some previous editions of this bulletin, the results reported related only to the period for which the current forms of the questions were used.

Page updated: Monday, October 8, 2007