Long Distance Commuting in Scotland

Chapter Three Data Collation and Cleaning

Introduction

3.1. In this chapter we discuss the data collation and cleaning undertaken prior to the main analysis (which is reported in later chapters).

Data sources

3.2. Relevant data sets used throughout this research include:

  • Scottish Household Survey (SHS) (Household, Random Adult and Travel Diary)
  • 1991 and 2001 Census data (particularly the travel to work matrices)

3.3. The Scottish Household Survey (SHS) is a continuous cross-sectional survey which commenced in February 1999 to provide the Scottish Executive and other interested parties with information on the impact on households and individuals of key services and policies. Each complete sample of approximately 30,000 households is gathered over the course of two years. One householder from the selected SHS household is interviewed face-to-face about themselves and other members of the household. In addition, a randomly selected adult member of the same household aged 16 or over (who may, by chance, be the same person) is interviewed on other topics. In this way, results from the survey are representative of both Scottish households and adult individuals.

3.4. Core questions, providing standard information about the composition and key characteristics of households, continue across survey waves. As well as questions about transport patterns and choices, the SHS contains a Travel Diary section in which respondents report on aspects of their previous day's travel. The main results are reported regularly in a series of Scottish Executive Transport Statistics bulletins.

3.5. A further data source, used for any necessary geographic groupings, is the set of Scottish Neighbourhood Statistics data zones. The Scottish Executive (2006) summarises the importance of data zones as follows:

"The publication of the data zones is a significant milestone in the Scottish Executive and partners' ability to monitor and develop policy at a small area level. Through Scottish Neighbourhood Statistics data zones will increasingly be the core geography for making available small area statistics across most policy areas including information about benefits, education, health and the labour market. This will allow users to readily bring together information from various sources on a common geography." (Scottish Executive 2006, Scottish Neighbourhood Statistics Data Zone and Intermediate Geography 2006 CD Background Information)

3.6. There are 6,505 data zones in Scotland. The data zones meet tight constraints on population thresholds (500 to 1,000 household residents). They all nest into Local Authorities and are built up from 2001 Census output areas. Data zones group together output areas with similar social characteristics.

SHS data

3.7. The SHS data provide an estimate of the 'crow-fly' (straight-line) distance between home and place of work within Scotland. This study needed estimates of the 'actual distance travelled' instead, because 'crow-fly' distances under-estimate travel distances for the following reasons:

  • estuarial crossings and other geographic features which require significant detours
  • curves and bends in roads (typically rural routes)
  • lack of a direct route resulting in a 'dog-leg' or detour
  • bypasses and general diversions to avoid congestion (i.e. where the fastest route is not the shortest)
  • one-way systems
  • rectilinear networks in cities (i.e. streets at right angles), which tend to increase distances in cities by a factor of up to √2
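The rectilinear-network effect in the last bullet can be illustrated with a short sketch. It assumes an idealised street grid aligned with the coordinate axes, and the function names are illustrative only; it is not part of the study's method.

```python
import math


def crow_fly_km(dx_km, dy_km):
    """Straight-line ('crow-fly') distance for a given east-west and north-south offset."""
    return math.hypot(dx_km, dy_km)


def rectilinear_km(dx_km, dy_km):
    """Distance along an idealised grid of streets at right angles."""
    return abs(dx_km) + abs(dy_km)


def detour_factor(dx_km, dy_km):
    """Ratio of grid distance to crow-fly distance (1 along a street, sqrt(2) on the diagonal)."""
    straight = crow_fly_km(dx_km, dy_km)
    if straight == 0:
        raise ValueError("origin and destination coincide")
    return rectilinear_km(dx_km, dy_km) / straight
```

On such a grid the factor is 1 for a trip along a single street and reaches its maximum of √2 (about 1.41) when the destination lies diagonally across the grid.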

3.8. The distance between work and home was therefore estimated as the shortest path through the road network. MVA's Accession software and the Ordnance Survey Meridian road network (which includes motorways, A roads, B roads and minor roads) were used to estimate road-based distances between data zone points. Data zone 'points' were established on the basis of the locations and sizes of population concentrations within each zone. This approach takes advantage of the large number of data zones (6,505), which together cover all of Scotland.
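As an illustration of the shortest-path idea only, the following minimal sketch applies Dijkstra's algorithm to a toy network. It does not reproduce the Accession software or the Meridian data; the network, node names and link lengths are hypothetical.

```python
import heapq

# Hypothetical toy network: node -> {neighbouring node: link length in km}.
TOY_NETWORK = {
    "A": {"B": 4.0, "C": 10.0},
    "B": {"A": 4.0, "C": 3.0},
    "C": {"A": 10.0, "B": 3.0},
}


def shortest_road_distance_km(network, origin, destination):
    """Return the shortest network distance in km, or None if no route exists."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route to this node was already found
        for neighbour, length_km in network.get(node, {}).items():
            candidate = dist + length_km
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None
```

In the toy network the direct A to C link is 10 km, but the route via B is 7 km, so the function returns 7.0 for that pair.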

3.9. SHS 'crow-fly' estimates were then compared to the 'actual road-based' estimates for all the records utilised in the study.

3.10. In some instances, the nature of the trip record was such that it was not possible to establish 'actual road-based' estimates (e.g. where the data zone of the place of work was not known). As a result, approximately 10% of the SHS trip records were not used in the analysis.

3.11. Defining long distance commuting trips on the basis of 'actual road-based' distances rather than 'crow-fly' distances increases the absolute number of long distance commuting records from 7,601 (18% of commuting trips) to 8,987 (24%). Table 3.1 compares the number and proportion of long distance trips under the two sets of distance estimates.

Table 3.1: 'Actual road-based' distance compared to 'crow-fly' estimates

| Data | Total trips | Long distance trips (15+ km) | Long distance trips as % of total |
|---|---|---|---|
| 'Actual road-based' distance | 38,259 | 8,987 | 24% |
| 'Crow-fly' distance | 42,250 | 7,601 | 18% |
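The effect of the distance definition on the long distance share can be sketched as follows. This is a minimal illustration that assumes the 15 km threshold is inclusive (consistent with "15km or more" elsewhere in this chapter); the function names are illustrative.

```python
LONG_DISTANCE_THRESHOLD_KM = 15.0  # trips of 15 km or more count as long distance


def is_long_distance(distance_km):
    """True when a commuting trip meets the long distance threshold."""
    return distance_km >= LONG_DISTANCE_THRESHOLD_KM


def long_distance_share(distances_km):
    """Proportion of trips in a list of distances that are long distance."""
    if not distances_km:
        raise ValueError("no trips supplied")
    long_count = sum(1 for d in distances_km if is_long_distance(d))
    return long_count / len(distances_km)
```

A trip with a 'crow-fly' distance of 13 km and a road-based distance of 16.5 km (hypothetical values) would be short distance under the first definition and long distance under the second, which is how the share can rise when road-based distances are used.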

SHS data cleaning

3.12. Given the nature of the SHS, the following steps were necessary to ensure that clean data were used in the subsequent analysis:

  • trips of 15km or more by cycling/walking (147 records) were likely to be incorrectly coded and so were removed from further analysis
  • records coded 'works at sea or offshore' and 'did not walk to work' were treated as long distance commuters (105 records). Records coded 'works at sea or offshore' and 'walks to work' were treated as short distance commuters (12 records)
  • records coded 'works out-with Scotland (but not at sea/offshore)' and 'did not walk to work' were treated as long distance commuters (48 records). Records coded 'works out-with Scotland (but not at sea/offshore)' and 'walks to work' were treated as short distance commuters (4 records)
  • data where employment status was defined as either 'at school' (738 records) or 'in further/higher education' (2,467 records) were removed from the employment SHS data analysis
  • within the SHS Travel-to-School data, trips further than 15km by cycling/walking were removed (27 records)
  • 'works at or from home' trips were assigned a mode of 'walk' and assumed to be a short distance commute (1,526 records). This approach may not be the most appropriate in some cases (e.g. a plumber who works from home), but was the best that could be done under the circumstances, given that (in most years) the SHS did not collect any information about whether the person worked (i) at home all the time, (ii) away from home all the time, or (iii) a mixture of the two
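The employment-related rules above can be expressed as a single function, sketched below. The field names, category labels and the order in which the rules are applied are assumptions made for illustration; they are not the SHS variable names or the study's actual processing code.

```python
EXCLUDED_STATUSES = {"at school", "in further/higher education"}
SPECIAL_WORKPLACES = {"offshore", "outside_scotland"}  # hypothetical labels
ACTIVE_MODES = {"walk", "cycle"}
THRESHOLD_KM = 15.0


def clean_record(record):
    """Return (mode, commute_class) for a usable record, or None if it is removed.

    `record` is a dict with the hypothetical keys 'employment_status',
    'workplace', 'mode' and 'distance_km'.
    """
    if record["employment_status"] in EXCLUDED_STATUSES:
        return None  # pupils and students are outside the employment analysis
    mode = record["mode"]
    workplace = record["workplace"]
    if workplace == "home":
        return ("walk", "short")  # works at or from home: assumed short walk
    if workplace in SPECIAL_WORKPLACES:
        # Offshore or outside Scotland: long distance unless the person walks to work.
        return (mode, "short" if mode == "walk" else "long")
    distance_km = record["distance_km"]
    if mode in ACTIVE_MODES and distance_km >= THRESHOLD_KM:
        return None  # walking or cycling 15 km or more is treated as a coding error
    return (mode, "long" if distance_km >= THRESHOLD_KM else "short")
```

Records that return None correspond to those removed from the analysis; all others carry a mode and a short or long distance label.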

Census data

3.13. 'Actual road-based' distances for work trips within Scotland were also estimated from data zones and attached to the Census journey to work database. The Census journey to work data also enabled cross-border work trips to England to be considered. These were coded as follows:

  • output areas within 15km of the Scottish/English border were selected as a subset. Within this subset, trips with 'actual road-based' estimates of less than 15km were labelled 'short distance'; all others were labelled 'long distance'
  • trips to all English output areas, excluding the 'short distance' subset identified above, were assumed to be long distance trips
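The cross-border coding can be sketched as a small function. The argument names are illustrative, and treating a subset trip with no road-based estimate as long distance is an assumption that the text does not state.

```python
def classify_cross_border_trip(in_border_subset, road_distance_km=None):
    """Label a Scotland-England work trip as 'short distance' or 'long distance'.

    in_border_subset: True if the output area lies within 15 km of the border.
    road_distance_km: 'actual road-based' estimate, if one is available.
    """
    if (
        in_border_subset
        and road_distance_km is not None  # assumption: missing estimates default to long
        and road_distance_km < 15.0
    ):
        return "short distance"
    return "long distance"
```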

Journey time estimates

3.14. Having assigned work and home data zones to each record, consideration was given to the estimate of journey times.

3.15. The Transport Model for Scotland (TMfS) was used as the basis of journey time estimates for both car and public transport trips. TMfS is a multi-modal model covering 95% of Scotland's population. The TMfS zone system contains 1,096 zones within the modelled area and 37 external zones which cover the rest of the Great Britain mainland. Each zone is based on the 2001 Census output area boundaries. The detailed highway and public transport networks have been developed with a 2002 base year and contain over 23,500 links, 1,850 modelled junctions and 1,300 public transport services (rail, bus and underground).

3.16. To build and validate the base models, the existing database has been enhanced with a further data collection programme in the key areas, incorporating roadside interviews, journey time routes, traffic counts, public transport surveys and junction data.

3.17. The TMfS public transport time estimates combine walk, wait and in-vehicle times. Public transport wait times are estimated as half the headway (the time between successive services), up to a maximum wait time of 15 minutes. Public transport walk times are based on a walk speed of 4.8 km/h between the TMfS zone and the nearest public transport stop.
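The arithmetic behind these components can be sketched as below. This is an illustration rather than TMfS code: it reads the wait time rule as half the headway capped at 15 minutes, and the function and parameter names are illustrative.

```python
MAX_WAIT_MIN = 15.0    # maximum wait time in minutes
WALK_SPEED_KMH = 4.8   # assumed walk speed between the zone and the nearest stop


def wait_time_min(headway_min):
    """Half the headway, capped at the maximum wait time."""
    return min(0.5 * headway_min, MAX_WAIT_MIN)


def walk_time_min(walk_distance_km):
    """Walk time in minutes at the fixed walk speed."""
    return walk_distance_km / WALK_SPEED_KMH * 60.0


def public_transport_time_min(headway_min, walk_distance_km, in_vehicle_min):
    """Total public transport time: walk + wait + in-vehicle, all in minutes."""
    return (
        walk_time_min(walk_distance_km)
        + wait_time_min(headway_min)
        + in_vehicle_min
    )
```

For example, a service running every 10 minutes gives a 5 minute wait, whereas an hourly service is capped at 15 minutes rather than 30.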

3.18. The TMfS car time estimate includes car 'in-vehicle' time only.

3.19. The time estimates (car and public transport) were attached to the records in both the Census and SHS data.

Page updated: Monday, July 31, 2006