2007 Scottish Survey of Achievement (SSA) - Science, Science Literacy and Core Skills - Supporting Evidence

Annex I: Survey design and methodology

I.1 Introduction

The Scottish Survey of Achievement (SSA) 2007 was required to meet the range of high-level objectives set out in Chapter A. In addition, the following practical constraints were imposed where possible.

  • The duration of an assessment session was designed to be about 30 minutes at P3/P5 and 50 minutes at P7/S2.
  • The maximum that any individual pupil would be asked to undertake was two written booklets (from either Science knowledge and understanding or Science literacy) and a questionnaire, as well as one of the four elements of the practical or a piece of class-based writing.
  • The schools that had been invited to participate in the pre-testing of assessment material for the survey would not be selected for survey involvement, unless absolutely unavoidable.
  • To further minimise the burden on schools, there would be as little overlap as possible between the schools selected for inclusion in the SSA 2007 and those selected for the Trends in International Mathematics and Science Study (TIMSS), an international study which ran in the same year. Where the inclusion of a TIMSS school was unavoidable, the school undertook a reduced role in the SSA, with pupils being assessed only in the knowledge and understanding and teacher judgements components. A pupil asked to participate in both the SSA and TIMSS could be withdrawn from the SSA if it was felt that it would be unduly stressful.
  • The total number of pupils selected for testing in an individual school was in proportion to the size of the school roll and was designed to be on average 20 for primary schools, and 30 for secondary schools.
  • A maximum number of twelve pupils per school would be selected for participation in the practical elements of the survey.

This annex explains how the sample was designed in order to best meet these objectives and constraints, as well as how the results were analysed.

I.2 The sample design

The principal aim of the SSA is to produce national estimates of achievement for pupils across Scotland at different stages in their education, whether they are taught in the publicly funded or the independent sector, however large or small their schools, and wherever they might be located. The only pupils deliberately excluded in 2007 were those in special schools. Pupils with special educational needs who were being taught in mainstream schools were not excluded, although they could be withdrawn from the sample at the school's discretion, before or during testing, if the school considered the experience potentially or actually distressing for them.

In the 2006 survey, pupils known to be taught in Gaelic units were also excluded. However, such pupils were included in certain sections of the 2007 SSA to meet its third objective: to assess and report pupils' Science knowledge and understanding achievement by 5 to 14 levels at P5 and P7 for pupils taught wholly or partially through the medium of Gaelic.

Another objective of the 2007 SSA was to produce achievement estimates for 'opted-in' local authorities. This represents a different approach from the 2005 and 2006 surveys where, to minimise the burden on schools, half of the local authorities were reported on in each year. In 2007, local authority Directors of Education were invited to opt their LA in to authority level reporting on the Science knowledge and understanding, pupil and teacher questionnaires, and teacher judgements components of the survey. Given this 'opt-in' methodology, the set of reporting authorities is not necessarily representative of all 32 Scottish local authorities, but does include a mix of large and small, urban and rural, and socially deprived authorities from across the country. Table 1 lists the SSA Reporting Authorities for 2005, 2006 and 2007.

Table 1
SSA Reporting Authorities for 2005, 2006 and 2007

2005 | 2006 | 2007
Aberdeen City | Aberdeenshire | Aberdeen City
Angus | Argyll & Bute | Aberdeenshire
East Ayrshire | Clackmannanshire | Angus
East Dunbartonshire | Dumfries & Galloway | Dumfries & Galloway
East Renfrewshire | Dundee City | East Ayrshire
Edinburgh City | East Lothian | East Dunbartonshire
Highland | Eilean Siar | East Renfrewshire
Inverclyde | Falkirk | Edinburgh City
North Ayrshire | Fife | Falkirk
North Lanarkshire | Glasgow City | Fife
Perth & Kinross | Midlothian | Glasgow City
Renfrewshire | Moray | Highland
South Ayrshire | Orkney Islands | Inverclyde
South Lanarkshire | Scottish Borders | Moray
Stirling | Shetland Islands | North Ayrshire
West Lothian | West Dunbartonshire | North Lanarkshire
 | | Perth & Kinross
 | | Renfrewshire
 | | South Ayrshire
 | | South Lanarkshire
 | | Stirling
 | | West Dunbartonshire



When designing a sample survey, one of the key considerations is the margin of error associated with the estimated population proportions. For example, under simple random sampling, a sample of 1,000 pupils would produce an estimate with a maximum margin of error of around three percentage points. So, having assessed 1,000 P3 pupils, we might say that the estimated proportion of P3 pupils deemed to be working at Level B in numeracy is 57 per cent plus or minus three percentage points. With a sample of 500 pupils the margin of error would increase to more than four percentage points, and with 250 pupils it would be around six percentage points.
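The figures quoted above can be reproduced with the standard formula for the margin of error of a proportion under simple random sampling. The sketch below assumes a 95 per cent confidence level (z = 1.96) and the worst case p = 0.5; neither value is stated in this annex, and the function name is illustrative only.

```python
import math


def max_margin_of_error(n, z=1.96):
    """Worst-case margin of error, in percentage points, for a proportion
    estimated from a simple random sample of size n.

    Assumes p = 0.5 (which maximises p * (1 - p)) and, by default,
    a 95 per cent confidence level (z = 1.96). Both are assumptions.
    """
    return 100 * z * math.sqrt(0.25 / n)


for sample_size in (1000, 500, 250):
    print(sample_size, round(max_margin_of_error(sample_size), 1))
```

Under these assumptions the three sample sizes give roughly 3.1, 4.4 and 6.2 percentage points, in line with the "around three", "more than four" and "around six" quoted in the text.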

In the 2007 SSA, the aim was to ensure that the maximum margin of error associated with national estimates of the proportion of pupils at a stage attaining a certain level was +/- two percentage points. Because of the complex two-stage design of the survey, more pupils must be sampled to achieve this level of accuracy than in the equivalent simple random sample. Therefore, a minimum of 3,000 pupils per stage would be selected to undertake the Science knowledge and understanding assessment. A minimum of a further 1,250 pupils per stage would be required to undertake the Science literacy assessment. As the Science literacy assessment was a test of a new approach rather than a full national assessment, a smaller sample was acceptable, with an associated reduction in accuracy. In practice, these pupil numbers were increased slightly to allow for an estimated pupil loss of around ten per cent through absence, and it was intended to select a total of 16,000 pupils for knowledge and understanding and 5,500 for Science literacy. The intended sample size for national reporting was therefore 21,500 [3]. In order for the sample to be nationally representative, the intended sample would be selected from local authorities in proportion to the size of the stage in that area.
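As a rough check on these figures, the sketch below computes the simple random sample size needed for a margin of error of two percentage points, and the inflation implied by the stated minimum of 3,000 pupils per stage. It assumes a 95 per cent confidence level and p = 0.5. The annex does not quote a design effect, so the ratio shown is only what the numbers imply, not a published value.

```python
import math


def srs_sample_size(margin_pp, z=1.96, p=0.5):
    """Simple random sample size needed for a given margin of error
    (in percentage points). z and p are assumed values."""
    e = margin_pp / 100
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 6))


n_srs = srs_sample_size(2.0)
implied_inflation = 3000 / n_srs  # ratio implied by the stated minimum

print(n_srs, round(implied_inflation, 2))
```

Under these assumptions a simple random sample would need about 2,400 pupils, so the minimum of 3,000 corresponds to inflating the sample by roughly a quarter to allow for the two-stage design.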

In addition, to allow local authority level reporting, it was necessary to increase the Science knowledge and understanding pupil sample sizes beyond those that would normally be available within a representative national sample for each opted-in authority. It was therefore decided to aim for sample sizes of around 450 pupils per stage in these areas [4] in the 2007 SSA. This is similar to the sample sizes used for reporting authorities in 2005 and 2006. However, in those earlier years, the local authority sample had to support separate achievement reporting in both reading and numeracy, and the pupils were therefore divided between these two types of assessment, with the consequence that margins of error were close to seven percentage points.

In 2007, the local authority sample need only support achievement reporting in Science knowledge and understanding, resulting in authority achievement estimates with margins of error of around five percentage points, a greater degree of accuracy than in 2005 and 2006. Authority reports will still be produced if the achieved samples drop below 450 pupils per stage, unless the sample falls below 200 pupils or is drawn from fewer than 50 per cent of the sampled schools in the authority. However, any drop in the pupil sample reduces the accuracy of the results and increases the margin of error.

I.2.1 Sampling across authorities and in the independent sector

Local authorities (both reporting and non-reporting) and the independent sector were treated as a single group for sampling purposes and all primary stages (P3, P5 and P7) were taken in the selected schools. A two-stage disproportionate stratified random sampling scheme was applied to produce the required number of pupils at each stage. This approach differs from the 2005 and 2006 surveys, where reporting authorities, non-reporting authorities and the independent sector were all sampled separately and where different school samples were drawn without replacement for the three primary stages.

The school sample was selected first, stratified by authority and school size. School numbers were selected to give an average of 20 pupils per stage per school in the primary sector and 30 pupils per school at S2, based on the intended authority pupil sample size. For example, an authority with an intended sample size of 200 pupils at S2 would require a minimum sample of 7 schools.
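The worked example above (200 pupils at S2 requiring at least 7 schools) follows from dividing the intended pupil sample by the average number of pupils per school and rounding up. A minimal sketch, with an illustrative function name:

```python
import math


def schools_needed(target_pupils, avg_pupils_per_school):
    """Minimum number of schools needed to reach the target pupil sample,
    given the designed average number of pupils selected per school."""
    return math.ceil(target_pupils / avg_pupils_per_school)


# S2 example from the text: 200 pupils at an average of 30 per school.
print(schools_needed(200, 30))
```

The design averages of 20 (primary, per stage) and 30 (S2) are those stated in the text; the rounding-up rule is an assumption consistent with the quoted example.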

Publicly funded schools were then classified as large (10 or more pupils in the relevant stage) or small (fewer than 10 pupils per stage). All pupils at the appropriate stage in small schools would be selected to avoid excluding small groups of pupils from their classmates and for the convenience of the school. Thus every pupil in a selected small school had a 100 per cent chance of selection. The sampling probabilities of small schools were therefore adjusted to ensure that pupils in these schools had the same probability of selection as any other pupils within the particular stage and authority, in principle producing an unbiased sample of pupils from these schools. This probability was given by dividing the number of pupils required in that authority by the authority's pupil population at that stage.
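Because every pupil in a selected small school is assessed, a pupil's overall chance of selection equals the school's chance of selection. The sketch below shows the adjustment described above; the authority figures used are hypothetical and are not taken from the survey.

```python
def small_school_selection_probability(pupils_required, stage_population):
    """Probability with which a small school is sampled so that its pupils
    have the same chance of selection as other pupils at the same stage
    in the same authority."""
    if stage_population <= 0:
        raise ValueError("stage_population must be positive")
    return min(1.0, pupils_required / stage_population)


# Hypothetical authority: 450 pupils required from a stage of 3,000 pupils.
p_school = small_school_selection_probability(450, 3000)

# All pupils in a selected small school are taken, so the pupil-level
# probability is the school probability multiplied by 1.
p_pupil = p_school * 1.0
print(p_school, p_pupil)
```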

Otherwise, large schools were selected via simple random sampling. Pupils within them were then selected via stratified random sampling from within each stage in the school, with a probability of selection proportional to the size of the stage. The stratification applied was gender and deprivation (pupils living in the 20 per cent most deprived areas versus others, as indicated by their post codes). Again, this strategy gave every pupil in a 'large school' an equal chance of selection, in principle producing an unbiased pupil sample. The probability of selection of pupils in large and small schools was also equal within an authority.
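Within-school selection stratified by gender and deprivation can be sketched as follows. This is an illustration under stated assumptions, not the survey's actual procedure: the same sampling fraction is applied in every stratum, the number taken from each stratum is rounded to the nearest integer, and the record layout (`id`, `gender`, `deprived`) is hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(pupils, fraction, seed=0):
    """Draw the same fraction of pupils from each gender x deprivation stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pupil in pupils:
        strata[(pupil["gender"], pupil["deprived"])].append(pupil)
    chosen = []
    for key in sorted(strata):
        group = strata[key]
        k = round(fraction * len(group))  # assumed rounding rule
        chosen.extend(rng.sample(group, k))
    return chosen


# Hypothetical stage roll: 10 pupils in each of the four strata.
cells = [(g, d) for g in ("G", "B") for d in (True, False) for _ in range(10)]
school_stage = [
    {"id": i, "gender": g, "deprived": d} for i, (g, d) in enumerate(cells)
]
sample = stratified_sample(school_stage, fraction=0.5)
print(len(sample))
```

Applying one fraction across all strata keeps each pupil's chance of selection equal, which is the property the text relies on for an unbiased pupil sample.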

Opted-in authorities were over-sampled to allow separate reporting of pupil achievement on Science knowledge and understanding. This required a minimum of 15 schools for sampling at each stage. Where fewer than 15 schools were available, all schools were selected. Where more than 15 schools were available in an opted-in authority, schools were selected via simple random sampling. In order to produce unbiased pupil samples in reporting authorities, the selected schools in these authorities had to provide the same proportion of their pupils for assessment. This sampling fraction was given by the proportion of the authority's population size at the stage that the required 450 pupils represented, and varied from authority to authority and from stage to stage.
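The sampling fraction and its application to individual school rolls can be illustrated as below. The authority population and school rolls are hypothetical, and rounding to the nearest whole pupil is an assumption; the annex says only that the fraction had to yield integer numbers of pupils.

```python
def sampling_fraction(authority_stage_population, target=450):
    """Fraction of each selected school's stage roll to be assessed."""
    return min(1.0, target / authority_stage_population)


def pupils_per_school(stage_rolls, fraction):
    """Whole number of pupils to select in each school (assumed rounding)."""
    return [round(roll * fraction) for roll in stage_rolls]


# Hypothetical opted-in authority with 2,250 pupils at the stage.
fraction = sampling_fraction(2250)
selected = pupils_per_school([60, 45, 33], fraction)
print(fraction, selected)
```

Because each school's count is rounded separately, the authority total will generally differ slightly from 450, which is consistent with the small variations reported for the intended samples.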

The over-sampling in opted-in authorities was redressed through data weighting when the national achievement estimates were produced.

In practice, the school sample was selected first and authorities were given the opportunity to withdraw schools if necessary. The schools themselves were then invited to take part and the pupil sample was then drawn from those schools that agreed to participate.

I.2.2 Sampling for Gaelic medium reporting

The third objective of the 2007 SSA was to assess and report on pupils' Science knowledge and understanding achievement by 5 to 14 levels at P5 and P7 for pupils taught wholly or partially through the medium of Gaelic. Therefore, unlike in previous years, all such pupils (including pupils who receive any teaching delivered in Gaelic, written or oral) were selected for the survey. This therefore represents a Gaelic medium census rather than a sample survey.

Two knowledge and understanding booklets were translated for use in this element of the survey. Pupils were then given the opportunity to complete the assessment in their preferred language (English or Gaelic). In addition, one pupil questionnaire was translated and all pupils were invited to complete it in either English or Gaelic. Teacher judgements were also sought for pupils involved in this Gaelic medium census. Gaelic medium reporting was not required for Science literacy, Science practical or class-based writing in a Science context.

Knowledge and understanding achievement estimates will be produced for the Gaelic medium census and a comparator group of pupils will also be selected from the main sample to allow more detailed analysis. It is intended that the comparator group will match the Gaelic medium census as closely as possible in terms of gender and deprivation as well as being restricted to the two booklets undertaken by the Gaelic medium pupils. Results of the Gaelic survey will be published separately.

I.2.3 Summary of sampling strategy

Local Authorities and independent schools

  • At each stage, 450 pupils in reporting authorities, ten per cent of pupils in non-reporting authorities and 150 pupils from independent schools were selected at random to take part in the knowledge and understanding assessment. A further 1,000 at P3 and 1,500 at each of P5, P7 and S2 (different) pupils were randomly selected for the Science literacy assessment. Selection was via disproportionate stratified two-stage random sampling.
  • The school population was stratified by authority and school size (stage size: <10 and 10+) prior to sampling.
  • Separate school samples were drawn for primary and secondary, with all stages being taken in the selected primary schools.
  • In the small school-size strata, schools were selected via simple random sampling (i.e. equal probabilities of selection), with all pupils at the relevant stage automatically selected for assessment. The sampling probability of these schools was adjusted to ensure that their pupils had equal probability of selection as any other pupils within the particular stage and authority.
  • In the large school-size strata, schools were selected by simple random sampling and pupils were selected by stratified random sampling from within the stage in each selected school with a probability of selection proportional to the size of the stage. The stratification applied was gender and deprivation (pupils living in the 20 per cent most deprived areas versus others, as indicated by their post codes).
  • Opted-in authorities were over-sampled to allow separate reporting of pupil achievement on Science knowledge and understanding.
  • When sampling within schools for the knowledge and understanding sample, the same proportion of pupils was selected in each school at a stage in an opted in authority to give every pupil at the stage within the authority an equal probability of selection.
  • The over-sampling in opted-in authorities was redressed through data weighting when the national achievement estimates were produced.

Gaelic medium census

  • All P5 and P7 pupils in Scotland who were learning Science partially or wholly through the medium of Gaelic were selected. Only those pupils in special schools, in schools withdrawn by their authorities, or in the TIMSS main sample schools were excluded.

The outcome of this sampling strategy was an intended pupil sample of around 12,500 pupils in P3 and 13,000 pupils in each of P5, P7 and S2. The pupils were drawn from just over 1,100 different schools throughout the country: 895 primary schools and 234 secondary schools. A detailed breakdown of these figures by stage and local authority is given in Table 2.

This highlights that just over 90 per cent of the selected pupils came from the 817 primary schools and 207 secondary schools in the 22 reporting authorities (around 11,300 pupils at P3, 11,600 to 11,800 in P5, P7 and S2). The total number of pupils selected in each school in reporting authorities varied from one pupil to around 70 at primary stages and from five pupils to about 130 pupils in the secondary sector (S2). The requirement for 450 pupils per stage for opting-in authorities in the knowledge and understanding assessment varied slightly from authority to authority because the sampling fraction to be applied to each school's stage roll had to be adjusted in order to produce integer numbers of pupils.

Just over six per cent (around 3,100) of the selected pupils were drawn from the 69 primary schools and 23 secondary schools in the ten non-reporting authorities. The total number of pupils selected in each school in these authorities varied from one pupil to 29 pupils at P3, one to 28 pupils at P5, one to 38 pupils at P7 and from three pupils to 155 pupils in the secondary sector (S2).

The remaining pupils (just over 150 per stage and 630 in total) came from the independent sector where nine primary and four secondary schools were randomly selected to participate. In these schools, the number of pupils selected varied from nine to 22 at P3, seven to 43 at P5, four to 36 at P7 and twelve to 72 in S2.

As a result of this disproportionate sampling strategy (with opted-in authorities being over-represented) appropriate adjustment (data weighting) was required when calculating the estimated national achievement proportions in order to compensate for this bias in authority representation. This is discussed further in section I.7.

Table 2
The intended pupil samples for written assessment in the 2007 SSA

(Number of schools* and pupils selected for survey participation)

Authority | Schools: Primary | Schools: Secondary | Pupils: P3 | Pupils: P5 | Pupils: P7 | Pupils: S2 | Pupils: Total

Reporting authorities
Aberdeen City | 33 | 11 | 477 | 489 | 488 | 523 | 1,977
Aberdeenshire | 48 | 15 | 549 | 564 | 535 | 568 | 2,216
Angus | 38 | 8 | 528 | 547 | 529 | 502 | 2,106
Dumfries & Galloway | 55 | 11 | 528 | 545 | 617 | 523 | 2,213
East Ayrshire | 37 | 8 | 515 | 549 | 552 | 509 | 2,125
East Dunbartonshire | 29 | 7 | 465 | 488 | 492 | 504 | 1,949
East Renfrewshire | 18 | 7 | 493 | 505 | 502 | 506 | 2,006
Edinburgh City | 34 | 12 | 520 | 562 | 551 | 579 | 2,212
Falkirk | 33 | 6 | 481 | 492 | 485 | 511 | 1,969
Fife | 43 | 14 | 556 | 616 | 600 | 608 | 2,380
Glasgow City | 45 | 13 | 549 | 641 | 630 | 642 | 2,462
Highland | 60 | 13 | 486 | 544 | 533 | 567 | 2,130
Inverclyde | 21 | 7 | 491 | 484 | 486 | 487 | 1,948
Moray | 30 | 5 | 509 | 523 | 533 | 388 | 1,953
North Ayrshire | 38 | 6 | 524 | 549 | 547 | 515 | 2,135
North Lanarkshire | 36 | 11 | 518 | 545 | 513 | 592 | 2,168
Perth & Kinross | 45 | 9 | 514 | 535 | 512 | 509 | 2,070
Renfrewshire | 37 | 9 | 536 | 565 | 563 | 531 | 2,195
South Ayrshire | 30 | 8 | 500 | 507 | 509 | 505 | 2,021
South Lanarkshire | 46 | 14 | 584 | 602 | 596 | 594 | 2,376
Stirling | 35 | 7 | 478 | 494 | 494 | 485 | 1,951
West Dunbartonshire | 26 | 6 | 479 | 493 | 495 | 494 | 1,961
Total for reporting authorities | 817 | 207 | 11,280 | 11,839 | 11,762 | 11,642 | 46,523

Other authorities
Argyll & Bute | 12 | 2 | 90 | 114 | 121 | 33 | 358
Clackmannanshire | 3 | 1 | 57 | 60 | 59 | 59 | 235
Dundee City | 2 | 2 | 24 | 25 | 30 | 128 | 207
East Lothian | 5 | 2 | 69 | 64 | 71 | 112 | 316
Eilean Siar | 15 | 2 | 19 | 61 | 67 | 32 | 179
Midlothian | 4 | 3 | 73 | 69 | 72 | 104 | 318
Orkney Islands | 7 | 5 | 38 | 59 | 89 | 180 | 366
Scottish Borders | 7 | 2 | 69 | 63 | 96 | 110 | 338
Shetland Islands | 5 | 1 | 44 | 51 | 57 | 35 | 187
West Lothian | 9 | 3 | 132 | 139 | 136 | 217 | 624
Total for non-reporting authorities | 69 | 23 | 615 | 705 | 798 | 1,010 | 3,128

Independent schools | 9 | 4 | 160 | 159 | 158 | 153 | 630

Scotland Total | 895 | 234 | 12,055 | 12,703 | 12,718 | 12,805 | 50,281

*Following the initial selection of the school sample, schools were given the opportunity to withdraw from the sample where there were particular mitigating circumstances. These normally included extraordinary staffing issues or school mergers, closures or moves. The figures given above reflect the final school sample from which the pupil sample was drawn after these schools had been withdrawn. Numbers of schools initially selected are given in Table 3 below.

I.3 Science knowledge & understanding and Science literacy assessments

Science knowledge & understanding and Science literacy were assessed within the written survey via a sample of test items and tasks. However, as in 2005 and 2006, it was decided that writing achievement would be estimated on the basis of class teachers' judgements rather than through in-survey testing, with a subset of submitted and rated writing evaluated through moderation. This addressed the continuing concern about the validity of assessing writing skills in the relatively artificial and time-constrained context of an exam scenario.

As discussed in section I.2, a minimum of 3,000 pupils per stage were required to undertake the knowledge and understanding assessment to produce national achievement estimates at the required degree of accuracy. In practice, substantially more pupils were assessed on this topic, since the sample was boosted to 450 pupils per stage in opted-in local authorities.

These pupils were required to attempt two booklets each containing nine tasks equally spread across the three consecutive levels to be tested (at P3, where only two levels are assessed, each booklet contained six tasks) and across the three Science outcomes. In addition, tasks were categorised by the groups and topics from the Improving Science Education Framework on which they were based. Thus the booklets were designed to be as representative as possible and to minimise the risks of any pupil being unable to attempt a question or of external criticism of their validity as Science assessments. In addition, tasks at a level were common across stages (for example, a Level B task used at P3 would also be used at P5) and some items from the 2003 AAP were repeated to allow limited comparison over time with the 2003 results.

Those pupils selected to be assessed on Science literacy (around 850 pupils at P3 and 1,200 to 1,300 at each of P5, P7 and S2, distinct from those pupils assessed on knowledge and understanding) also attempted two booklets. These covered two of the three different levels to be assessed at each stage with tasks at a level being common across stages.

The booklet design and the separate pupil samples for Science literacy and knowledge & understanding meant that constraints on the duration of an assessment session were met.

Items and tasks were distributed among pupils using 'multiple matrix sampling', a strategy for ensuring that as many test items as possible are used in a survey. This maximises curriculum coverage and therefore assessment validity, without any one pupil being required to attempt unacceptably long tests, or to be assessed over unacceptably long periods of time. Booklets were randomly allocated to pupils in such a way that as few pupils as possible would be faced with the same task or booklet in any particular school (minimising any possibility of school effects), whilst all tasks/booklets would eventually be attempted by similarly sized and similarly representative national (and authority for knowledge and understanding booklet) samples of pupils ('interpenetrating' or 'concurrent' samples).
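The allocation idea can be sketched as a rotation through a randomly ordered list of booklets that continues from one school to the next. This is an illustrative simplification, not the survey's actual allocation procedure: each pupil receives a single booklet identifier (standing in for a booklet pair), and the school names, pupil labels and booklet codes are hypothetical.

```python
import random
from itertools import cycle


def allocate_booklets(schools, booklets, seed=0):
    """Allocate booklets to pupils by rotating through a shuffled booklet list.

    Continuing the rotation across schools spreads booklets within each
    school and keeps the national counts per booklet nearly equal.
    """
    rng = random.Random(seed)
    order = list(booklets)
    rng.shuffle(order)
    rotation = cycle(order)
    return {
        school: {pupil: next(rotation) for pupil in pupils}
        for school, pupils in schools.items()
    }


schools = {
    "School A": ["a1", "a2", "a3", "a4"],
    "School B": ["b1", "b2", "b3", "b4", "b5"],
}
booklets = ["K1", "K2", "K3", "K4", "K5", "K6"]
allocation = allocate_booklets(schools, booklets)
print(allocation)
```

With no more pupils in a school than there are booklets, no two pupils in that school receive the same booklet, and across all schools the number of pupils attempting each booklet differs by at most one.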

More information about the tasks used is available in Annexes II.1, II.2 and II.3.

I.4 Sampling strategy for practical assessments

As in the 2005 and 2006 surveys, the practical part of the SSA was carried out by visiting teachers, called field officers, with a sub-sample of pupils in a sub-sample of the schools in the main survey (cost and logistics made it impractical to carry out the practical in all of the survey schools). Only Gaelic schools and those participating in TIMSS would be excluded. Reporting of the results was designed to be indicative at the national level only and local authority information was not required.

A minimum of 300 pupils per stage was required for each of the four elements of the practical, 1,200 in total. Therefore a minimum of 100 practical schools per stage would be required, with three pupils per stage allocated to each of the four elements in every school. However, this minimum number of schools would actually be higher since, as with the main national sample, small schools would be included with no cut-off, and if there were fewer than twelve pupils per stage then all those available would be selected for practical tasks.

It was agreed that P3 and P5 stages would be selected from the same school but that P7 would be selected in different schools from those in the P3 and P5 sample. This meant that the total minimum number of schools remained at 300 but that 400 field officer visits would be required. Given that field officers would work in pairs with each pair allocated a target of six schools, it would be necessary to recruit a minimum of 135 field officers.

The strategy for schools was therefore to over-sample in the first instance to make matching to field officer pairs easier. This would also allow for school and pupil withdrawals although it was not intended to give extra reserve schools. It was expected that around 160 field officers would be required to deliver this over-sampling and all 32 local education authorities were invited to nominate practising teachers for these roles. The numbers of field officers requested from each authority reflected the authority's relative size, in terms of teacher population.

Following nomination, field officer pairs were then matched to the schools and schools were informed that they would be asked to take part in the practical assessment. Adjustments were made following a small number of field officer and school withdrawals and it was always a contingency that some field officers could be asked to carry out up to eight field visits if necessary. After withdrawals, 265 primary schools and 124 secondary schools were included in the practical sample.

Taking account of the number of pupils available, practical schools participated in as many elements of the practical as possible. Individual pupils were involved in only one of those activities. Within each activity, there were three tasks at each stage which were allocated at random, except in the case of the investigation element where the task was chosen by the class teacher. Further detail about the practical tasks is available in Annex II.4.

Pupils were selected from the main stage sample in each practical school by simple random sampling. Where possible, twelve pupils were selected per school, to give three pupils per element of practical assessment. Where fewer than twelve pupils were available, pupils were randomly allocated assessment types until the maximum number of pupils available was reached. This was constrained to ensure pupils were distributed across assessment types: for example, if four pupils were available, in principle one would take part in each of the four assessments rather than all four being allocated purely at random. The approach was complicated by the requirement for additional pupils in the group exercise. An allocation algorithm was developed to ensure a fair distribution of tasks to schools, to which the pupils were then allocated. In total, between 1,400 and 1,500 pupils were selected per stage to allow for attrition and achieve the required minimum of 300 pupils completing each element at each stage for national reporting.
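The constrained allocation described above can be illustrated with a simple round-robin over the four elements. This sketch is not the allocation algorithm actually used: it ignores the extra pupils needed for the group exercise, and the element names are placeholders because the annex does not list all four.

```python
import random

# Placeholder labels; the four practical elements are not all named here.
ELEMENTS = ["element_1", "element_2", "element_3", "element_4"]


def allocate_practical(pupils, max_pupils=12, seed=0):
    """Randomly select up to max_pupils pupils, then spread them evenly
    across the practical elements in turn."""
    rng = random.Random(seed)
    selected = rng.sample(pupils, min(max_pupils, len(pupils)))
    return {
        pupil: ELEMENTS[i % len(ELEMENTS)] for i, pupil in enumerate(selected)
    }


large_school = allocate_practical([f"pupil_{i}" for i in range(30)])
small_school = allocate_practical([f"pupil_{i}" for i in range(4)])
print(large_school)
print(small_school)
```

A school with at least twelve pupils in the stage sample yields three pupils per element, while a school with only four pupils yields one pupil per element, matching the example given in the text.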

For all assessments conducted within the practical component of the survey, achievement results are reported as field officer level judgements. Findings are presented in Chapter D as sample statistics only, with no data weighting. Practical results should be treated as being indicative rather than definitive achievement estimates.

I.5 Sampling strategy for writing assessments

Writing was assessed indirectly rather than through the survey itself. For a sub-sample of pupils, schools were invited to forward a piece of extended writing in the context of Science that would illustrate the level at which the pupil was currently working. A proportion of the writing submitted was then randomly selected and centrally moderated by trained education authority representatives. This approach was preferred for authenticity, with timed unsupported writing being considered less valid than in-class supported writing.

Pupils were selected via stratified sampling of the existing main national pupil sample. Those pupils selected to take part in the Science practical were excluded, as were all pupils in Gaelic medium schools and those in schools also participating in TIMSS. Stratification was by gender and deprivation at the pupil level.

More information about the writing assessments is available in Annex II.3.

I.6 Participation rates

The above sections have outlined the sampling strategies used in the 2007 SSA. However, schools were not obliged to take part and there will always be some pupils absent on assessment days. Table 3 presents statistics on school participation.

Table 3
School participation statistics

 | Primary | Secondary | Total
Schools initially selected for participation | 1,023 | 281 | 1,304
Schools agreeing to participate | 895 | 234 | 1,129
Schools returning completed test booklets | 856 | 227 | 1,083
Participation rate (%) among selected schools | 84 | 81 | 83
Participation rate (%) among schools agreeing to participate | 96 | 97 | 96
Schools that contributed pupil writing samples | 789 | 158 | 947
Schools that participated in practical assessments | 265 | 124 | 389
Schools that returned pupil questionnaires | 836 | 182 | 1,018

A number of schools declined the invitation to participate or failed to respond by the due date. Where reasons were given, they included concerns about the burden on pupils and teachers (particularly in small schools), consecutive years of participation, staffing issues, involvement with other surveys (e.g. TIMSS) or HMIE inspections, closures and mergers. In addition, a small number of schools had concerns about the stress which the survey may place on their pupils. Of those schools which did agree to participate, only a very small number failed to return completed test booklets.

Among originally selected schools, the participation rate was 84 per cent among primary schools and 81 per cent among secondary schools. Interestingly, there was no evidence that schools asked for larger pupil samples were more likely to decline to participate or to fail to return booklets. In total 856 primary and 227 secondary schools took part in the main survey.
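The participation rates reported in Table 3 can be recomputed from the school counts given there. The short sketch below takes a school as having participated if it returned completed test booklets, which is the reading consistent with the published percentages.

```python
# School counts taken from Table 3.
selected = {"Primary": 1023, "Secondary": 281}
agreed = {"Primary": 895, "Secondary": 234}
returned = {"Primary": 856, "Secondary": 227}


def rate(part, whole):
    """Participation rate as a whole-number percentage."""
    return round(100 * part / whole)


for sector in ("Primary", "Secondary"):
    print(
        sector,
        rate(returned[sector], selected[sector]),
        rate(returned[sector], agreed[sector]),
    )

total_rate_selected = rate(sum(returned.values()), sum(selected.values()))
total_rate_agreed = rate(sum(returned.values()), sum(agreed.values()))
print("Total", total_rate_selected, total_rate_agreed)
```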

Table 4
Pupil participation statistics

 | P3 | P5 | P7 | S2 | Total
Pupils originally selected for participation | 12,055 | 12,703 | 12,718 | 12,805 | 50,281
Pupils actually assessed | 9,664 | 10,135 | 10,261 | 9,761 | 39,821
% of pupils originally selected | 80 | 80 | 81 | 76 | 79
Pupils involved in the analysis of Science Knowledge & Understanding | 8,809 | 8,832 | 8,978 | 8,522 | 35,141
Pupils involved in the analysis of Science Literacy | 855 | 1,303 | 1,283 | 1,239 | 4,680
Pupils involved in the practical assessments | 1,257 | 1,241 | 1,194 | 1,093 | 4,785
Pupils involved in the moderation of writing | 2,421 | 2,543 | 2,545 | 1,781 | 9,290
Pupils returning completed questionnaires | 9,927 | 10,353 | 10,564 | 8,044 | 38,888


Figures on pupil participation are presented in Table 4. These indicate that around 80 per cent of those pupils originally selected for participation were actually assessed. The reduction is due to a number of factors. Some schools that had agreed to participate did not in the event do so (completed tests were not returned) and, in the schools that did undertake assessments, a small number of pupils could not be assessed. These pupils may have left the school since the sample was drawn, been withdrawn from the sample by the schools, or been absent during the assessment period. Pupil loss was lower at primary schools than at secondary schools.

Gender and deprivation imbalances were redressed during achievement estimation, through appropriate data weighting, about which more information is given in I.7 below.

I.7 Data weighting procedures

Due to survey non-response and national sample imbalances caused by the need for local authority reporting, the Science knowledge and understanding and Science literacy written test data needed to be weighted to produce nationally representative achievement results.

The weighting attached to each pupil comprised two components. The first part of the weighting adjusts for imbalances in the pupil sample within the school and is equal to the total number of pupils in the school who are in the same stage and have the same gender and deprivation score as the pupil divided by the number of those pupils who were included in the assessment.

The second part of the weighting adjusts for imbalances at the authority level and is equal to the total number of pupils in the authority with the same gender, deprivation score and stage as the pupil divided by the total number of such pupils who attended a school that participated in the assessments.

Multiplying these two weights together gives the pupil's overall weight. A more detailed explanation of the weighting methodology follows:

Since there are many variables involved in the computation of weights for this survey, use of conventional subscript notation would routinely result in expressions involving six or seven subscripts, which could be very difficult to read. In this section, therefore, square brackets are used rather than reduced-font subscripts. Thus the expression p with subscript iskgdv/b will normally appear here as p[i,s,k,g,d,v/b].

The variables involved in the computation of weights for individual pupil results are as follows:

  • School, designated s, ranging over all Scottish schools.
  • Stages, designated k, drawn from the set {P3,P5,P7,S2}.
  • Pupils within schools, designated i.
  • Gender, designated g, drawn from the set {G,B,N}, standing for Girl, Boy and Not specified, respectively.
  • Deprivation index d = 1 if a pupil lies within deprivation decile 1 or 2,
    = 2 if a pupil lies within deprivation deciles 3-10,
    = 0 otherwise (typically unspecified).
  • Level, designated v, drawn from the set {A,B,C,D,E,F}.
  • Authority band, designated b. There are two categories of authority: the 22 reporting authorities, and the ten non-reporting authorities. Reporting authorities were treated separately, each as a single band. Non-reporting authorities were considered together in a single band. Independent schools were also grouped together, regardless of their location, in a single band. Schools are, of course, completely nested in bands.

equation

Summation over a particular subscript is indicated by a dot. Thus p[.,s,k,.,.,v/b] denotes the total number of pupils in school s at stage k tested at level v in band b. For the special case of level, the dot represents aggregation over pupils tested at one or more levels; an asterisk is used here as a special notation to denote aggregation over all pupils, whether tested or not. Thus p[.,s,k,.,.,./b] denotes the total number of pupils tested at stage k in school s, while p[.,s,k,.,.,*/b] stands for the total pupil roll size for stage k in school s, including pupils not tested. Similarly, p[.,.,k,.,.,*/.] denotes the total size of the pupil population in Scotland at stage k.

As a convenient shorthand, a pupil at stage k with gender g and deprivation index d is referred to as belonging to the group kgd. This shorthand can also be extended to cover aggregates, so that, for example, the group k.. contains all pupils at stage k.

Equation

The quantity r[i,s,k,g,d,v/b] is of interest not so much in itself but for its contribution to the aggregate r[.,s,k,g,d,v/b], which is equal to the roll size of group kgd in school s, provided school s contributed to the kgd sample at level v, and zero otherwise.

Under certain circumstances, it can happen that the actual number of pupils sampled at a given stage in a particular school, p[.,s,k,g,d,v/b], turns out to be greater than r[.,s,k,g,d,v/b], the reported group roll size. In order to avoid such paradoxes, in practice for computing weightings this composite value is used:

equation

Each pupil in school s, tested at level v, with gender g and deprivation index d, has weighting:

equation

The first part of {2} is the ratio of the total roll of group kgd pupils in school s to the total number of group kgd pupils in the same school s tested at level v. It represents the weight associated with school s in group kgd at level v.

The second part of {2} is the weight associated with the whole of authority band b, computed as the ratio of the total group kgd roll in authority band b to the total group roll size considering only schools in that authority which contributed to the kgd sample at level v.

Summing {2} over pupils and schools, we should obtain

equation

In other words, the sum of weights of all sampled pupils at level v in group kgd within an authority band should equal the total population roll size for that group within the band.
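The two-part weighting described above can be illustrated with a minimal Python sketch for a single group kgd at a single level v within one authority band. The data layout, the function name and the example figures are assumptions made for illustration, not the survey's actual processing code. In particular, the guard against tested counts exceeding the reported roll is implemented here as a simple maximum, which is one plausible reading of the composite value mentioned above.

```python
def pupil_weights(schools):
    """Weight per tested pupil, by school, for one group kgd at one level v.

    `schools` maps a school id to (reported_group_roll, number_tested).
    Both the layout and the names are hypothetical.
    """
    # Assumed form of the composite value: never let the roll used for
    # weighting fall below the number of pupils actually tested.
    effective_roll = {s: max(roll, tested) for s, (roll, tested) in schools.items()}

    # Total group roll in the authority band, over all schools.
    band_roll = sum(roll for roll, _ in schools.values())

    # Group roll restricted to schools that contributed tested pupils.
    contributing_roll = sum(
        effective_roll[s] for s, (_, tested) in schools.items() if tested > 0
    )

    weights = {}
    for s, (_, tested) in schools.items():
        if tested == 0:
            continue  # a school with no tested pupils carries no pupil weights
        school_part = effective_roll[s] / tested   # within-school adjustment
        band_part = band_roll / contributing_roll  # authority-band adjustment
        weights[s] = school_part * band_part
    return weights


# Invented example: school C has a roll of 50 in this group but tested nobody.
schools = {"A": (40, 10), "B": (60, 12), "C": (50, 0)}
weights = pupil_weights(schools)

# Sum of weights over all tested pupils in the band.
total_weight = sum(weights[s] * schools[s][1] for s in weights)
print(weights, total_weight)
```

In this example each tested pupil in school A carries a weight of 6 and each in school B a weight of 7.5, so the weights of the 22 tested pupils sum to 150, the full group roll for the band including the non-participating school.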

It is often convenient to normalise the basic weighting by dividing by the total roll size and multiplying by 100:

equation

By substituting the total population roll size at stage k, p[.,.,k,.,.,*/.], for the divisor in {4}, we obtain the normalised weight for a pupil within the country, rather than within the authority alone.

To restrict attention to a particular group, simply do not aggregate over the group. For example, the expression for the weight for pupils in a given school, restricted to deprived girls, considered within the authority band, would be:

equation

The corresponding normalised weighting would be:

equation

Now define 0 ≤ f[i,s,k,g,d,v/b] ≤ 1 as the proportion of correct marks scored by pupil i from school s in the level v assessment. f[i,s,k,g,d,v/b] is undefined for p[i,s,k,g,d,v/b] = 0.

f[i,s,k,g,d,v/b] and p[i,s,k,g,d,v/b] can be abbreviated to f[i,s,v] and p[i,s,v], respectively, where there is no ambiguity. Similarly, we can usually abbreviate w[i,s,k,g,d,v/b] to w[i,s,v/b], when there is no risk of ambiguity.

Now write f_p(i,s,v) = 1 when f[i,s,v] ≥ p, and 0 otherwise, where the subscript p is a threshold proportion (distinct from the pupil count p[i,s,v]). Then f_0.5(i,s,v) = 1 characterises a 'good start' at level v in the subject, a pupil showing f_0.65(i,s,v) = 1 is deemed to have 'well-established' skills at level v, and pupils such that f_0.8(i,s,v) = 1 are said to have 'very good' achievement at level v. f_p(i,s,v) can be written in full as f_p(i,s,k,g,d,v/b), when necessary to avoid ambiguity.

If, now, for each sampled pupil in the group of interest, we multiply f_p(i,s,v) by w[i,s,v] and sum over all pupils in the group, we obtain an estimate of the number of pupils achieving p relative to the corresponding group in the population.

For example

equation

estimates the number of pupils at stage k in authority band b, of gender g and deprivation index d, achieving a 'well-established' result at level v.

To express the same quantity as a percentage of all pupils at stage k in band b, of gender g and deprivation index d relative to level v, replace w[i,s,k,g,d,v/b] with the normalised w', as in

equation
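The estimation step described above can be sketched in Python as follows. This is an illustrative sketch only: the function name and the example data are invented. It divides by the sum of the weights, which matches the normalised weighting only on the assumption that the weights sum to the group roll, as the text states they should.

```python
def percent_achieving(pupils, threshold):
    """Weighted percentage of pupils whose proportion correct meets a threshold.

    `pupils` is a list of (weight, proportion_correct) pairs for the sampled
    pupils in the group of interest (hypothetical layout).
    """
    total_weight = sum(w for w, _ in pupils)
    achieved_weight = sum(w for w, f in pupils if f >= threshold)
    return 100.0 * achieved_weight / total_weight


# Invented example: four tested pupils with their weights and scores.
pupils = [(6.0, 0.70), (6.0, 0.50), (7.5, 0.90), (7.5, 0.60)]

good_start = percent_achieving(pupils, 0.5)         # 'good start'
well_established = percent_achieving(pupils, 0.65)  # 'well-established'
very_good = percent_achieving(pupils, 0.8)          # 'very good'
print(good_start, well_established, very_good)
```

Because the thresholds are nested, the three percentages can only decrease as the threshold rises.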

I.8 Estimating standard errors through the jackknife procedure

As the SSA is a sample survey, there is an element of uncertainty in the results. The weighting methodology is designed to reduce the effects of any sampling or response bias, but it cannot remove sampling variability. The likely extent of this variability can be quantified by calculating the 'standard error' associated with an estimate produced from a random sample.

Statistical sampling theory states that, on average:

  • Only about one sample in three would produce an estimate that differed from the (unknown) true value by more than one standard error.
  • Only about one sample in twenty would produce an estimate that differed from the true value by more than two standard errors.
  • Only about one sample in 400 would produce an estimate that differed from the true value by more than three standard errors.
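These rules of thumb can be checked with a short Python sketch, assuming that the sampling distribution of the estimate is approximately normal (the usual assumption behind such statements). The function name is illustrative.

```python
import math


def two_sided_tail(k):
    """Probability that a standard normal value lies more than k
    standard errors from the true value, in either direction."""
    return math.erfc(k / math.sqrt(2))


for k in (1, 2, 3):
    p = two_sided_tail(k)
    print(f"more than {k} standard error(s): probability {p:.4f}, about 1 in {1 / p:.0f}")
```

Under the normal assumption the probabilities are roughly 0.32, 0.046 and 0.0027, that is, about one sample in 3, one in 22 and one in 370, so the figures quoted above are convenient roundings.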

By convention, the '95 per cent confidence interval' is defined as the estimate plus or minus about twice the standard error because there is only a 5 per cent chance (on average) that a sample would produce an estimate that differs from the true value of that quantity by more than this amount.

The standard error of an estimate will depend upon several things, mainly the value of the estimate and the size of the sample (or sub-sample) from which it was calculated. It is worth noting that if the estimate is 0 or 100 per cent, then the standard error for the estimate will be equal to 0. This does not mean that we are sure that the true population proportion will be 0 or 100 per cent also, but this is our estimate from the sample drawn.
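The dependence of the standard error on the estimate and the sample size can be illustrated with the textbook formula for a percentage from a simple random sample. This is a sketch for illustration only: it is not the method used for the SSA, and the function names and figures are invented.

```python
import math


def srs_standard_error(percent, n):
    """Standard error of a percentage under simple random sampling
    (illustrative only; not the SSA's own method)."""
    return math.sqrt(percent * (100.0 - percent) / n)


def confidence_interval(estimate, se, z=1.96):
    """Approximate 95 per cent confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


se_mid = srs_standard_error(50.0, 400)    # estimate of 50 per cent
se_edge = srs_standard_error(100.0, 400)  # estimate of 100 per cent
interval = confidence_interval(50.0, se_mid)
print(se_mid, se_edge, interval)
```

With 400 pupils, an estimate of 50 per cent has a standard error of 2.5 percentage points and an interval of roughly 45.1 to 54.9, whereas an estimate of 100 per cent has a standard error of 0.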

The standard error can be calculated in a number of ways, but for the SSA it has been calculated using the jackknife procedure. The SSA sample is selected using a complex multi-stage cluster sampling technique, which means that the standard formulas used to calculate the standard error from a simple random sample would underestimate the standard error.

The jackknife technique was chosen because it provided unbiased estimates of the sampling errors of the means and percentages that the SSA usually reports on.

The jackknife procedure is often referred to as the 'leave one out' method. The idea of the jackknife procedure is that, given a dataset with n observations (or sampling units), n re-sampled datasets are created by excluding each observation in turn from the original dataset. The new datasets are very similar, but the variability among them allows us to calculate an unbiased estimate of the standard error of the estimate obtained from the original dataset.

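The first stage in calculating the jackknife estimate of the standard error is to calculate n estimates $\hat{\theta}_{(i)}$, where, for each i from 1 to n, $\hat{\theta}_{(i)}$ is obtained by excluding the i-th observation, so that each $\hat{\theta}_{(i)}$ is calculated with a sample size of n-1. From this it is then possible to calculate the standard error of the estimate by looking at how the jackknife estimates vary around the sample estimate.

The mean of the $\hat{\theta}_{(i)}$ is defined as:

$$\hat{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$$

The jackknife estimate of the statistic is defined as:

$$\hat{\theta}_{\mathrm{jack}} = n\,\hat{\theta} - (n-1)\,\hat{\theta}_{(\cdot)}$$

where $\hat{\theta}$ is the estimate calculated from the full sample.

The variance of the estimator $\hat{\theta}$ is equal to:

$$\widehat{\mathrm{Var}}(\hat{\theta}) = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^{2}$$

The jackknife estimate of the standard error of $\hat{\theta}$ is:

$$\mathrm{SE}_{\mathrm{jack}}(\hat{\theta}) = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^{2}}$$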
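The leave-one-out procedure described in this section can be sketched in Python using the standard delete-one jackknife formulas. This is a minimal illustration on a plain list of values: the function names are invented, and in the survey itself the units left out and the statistic recomputed would reflect the weighted, clustered design, which is not modelled here.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def jackknife(values, statistic):
    """Delete-one jackknife: returns (jackknife estimate, standard error)."""
    n = len(values)
    full_estimate = statistic(values)

    # Recompute the statistic n times, leaving out one observation each time.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n

    jack_estimate = n * full_estimate - (n - 1) * loo_mean
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out)
    return jack_estimate, math.sqrt(variance)


estimate, std_error = jackknife([2.0, 4.0, 4.0, 6.0], mean)
print(estimate, std_error)
```

For the sample mean, the jackknife standard error coincides with the familiar sample standard deviation divided by the square root of n, which gives a convenient check on the implementation.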

I.9 Statistical significance

Because the survey's estimates may be affected by sampling errors, apparent differences of a few percentage points between sub-samples may not reflect real differences in the population. It might be that the true values in the population are similar, but the random selection of pupils for the survey has, by chance, produced a high estimate for one sub-sample and a low estimate for the other.

Throughout the report, a number of differences are referred to as being statistically significant. Usually, if something is described as being significant it means that it is important or special, but this is not the case when talking about statistical significance. A difference between two sub-groups is statistically significant if it is so large that a difference of that size (or greater) is unlikely to have occurred purely by chance.

When analysing the SSA data, statistical tests were used to compare the results from different sub-groups. If the differences between the sub-groups are large enough and the standard errors of the estimates are small enough, then we can say that the differences are likely to be genuine features of the population and that they are statistically significant.

For a crude check, if the difference between sub-groups is more than twice the sum of the standard errors of the two groups, then the difference is statistically significant. If the difference is less than double the larger of the two standard errors, then the difference is not statistically significant. Otherwise, a statistical test is needed to determine statistical significance.
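The crude check can be written as a small Python function. The function name, the return labels and the example figures are illustrative.

```python
def crude_check(est1, se1, est2, se2):
    """Apply the crude significance check to two estimates and their
    standard errors (all in the same units, e.g. percentage points)."""
    diff = abs(est1 - est2)
    if diff > 2 * (se1 + se2):
        return "significant"
    if diff < 2 * max(se1, se2):
        return "not significant"
    return "needs a formal test"


# Invented examples with standard errors of 2.0 and 2.5 percentage points.
print(crude_check(70.0, 2.0, 55.0, 2.5))  # difference 15 exceeds 2 * 4.5
print(crude_check(57.0, 2.0, 55.0, 2.5))  # difference 2 is below 2 * 2.5
print(crude_check(62.0, 2.0, 55.0, 2.5))  # difference 7 falls in between
```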

All the statistical tests in the SSA report are carried out at the 2 per cent level for national estimates and the 5 per cent level for local authorities. At the 5 per cent level, a difference is considered significant if a difference of that size would arise by chance in only about one sample in 20 when there is no real difference in the population. Generally speaking this means that in order for us to report a difference as being statistically significant at the 5 per cent level, we have to be at least 95 per cent certain that this difference is a genuine feature of the data and not due to random variation. Similarly at the 2 per cent level, we have to be at least 98 per cent certain that this difference is genuine.

A two-sided independent t test has been used to check for statistical significance, and the null hypothesis has always been that of no difference. This allows the t value to be calculated using the following formula:

$$t = \frac{\hat{x}_1 - \hat{x}_2}{\sqrt{SE_1^{2} + SE_2^{2}}}$$

where $SE_1$ and $SE_2$ are the standard errors of the two estimates, and $\hat{x}_1$ and $\hat{x}_2$ are our estimates for the two groups.

Statistical sampling theory suggests that the difference is significant at the 5 per cent level (95 per cent confidence) if the absolute value of t is greater than or equal to 1.96.
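The test can be sketched in Python, assuming the usual form for two independent estimates: the difference divided by the square root of the sum of the squared standard errors, compared against a critical value from the normal distribution. The function names and the example figures are invented for illustration.

```python
import math
from statistics import NormalDist


def t_value(est1, se1, est2, se2):
    """Test statistic for the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


def is_significant(est1, se1, est2, se2, level=0.05):
    """Two-sided test at the given level, using a normal critical value."""
    critical = NormalDist().inv_cdf(1 - level / 2)  # about 1.96 at the 5 per cent level
    return abs(t_value(est1, se1, est2, se2)) >= critical


# Invented example: 62 per cent (SE 2.0) against 55 per cent (SE 2.5).
t = t_value(62.0, 2.0, 55.0, 2.5)
at_5_per_cent = is_significant(62.0, 2.0, 55.0, 2.5, level=0.05)
at_2_per_cent = is_significant(62.0, 2.0, 55.0, 2.5, level=0.02)
print(round(t, 2), at_5_per_cent, at_2_per_cent)
```

In this example t is about 2.19, so the difference would be reported as significant at the 5 per cent level used for local authorities but not at the stricter 2 per cent level used for national estimates, where the critical value is about 2.33.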

Calculations of confidence intervals and statistical significance take account only of sampling variability. The survey's results could also be affected by non-response bias. If the characteristics of the pupils who participated differed markedly from those of the pupils who were withdrawn, there might be bias in the estimates. If that is the case, the SSA's results will not be representative of the whole population.

Without knowing the true values (for the population as a whole) of some quantities, we cannot be sure about the extent of any such biases in the SSA. However, comparison of SSA results with information from other sources suggests that they are broadly representative of the overall Scottish population, and therefore that any non-response biases are not large overall or are corrected by the weightings. Such biases could, of course, be larger for some sub-groups of the population or in certain education authority areas, particularly those with the highest non-response rates.


Page updated: Thursday, June 5, 2008