Scottish Survey of Achievement: 2005 English Language and Core Skills - Practitioner's Report

Section A: Survey design and methodology

A.1 Survey aims and objectives

The Scottish Survey of Achievement (SSA) differs from its antecedent, the Assessment of Achievement Programme (AAP), in its broader aim to report pupil attainment at the level of local authorities as well as nationally.

The 2005 SSA, which took place in the period mid-May to mid-June 2005, focused on English Language and on core skills applied within a language context, with the following general aims:

  • to assess the English Language and core skills attainment of pupils in P3, P5, P7 and S2;
  • to assess reading, writing and numeracy attainment at local authority level as well as nationally, for half the Scottish authorities ('reporting authorities' 1);
  • to compare attainment across the four stages and between boys and girls;
  • to compare English Language attainment in 2005 with that in 2001;
  • to provide a learning context against which to reflect on the attainment findings.

The specific objectives of the survey were:

  • to estimate and report the reading and numeracy attainments of the population of P3 pupils at 5-14 Levels A, B and C, of the population of P5 pupils at Levels B, C and D, of the population of P7 pupils at Levels C, D and E, and of the population of S2 pupils at Levels D, E and F, nationally (all pupils) and by reporting authority (maintained school pupils only), on the basis of in-survey testing;
  • to estimate and report the attainment of the populations of P3, P5, P7 and S2 pupils in writing, both nationally and by reporting authority, on the basis of teachers' level judgments;
  • to provide information about pupils' skills in listening and talking, using ICT, problem solving and working with others, on the basis of practical assessments organised and supervised by itinerant field officers;
  • to explore and comment on any gender differences in attainment in any aspect, including the absence of differences;
  • to compare the reading and writing attainments of pupils at P7 and S2 between 2001 and 2005 (the last AAP survey of English language took place in 2001, did not include P3 and P5, and did not include listening and talking assessment);
  • to explore pupils' and teachers' experience of, respectively, learning and teaching in English Language and mathematics, along with their views about this experience.

As the list above confirms, this survey, like all large-scale attainment surveys, had a variety of different, if related, aims and objectives. Survey design was also subject to a number of constraints, and attempting to meet all the objectives within those constraints proved a challenge.

A.2 Sampling

In the AAP, national pupil samples had always been drawn to be fully representative of their populations, using a two-stage stratified proportionate sampling process. Thus, if x% of P3 pupils in the population were in small schools then x% of the national sample should also have been in small schools. If y% of the P3 pupils were living in large urban areas then y% of the P3 pupils in the sample should also have been living in large urban areas. And so on. This was the intention each time. The intention was not always entirely achieved, however, because selected schools were never obliged to take part in surveys, and in those schools that did so there were always some pupils absent on the assessment days. The resulting imbalances in population representation were addressed through data weighting during statistical analysis.
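The proportionate principle described above can be illustrated with a minimal Python sketch. The stratum names and pupil counts are hypothetical, and the largest-remainder rounding rule is an assumption made for illustration only; it is not a documented AAP procedure.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Share sample_size across strata in proportion to stratum population.

    Rounding uses the largest-remainder rule (an illustrative assumption).
    """
    total = sum(stratum_sizes.values())
    exact = {s: sample_size * n / total for s, n in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}  # round each share down
    shortfall = sample_size - sum(alloc.values())
    # Hand the leftover places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


# Hypothetical P3 population counts by school-size stratum.
population = {"small": 1200, "medium": 5300, "large": 3500}
print(proportionate_allocation(population, 100))
```

In this hypothetical case 12% of the population is in small schools, so 12 of the 100 sample places go to small schools.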

Of particular importance here is the fact that, even had all the originally selected schools and pupils taken part in an AAP survey, there would never have been enough pupils in any individual local authority area, even in the largest authority, for it to be possible to produce meaningful attainment estimates at authority level. For SSA purposes, therefore, it was necessary to boost the representation of the reporting authorities in the survey sample, and to adjust for this over-representation through data weighting when producing the national attainment estimates (see Section B for full details of the sampling strategy and resulting samples). In total, just over 36,000 pupils in almost 1,500 mainstream schools across Scotland were randomly selected for involvement in the survey, that is, approximately 9,000 pupils at each of the four stages (15% of the pupil population at each stage).
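To see why over-representation has to be corrected by weighting, consider the following sketch. It assumes the simplest possible design weight (stratum population divided by stratum sample size) and uses invented figures; the weighting actually applied in the survey is described in Section B and may differ.

```python
def weighted_proportion(strata):
    """Population-weighted proportion of successful pupils.

    Each stratum is a dict with population N, sample size n and
    number of successful sample pupils k (all hypothetical here).
    """
    numerator = sum(s["N"] / s["n"] * s["k"] for s in strata)
    denominator = sum(s["N"] / s["n"] * s["n"] for s in strata)
    return numerator / denominator


def unweighted_proportion(strata):
    """Naive pooled proportion that ignores the unequal sampling rates."""
    return sum(s["k"] for s in strata) / sum(s["n"] for s in strata)


# Hypothetical: a boosted authority sampled at 50%, the rest at about 5.6%.
strata = [
    {"N": 1000, "n": 500, "k": 300},  # boosted authority, 60% successful
    {"N": 9000, "n": 500, "k": 200},  # remaining pupils, 40% successful
]
print(unweighted_proportion(strata))  # pulled towards the boosted authority
print(weighted_proportion(strata))    # reflects the population shares
```

Without weighting, the boosted authority would count for half the national estimate even though it holds only a tenth of the hypothetical population.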

Pupil sampling was again a two-stage process, in that schools were selected first, followed by the sampling of pupils within the selected schools. To avoid too great a change in survey impact on schools as the AAP was replaced by the SSA, it was hoped to meet the following two constraints on pupil numbers per school during the sampling process:

  • the total number of pupils selected for testing in an individual school should not exceed 20 for primary schools, and 30 for secondary schools (as in recent AAP surveys);
  • pupils should be drawn from one stage only in primary schools (again, as had become normal AAP practice).

It was possible to meet both of these constraints in the group of non-reporting authorities, and in four of the 16 reporting authorities. But the constraints were impossible to satisfy throughout the survey sample, because the other 12 reporting authorities had relatively small pupil populations and/or relatively small numbers of schools. This meant that to achieve the numbers of pupils needed for each authority sample either a very small sample of schools would have to be randomly selected, with very large numbers of pupils then randomly selected for testing within each school, or alternatively all schools would need to participate in the survey, and testing would have to take place at all three stages in the primary schools. The second of these options was chosen.

Practical skills were also assessed in the survey but, for reasons of logistics and cost, assessment took place in only a subsample of the survey schools. Moreover, the schools in the subsample were not selected entirely at random. They were selected for their convenient location (ease of field officer access) and for their relatively large size (at least 20 survey pupils available for assessment at one stage). See Section B for further information.

In most assessment situations, schools and pupils are not the only elements that are sampled. The test items and tasks which the pupils attempt are also essentially samples. They are samples of all the items and tasks that already exist or which could be developed to represent the abilities/skills being assessed (reading, numeracy, ICT, etc), i.e. to represent the relevant attainment 'domain'. 'Domain sampling' is the process of drawing items or tasks at random from a large pre-existing pool of relevant items or tasks, perhaps within a given assessment framework specification (e.g. x items 'in-context' and y not, no more than z on any one topic, etc).

Domain sampling would have been used in this survey, to select the items and tasks for pupil administration, had the National Assessment Bank - which now meets the needs of the National Assessment programme as well as the SSA - been large enough to permit it. But while half of the 72 tasks used in the survey to assess reading skills and almost all of the 500+ atomistic items used in the survey to assess numeracy were drawn from the bank, there was no scope for random sampling and indeed shortfalls in required numbers were made up through new item/task development (see Section C).

'Multiple matrix sampling' was employed in the distribution of items and tasks among the pupils. Multiple matrix sampling is simply a strategy for ensuring that as many test items as possible are used in a survey, maximising curriculum coverage and therefore assessment validity, without any one pupil being required to attempt unacceptably long tests, or to be assessed over unacceptably long periods of time. Essentially, different but equivalent subsets of items are administered to different but equivalent subsamples of pupils, in such a way that every item is eventually attempted by similarly sized and similarly representative samples of pupils, and every pupil is assessed with a test of the same length and general composition as any other (see Section C for further information).

A.3 Reading and numeracy assessment

The survey objectives concerning the assessment of reading, writing and numeracy had to be met within the overall pupil sample size, and within the following practical constraints:

  • the duration of an assessment session should not exceed 40 minutes at P3/P5 and 60 minutes at S2;
  • there would be a maximum of three assessment sessions per pupil (though extra time was assumed for completion of questionnaires);
  • individual pupils should not be faced with single-level numeracy tests (single-level reading tasks could not be avoided), given that the level concerned could be far below their capabilities or dauntingly above them.

It very soon became clear that the sample could not accommodate the test-based assessment of all three skill areas. For this reason, and also to address continuing concerns about the validity of assessing writing skills in the relatively artificial and time-constrained context of a national survey, it was decided to estimate writing attainment on the basis of class teachers' judgments rather than through in-survey testing, with a subset of submitted and rated writing evaluated through moderation (see Section C.2). This left the assessment of reading and numeracy to be accommodated within the written survey itself.

The constraint of three assessment sessions per pupil was met. This was already one session per pupil more than had been required in the AAP, and could not be exceeded. The increase was needed in order to accommodate the assessment of reading skills at three consecutive levels at each stage, given that each of the reading tasks to be used required an entire assessment session, and given also that the overall pupil sample had to be shared between reading assessment and numeracy assessment.

The constraint on the duration of an assessment session was also met, as was the constraint that in numeracy assessment no pupil should be faced with a single-level numeracy test; in numeracy, test booklets contained items at three different levels.

In the assessment of reading, 72 different reading tasks were administered in the survey - 12 at each of the six 5-14 levels (A to F). This was the maximum number that could be accommodated within the available survey space, and a number large enough to allow variety in topic coverage, thus assuring high validity in the resulting level-based assessments. The tasks were of the same format as those traditionally used in the AAP, and now used as National Assessments by teachers in their classrooms. Each comprised a source text, or texts, and a series of associated comprehension questions (see Section C.1 for examples). Typically, half the tasks at a level were narrative/personal and half informative. Just under half the tasks were selected from among those that pre-existed in the National Assessment Bank, with the rest being newly developed. New tasks were level-validated and trialled before survey use.

In the assessment of numeracy, 80 'atomistic' items were used at Level A, where the curriculum is relatively narrow, and 90 at each of the other five levels (see Section C.3). The great majority of items came from the National Assessment Bank, having been used in previous AAP surveys of Mathematics and/or in National Tests. Where necessary, new items were developed. In addition, 18 'mathematical literacy' tasks were administered. Each task presented pupils with printed source material which they then used to respond to a series of linked test questions. These tasks were level-validated and trialled before use in the survey.

Multiple matrix sampling was applied in the allocation of assessment materials to pupils. Reading tasks and numeracy booklets were randomly allocated to pupils in such a way that as few pupils as possible would be faced with the same task or booklet in any particular school (minimising any possibility of school effects), whilst all tasks/booklets would eventually be attempted by similarly sized and similarly representative national and authority samples of pupils ('interpenetrating' or 'concurrent' samples).

In reading, the 36 tasks to be used at a particular stage (e.g. 12 at Level A, 12 at Level B and 12 at Level C for P3) were grouped to produce 12 task triplets, each triplet containing a task from each of the three relevant levels. The triplets were then randomly allocated to just over half the pupils in each national stage sample. In the schools, pupils were administered the lowest-level task first, then in a subsequent assessment session the middle-level task, and finally, in their third assessment session, the highest-level task.
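The triplet construction and allocation described above can be sketched as follows. The report does not say how tasks were matched into triplets or exactly how triplets were rotated among pupils, so the random matching, the rotation through the triplet list, and all task and pupil identifiers here are assumptions for illustration.

```python
import random
from collections import Counter


def build_triplets(levels, tasks_per_level=12, seed=0):
    """Group tasks into triplets holding one task from each level."""
    rng = random.Random(seed)
    columns = []
    for level in levels:
        tasks = [f"{level}{i:02d}" for i in range(1, tasks_per_level + 1)]
        rng.shuffle(tasks)  # random matching across levels (assumption)
        columns.append(tasks)
    # zip keeps the level order, so each triplet runs lowest level first.
    return list(zip(*columns))


def allocate(pupils_by_school, triplets, seed=0):
    """Rotate through the triplets so repeats within a school are minimised."""
    rng = random.Random(seed)
    position = rng.randrange(len(triplets))  # random starting point
    allocation = {}
    for school, pupils in pupils_by_school.items():
        for pupil in pupils:
            allocation[pupil] = triplets[position % len(triplets)]
            position += 1
    return allocation


triplets = build_triplets(["A", "B", "C"])  # P3 levels
schools = {
    "school_1": [f"s1_p{i}" for i in range(12)],
    "school_2": [f"s2_p{i}" for i in range(5)],
}
allocation = allocate(schools, triplets)
usage = Counter(allocation.values())
print(allocation["s1_p0"])
print(sorted(usage.values()))
```

Because the rotation continues from school to school, no triplet is repeated within a school of 12 or fewer sampled pupils, and overall usage of the triplets differs by at most one pupil.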

In numeracy, the items to be administered at each stage were distributed among 10 different mixed-level booklets, all booklets for a stage having the same general make-up (see Section C.3). Booklets comprised test items from the three relevant levels at each stage, with overall item numbers chosen to meet the test duration constraint. Booklets came in two versions. One version presented the items to pupils in a random order, and the other version simply reversed this random order. The resulting 20 booklets at a stage (10 booklets, each in two versions) were randomly allocated to just under half the pupils in the national sample at that stage, each pupil being allocated two such booklets.
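The two-version arrangement can be illustrated with a short sketch. The item labels and booklet length are hypothetical. One property of the design is that an item at position p in one version sits at position n + 1 - p in the other, so across the two versions every item has the same average position.

```python
import random


def booklet_versions(items, seed=0):
    """Return a randomly ordered booklet and its reversed counterpart."""
    rng = random.Random(seed)
    version_1 = list(items)
    rng.shuffle(version_1)       # random item order
    version_2 = version_1[::-1]  # the same order reversed
    return version_1, version_2


items = [f"item{i}" for i in range(1, 9)]  # hypothetical 8-item booklet
v1, v2 = booklet_versions(items)
# 1-based position of each item in the two versions.
positions = {it: (v1.index(it) + 1, v2.index(it) + 1) for it in items}
print(v1)
print(v2)
```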

The mathematical literacy tasks were packaged two to a booklet, and each 'numeracy' pupil was randomly allocated one booklet, to be attempted in the third assessment session.

During script processing pupils' item responses were individually recorded onto response record sheets by students employed by the SQA for the duration. Each item had an associated short selection of right and wrong responses, identified as letters (multiple-choice items) or as short keywords. The transcribers simply circled the response option the pupil had selected or supplied, and these response data, after keying, were later automatically marked and converted to binary scores. Both the response data and the score data were analysed.

For both reading and numeracy, pupils were classified into attainment bands at each level on the basis of their test performances, specifically in terms of the proportions of binary scored items they successfully answered within their level-based reading task or at the level concerned across their two 'atomistic' numeracy booklets 2. Pupils correctly answering 80% or more of the test items at a level were deemed to have shown 'very good' attainment at that level. Pupils answering 65% or more of the items at a level correctly but fewer than 80% were classified as having 'well-established' skills at the level. Pupils answering 50% or more of the items at a level correctly, but fewer than 65%, were deemed to have made a 'good start' at the level.
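The banding rule above amounts to three thresholds on the percentage of items answered correctly at a level. A minimal sketch follows; the function name is hypothetical, and returning None for pupils below 50% is an assumption, since no band name for that group is given here.

```python
def attainment_band(correct, total):
    """Classify a pupil at one level from binary item scores.

    correct: number of items at the level answered correctly
    total:   number of items at the level attempted in the test
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer comparisons avoid floating-point trouble at the boundaries.
    if 100 * correct >= 80 * total:
        return "very good"
    if 100 * correct >= 65 * total:
        return "well-established"
    if 100 * correct >= 50 * total:
        return "good start"
    return None  # below 50%: no band assigned in this sketch (assumption)


for score in (16, 13, 10, 9):
    print(score, attainment_band(score, 20))
```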

In the event, just under 4,000 pupils were assessed in reading at each stage, and over 3,000 per stage were assessed in numeracy. The weighted proportions of pupils in each band at each level in each authority and nationally were computed (see Section B.9), and margins of error estimated using the jackknife technique. The results are presented in Sections E and F, respectively.
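The jackknife variant used in the survey is not specified in this section. As an illustration only, the sketch below assumes a delete-one-group jackknife in which each school is dropped in turn and the weighted proportion is recomputed; the input figures are invented.

```python
import math


def jackknife_proportion(groups):
    """Delete-one-group jackknife for a weighted proportion.

    groups: list of (weighted_successes, weighted_total) pairs, one per
    school. Returns the full-sample estimate and its standard error.
    """
    g = len(groups)
    if g < 2:
        raise ValueError("need at least two groups")
    total_k = sum(k for k, _ in groups)
    total_n = sum(n for _, n in groups)
    estimate = total_k / total_n
    # Recompute the proportion with each group left out in turn.
    replicates = [(total_k - k) / (total_n - n) for k, n in groups]
    mean_rep = sum(replicates) / g
    variance = (g - 1) / g * sum((r - mean_rep) ** 2 for r in replicates)
    return estimate, math.sqrt(variance)


# Hypothetical weighted counts for four schools.
groups = [(1, 1), (0, 1), (1, 1), (0, 1)]
estimate, std_error = jackknife_proportion(groups)
print(estimate, std_error)
```

A margin of error could then be taken as a multiple of the standard error (for example 1.96 times it under a normal approximation), though the multiplier used in the survey is not stated here.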

In addition to the written and practical assessment undertaken during the survey, schools were invited to submit teachers' level judgments for each of their sample pupils, for reading, writing and mathematics (level judgments that would in previous years have been submitted to SEED for all the pupils at the relevant stage in the school for inclusion in the 'National Survey 5-14'). The results of this enquiry are presented in Sections E, G and F, respectively.

A.4 Writing assessment

As noted earlier, for reasons of survey pressure (reading and numeracy given priority within a large but stretched survey sample) and authenticity (timed unsupported writing being considered less valid than in-class supported writing), no direct writing assessment took place within the 2005 survey itself. Instead, for a random third of the pupils in the survey sample at each stage, schools were invited to forward a piece of extended writing of a specified genre that would illustrate the level the pupil was working at currently: genres - 'personal', 'imaginative', 'functional' - were pre-allocated to pupils at random (essentially another example of multiple matrix sampling).

Each piece of writing submitted by schools was to have a level judgment attached, the judgment having been confirmed through teacher consultation within the school. Schools were informed of the purpose of the exercise, and were further informed that a sample of the scripts would go forward for 'moderation', i.e. they would be reviewed during an inter-rater study (see Section C.2 for further detail).

Writing attainment is reported in terms of the proportions of pupils judged by their teachers to be at the different levels at each stage (see Section G), and comment is offered on the impact of moderation on a subsample of the teachers' judgments (see Section C.2).

A.5 Practical skills assessment

Pupils took part in practical assessments in around a quarter of the survey schools, attempting tasks designed to assess their skills in one or other of a number of different areas: Listening, Talking, Writer's Craft, 'Knowledge about language', ICT, Problem-solving and 'Working with others'.

Administration of the practical assessments was the responsibility of itinerant field officers. The field officers, all practising teachers recruited from local authorities throughout the country, were given one day of task orientation prior to the survey, and then worked in pairs, each pair visiting one or other of their five assigned schools each day to carry out their assessments. Further detail is given in Section C.4.

Schools were randomly selected for involvement in the practical assessments, but with two important constraints: schools had to be within easy travelling distance of a field officer's home base, and, for efficiency reasons, the pupil sample already selected for survey participation in the school had to contain at least 20 pupils at a stage - or at least 20 pupils at P3 and P5 combined. Clearly, these constraints meant that at the primary stages the resulting practical samples could never be faithfully representative of the national pupil populations, since they were by design biased in favour of larger primary schools. However, if we can assume that size of school is not a relevant factor in terms of the practical skills of pupils then the performance findings that have emerged from the practical assessments will nevertheless be valid in reflecting national patterns of practical skills attainment.

In each 'practical' school, up to four pupils at the stage concerned were randomly selected for the assessment of listening skills, up to four for the assessment of talking skills, up to four for the assessment of writing skills (Writer's craft), up to four for the assessment of ICT skills, and up to four for the assessment of skills in problem solving and working with others.

Twelve different mixed-level listening tasks were specially developed for the survey, using CD or video-based source material. Four tasks were administered at P3, and eight at each of the other stages. To assess pupils' talking skills, the field officers interacted individually with randomly selected pupils, engaging the pupil in a dialogue and eventually allocating a 5-14 level to the pupil's performance. There were six ICT tasks in total, two of which were used at all four stages: pupils worked in pairs on these tasks, with the observing field officer completing a checklist as they worked. Finally, problem solving skills and skills when working with others were assessed as small groups of pupils participated in one or other of four discussion-based problem solving tasks; one field officer animated the discussion whilst the other observed and recorded judgmental ratings of various aspects of behaviour.

For all assessments conducted within the practical component of the survey, attainment results are reported as field officer level judgments or as average facilities (a facility being the proportion of pupils answering an item correctly) over all the items at a level across all relevant tasks. Findings are presented as sample statistics only, with no data weighting.

A.6 Pupil and teacher questionnaires

Both pupils and teachers were invited to complete questionnaires, seeking information about teaching/learning circumstances, experiences and opinions.

Eight different pupil questionnaires were developed for use across the stages, four focusing on English Language and four on Mathematics. Pupils who were randomly assigned reading assessments were also randomly assigned one or other of the English Language questionnaires, and pupils who were randomly assigned numeracy assessments were randomly assigned one or other of the Mathematics questionnaires. Among other issues, the questionnaires included enquiries into pupils' native and second languages, their learning resources at home, their job aspirations, their perceptions of the value of language and maths to those working in various occupations, their enjoyment of subject learning and perceptions of subject importance, and their perceptions about the nature of their subject lessons. See Section D for further detail, and Section I for findings.

The pupils assessed for ICT skills were in addition invited to complete an ICT questionnaire, designed to explore their attitudes to ICT and to gather some contextual information relevant to ICT learning (see Section H).

A teacher questionnaire was developed for administration to class teachers at P3, P5 and P7 and to subject teachers at S2. There were two subject-specific versions of the same questionnaire, one focusing on English Language teaching and learning and the other on Mathematics teaching and learning, with four stage-specific versions within each subject (i.e. the same questions, but addressed in terms of one specific stage). The various versions were distributed to schools in such a way that they should have been responded to by similarly representative samples of primary teachers and of secondary subject teachers, as appropriate. See Section D for further detail, and Section J for findings.

Page updated: Thursday, June 29, 2006