Assessment of Achievement Programme
Report of the First AAP Survey of Social Subjects Enquiry Skills (2002)
2. Enquiry skills: written assessment
2.1 The assessment tasks
2.1.1 Overview
The written assessment in this first social subjects survey focused on the enquiry skill Carrying out tasks, with the intention of assessing the contributory skills of retrieving, interpreting and evaluating information from a variety of different paper-based sources. A total of 45 different tasks were newly developed for the survey (see Table 2.1), comprising nine tasks targeted at each of Levels A to E, among which are three from each of the social subjects outcomes, viz. People in the past, People and place, People in society.
Table 2.1 The 45 written tasks developed for survey use
Level | People in the past | People and place | People in society |
Level E | Native Americans; Mary Queen of Scots; The Great Plague | Rich World/Poor World; Earthquakes; Sculpting the Earth | Advertising; Children's Rights; Amnesty International |
Level D | Anne Frank; The Victorians; World War II | Sri Lanka; Farming; River Study | Library Closure; Counting the Cost; Fireworks |
Level C | The Vikings; Titanic; Robert the Bruce | Scotland's Mountains; Scotland's Weather; Water | School Council; Pool Closure; Charities |
Level B | The Wright Brothers; Columbus; The Romans | Travel in Scotland; Villages; Houses & Homes | Class Council; Nursery Closure; Children in Need |
Level A | Guy Fawkes; Travel Then & Now; Egyptians | Journey to School; Mobile Shops & Parks; Road Safety | The Playground Helper; The Lollipop Person; The Police |
Content relevant to the various strands within each Knowledge and understanding outcome was used to provide appropriate contexts for the tasks, as noted in Chapter 1. This strategy was adopted purely to ensure a balanced representation of the social subjects through the vehicle of task contextualisation. Every attempt was made to minimise the influence of prior factual knowledge on the skills assessment. For People in the past the content strands are 'People, events and societies of significance in the past', 'Change and continuity, cause and effect', 'Time and historical sequence' and 'The nature of historical evidence'. For People and place they are 'Using maps', 'The physical environment', 'The human environment' and 'Human-physical interactions'. Finally, for People in society we have 'People and needs in society', 'Rules, rights and responsibilities in society' and 'Conflict and decision making in society'. The resulting variety of topics featured in the set of 45 tasks is illustrated in the task titles (see Table 2.1).
Tasks were designed to require up to 45 minutes of testing time at P3 and P5, and up to 60 minutes at P7 and S2. Each task took the same general form: three subtasks, each requiring pupils to respond to a small number of questions by retrieving information from one or other of three different information sources. The three information sources in each task included a text, a table or chart, and one other form, which might be, among a variety of possibilities, a map, a drawing, a street plan, a chronology chart, a photograph or a poster. At Levels B to E, a fourth subtask required pupils to pull together the information from all three sources - perhaps to evaluate relevance or to support or refute a given inference. Table 2.2 provides three example tasks, one each at Levels A, C and E and from the three different outcomes.
2.1.2 The test questions
The number of test questions that featured in each task was 22 at Level A, 24 at Levels B and C, 26 at Level D and 28-30 at Level E. At all levels the questions required a minimal written response that could readily be marked right or wrong, appropriate or inappropriate. Typically, a variety of different types of question were used in each task, selected from the list below.
Summary completion
Many of the assessment activities relating to texts involved the transfer of information in the form of a summary completion. Here, a summary of part or all of the text is written to reflect its content and purpose. Some of the principal meaning-bearing words in the summary are then deleted. The pupils, having read the passage, are required to fill the gaps with any word(s) that recreate the meaning of the original text. An understanding of both syntactic and semantic structures is required to complete the summary task.
Sentence completion
Here, pupils complete sentences either by providing appropriate words to fill word gaps, which could be in the body of the sentence or at the end of it, or by ending the sentence appropriately by selecting from three or four given phrase options.
Short answer questions
These would typically take the form "How many…?", "How far…?", "When…?", "What proportion…?", with pupils responding by extracting specific pieces of information from any of the different types of information source, viz. texts, tables, charts, drawings, maps, diagrams, etc.
Multiple-choice
A variety of multiple-choice techniques was used, including selecting one answer option from several given options, selecting two or more 'correct' statements from five or six, etc. Some of the sentence completion items described above could be classified into this question type.
Table 2.2 Overview of three written tasks
'Amnesty International' - People in society (Level E)
'Amnesty International' features a 300-word text (source 1), a complex superimposed bar chart (source 2), and two sets of statements attributed to human rights victims (source 3). All the information sources were directly reproduced or adapted from publicly available Amnesty International material. The text, 'What is Amnesty International', provides an overview of the history and mission of this voluntary organisation. The bar chart shows the number of countries, in Europe and worldwide, practising various types of human rights abuse. The victims' statements describe personal experiences of human rights abuse, ending with pleas to write to government leaders in the countries concerned about the problem. Text comprehension (source 1 and source 3) is assessed using summary completion, while understanding of the bar chart is assessed using short answer information retrieval items. Three final questions pull the three information sources together, by asking which source provides the most appropriate information to help young human rights victims, to find out how widespread human rights abuses are, and to set up an Amnesty Group in their own schools. There are 28 test questions in total.
'Scotland's Weather' - People and place (Level C)
This task features a 200-word text (source 1), a cross-sectional weather diagram (source 2), and a set of two horizontal bar charts (source 3). The text describes Scotland's weather and climate, along with rainfall and temperature patterns. The diagram illustrates how the 'rain shadow' effect is produced over Scotland: cloud formation over the Atlantic Ocean in the west leading to heavy rainfall over the western mountains as the clouds move east, finally resulting in low rainfall (rain shadow) towards the east coast. The bar charts continue the theme, comparing seasonal rainfall in the west of Scotland and in the east. Understanding of all three sources was assessed using summary completion, with the addition of one item for source 1, which required pupils to shade the key to a rainfall map of Scotland. Finally, pupils were asked to identify which of the three sources would serve best to find out certain kinds of information: how wet Dunbar is, what Scotland's climate is like, what rain shadows are. There are 24 test questions in total.
'Guy Fawkes' - People in the past (Level A)
This Level A task features a short text of around 100 words (source 1), a drawing (source 2) and a pictograph (source 3). The passage explains the origin of 'Guy Fawkes Night', and describes the ways bonfire night is typically celebrated today. The drawing, captioned with the poem 'Remember, remember…', shows children and adults gathered around a bonfire, with fireworks exploding in the sky. The pictograph, entitled 'What children like doing best on Guy Fawkes Night', shows the (fictional) relative popularity of three different bonfire night activities, using smiling faces to represent individual children's activity preferences. Text comprehension is assessed through a short summary completion activity. Pupils use information in the drawing to judge the truth or otherwise of various statements about Guy Fawkes Night. Their understanding of the pictograph is assessed in two ways - by transferring popularity counts into a table, and by adding further given information to the pictograph itself (i.e. adding further smiling faces). There are 22 test questions in total.
Table and chart completion
Here, pupils are required to add additional information, counts for example, to a given table or chart (including tally sheets and pictographs at the lower levels).
Matching
Occasionally, pupils were asked to match attributes - for example, matching drawings of different types of vehicle to pie charts showing travel times over a given distance, or matching house styles to relevant climates.
True, False, Can't tell
Pupils judge each of a series of statements as true or false, or indicate, with "Can't say", that there is insufficient information in the text to permit such an evaluation. The "Can't say" option removes guesswork to some extent, as well as requiring a more accurate reading of the information given in the information source.
Open ended questions
These tended to appear in higher-level tasks, and generally took the form "Why do you think that…?" or "Give reasons for…".
The three tasks overviewed in Table 2.2 serve to illustrate how some of these question types were typically used.
2.1.3 Task development, validation and pre-testing
A team of task developers worked from mid-December 2001 to mid-February 2002 to create the tasks used in this survey. 'Shredding' meetings were organised by the task developers themselves during January, and draft tasks were submitted by mid-February to the then 5-14 Assessment Unit (FFAU) in the Scottish Qualifications Authority (SQA). SQA staff vetted and edited the tasks as they felt necessary, arranged for validation and organised small-scale piloting in schools.
A validation meeting was held in mid-March, during which the tasks were reviewed by practising primary and secondary social subjects teachers who had not been involved at any stage of task development. Given the novelty of level-based assessment in this area, combined with the complex nature of the lengthy tasks used here, level classification is not a straightforward activity. Nevertheless, the teacher groups reviewing the tasks at a particular level - the level to which the task developers had worked - were given the remit to confirm or question this level categorisation in all or individual cases, and to suggest possible improvements to task content. As a result, all tasks were confirmed on a group-agreement basis to be at the levels suggested by the task developers. Several suggestions were also made for modification to certain tasks.
Given the extremely curtailed timescale available for preparing materials for this innovative survey, it was not possible to follow normal procedures by trialling the assessment materials on a national scale. The newly developed materials were nevertheless pre-tested informally by the task developers in their own or colleagues' schools, and their resulting experience of pupil reaction and response served as the basis for task refinement. The tasks were further pre-tested on a small scale by SQA staff before being finalised.
2.1.4 Task administration
Individual pupils were expected to attempt two tasks in the survey, at two different levels and relating to two different social subjects outcomes. The lower-level task was intended to be the first to be administered. Tasks were paired and randomly allocated to pupils in such a way that these intentions would be realised in the schools (see Appendix C). Thus, a P7 pupil might first have tried a Level C task featuring People and place, 'Scotland's Weather' perhaps, and then a Level D task featuring People in the past, such as 'Anne Frank'. The two tasks in each task pair were presented to pupils as a single test booklet, with the associated information sources under separate cover for easy reference.
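The pairing constraints described above (two levels, two different outcomes, lower level first) can be sketched in Python for the P7 case. This is only an illustration under stated assumptions, not the allocation design set out in Appendix C: the function names `valid_pairs` and `allocate`, the pupil labels and the fixed random seed are all hypothetical, and the sketch assumes that every admissible pairing of a Level C task with a Level D task was available for use.

```python
import itertools
import random

# Level C and Level D task titles from Table 2.1, keyed by outcome.
LEVEL_C = {
    "People in the past": ["The Vikings", "Titanic", "Robert the Bruce"],
    "People and place": ["Scotland's Mountains", "Scotland's Weather", "Water"],
    "People in society": ["School Council", "Pool Closure", "Charities"],
}
LEVEL_D = {
    "People in the past": ["Anne Frank", "The Victorians", "World War II"],
    "People and place": ["Sri Lanka", "Farming", "River Study"],
    "People in society": ["Library Closure", "Counting the Cost", "Fireworks"],
}


def valid_pairs(lower, higher):
    """List (lower-level task, higher-level task) pairs drawn from different outcomes."""
    pairs = []
    for (out_lo, tasks_lo), (out_hi, tasks_hi) in itertools.product(
        lower.items(), higher.items()
    ):
        if out_lo != out_hi:
            # The lower-level task comes first in each pair (first to be administered).
            pairs.extend(itertools.product(tasks_lo, tasks_hi))
    return pairs


def allocate(pupils, pairs, seed=0):
    """Randomly assign one task pair (one test booklet) to each pupil."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only
    return {pupil: rng.choice(pairs) for pupil in pupils}


pairs = valid_pairs(LEVEL_C, LEVEL_D)
# Up to 18 pupils per school took part at each stage.
booklets = allocate([f"pupil_{i}" for i in range(1, 19)], pairs)
```

Under these assumptions there are 54 admissible Level C/Level D pairs, including the 'Scotland's Weather' then 'Anne Frank' pairing mentioned in the text.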
At each stage, up to 18 pupils in each school took part in the written testing (in small primary schools fewer than 18 would be available). The schools organised their own assessment sessions within the timescale they were given, which was mid-May to mid-June. Since pupils were to attempt two different tasks, schools were advised to organise two separate assessment sessions, with a break between. They had the freedom to organise the two sessions to take place on the same day or on different days within the given period. The assessment sessions were supervised by the pupils' own class teachers, or by another teacher chosen by the head teacher of the school. The supervising teacher could explain what had to be done, but was not to provide answers or confirm that a pupil's answers were correct. The sessions were not necessarily timed, but it was expected that they would vary from about 30-40 minutes at P3 to 50-60 minutes at S2, allowing time for the writing task that was included in every booklet as well as for the enquiry skills task itself. Once a school's scripts were completed they were sent to SEED for marking (see below).
In the event, almost 10000 pupils participated in the survey, around 2500 at each stage, and around 95% of them attempted two written tasks as planned.
2.1.5 Script marking
The pupils' scripts were marked centrally by a team of undergraduates, during marking meetings held in June/July 2002. While all the test questions would eventually be marked right or wrong, or appropriate/inappropriate, the markers were not always required to award marks directly themselves. In some cases - for example, the summary completion paragraphs - the markers did make judgments about appropriate mark allocations on the basis of given marking schemes. In other cases - for example multiple-choice questions - they simply noted the response option selected by the pupil, and this was later machine marked.
Scripts were distributed among markers in such a way that any one marker marked only a small number of scripts from any one booklet. This served to reduce the possible effect of differences in marker judgment on the results, for the small minority of test questions that were not of objective format. Checks for consistency were also carried out. For each task at least one script per marker was re-marked by a different marker, and the results compared. Any discrepancies were then immediately investigated. Where discrepancies revealed clear errors on the part of an individual marker, the marker in question was alerted to the problem. Where discrepancies suggested different interpretations of the marking scheme for the task concerned, clarification was immediately given to all the markers and, where necessary, the mark scheme was modified. Any clarifications or changes were carried forward when the same task was re-encountered at a higher stage.
2.1.6 Reporting enquiry skills attainment
The criteria used as the basis for reporting pupil attainment in the 2001 English language survey are re-adopted here. Pupils successfully answering 65% or more of the test questions in a task are classified as 'secure' at the level concerned, i.e. as having attained the level. In addition, pupils successfully answering at least 50% of the questions but not as many as 65% are classified as having demonstrated 'basic' enquiry skills at the level concerned, while pupils successfully answering 80% or more of the questions in a task are classified as demonstrating 'considerable strengths' at this level. The proportions of pupils in each classification group were first calculated for each task separately, and then averaged over the tasks at a level to produce the national estimates of attainment presented in this chapter.
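The banding rule just described can be sketched in Python. This is a minimal illustration rather than the survey's own scoring procedure: the names `classify`, `band_percentages` and `level_estimate` are hypothetical, and the sketch assumes the thresholds are applied to the exact proportion of questions answered correctly. The bands are treated as mutually exclusive, so a pupil at 80% or more is counted under 'strengths' only, and the share attaining the level is 'secure' plus 'strengths'.

```python
from collections import Counter

BANDS = ("below basic", "basic", "secure", "strengths")


def classify(correct, total):
    """Return the mastery band for one pupil on one task."""
    # Compare 100 * correct with threshold * total so that the
    # boundaries are tested in exact integer arithmetic.
    scaled = 100 * correct
    if scaled >= 80 * total:
        return "strengths"
    if scaled >= 65 * total:
        return "secure"
    if scaled >= 50 * total:
        return "basic"
    return "below basic"


def band_percentages(scores, total):
    """Percentage of pupils in each band for a single task."""
    counts = Counter(classify(score, total) for score in scores)
    return {band: 100 * counts[band] / len(scores) for band in BANDS}


def level_estimate(per_task):
    """Unweighted mean of the per-task band percentages at one level."""
    return {
        band: sum(task[band] for task in per_task) / len(per_task)
        for band in BANDS
    }
```

On a 22-question Level A task, for example, `classify(15, 22)` gives 'secure' while `classify(14, 22)` gives 'basic', since 65% of 22 is 14.3 questions.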
2.2 Overview of pupils' attainments
2.2.1 The attainment picture across the stages
Table 2.3 provides an overview of attainment at all four stages, in terms of the proportions of pupils meeting the 65% success criterion on the tasks they attempted, averaged over all nine tasks at each level (corresponding figures for the 45 individual tasks are given in Appendix D). Figure 2.1 illustrates the picture.
Table 2.3 Enquiry skills attainment P3 to S2 * (% pupils correctly answering 65% or more items within tasks, averaged over nine tasks at each level) |
Stage | Level A | Level B | Level C | Level D | Level E |
S2 | | | | 59 | 37 |
P7 | | | 70 | 46 | |
P5 | | 77 | 39 | | |
P3 | 75 | 47 | | | |
* Figures show the percentages of pupils demonstrating attainment at the indicated level or higher: approximately 2500 pupils at each stage. |
As Table 2.3 shows, just under half the P3 pupils met the 65% criterion on their Level B task and just under half the P7 pupils did the same for their Level D task. At P5 and S2 the proportions meeting the criterion for their Level C task or Level E task, respectively, were lower, at just under 40%. At the level below, 70-77% of the primary pupils met the criterion, compared with 59% of the S2 pupils.
Let us look now at the finer classification of pupils, which distinguishes 'basic' skill mastery (50% or more items correct), 'secure' skill mastery (65% or more items correct) and demonstration of 'considerable strengths' (80% or more items correct). Table 2.4 presents the findings, and Figure 2.2 illustrates the picture of attainment.

Interestingly, we see from Table 2.4 that the proportions of pupils showing secure attainment or considerable strengths at the higher of the two levels assessed at their stage are fairly stable across stages, falling roughly in the range 35-45%. About 25% of the P3 pupils showed 'considerable strengths' at Level B, with around 15% doing so for the higher level assessed at the other three stages. The proportions showing secure attainment or considerable strengths at the lower of the two levels assessed at their stage are roughly in the range 70-75%, with the exception of S2, where the proportion is lower, at just under 60%.
Table 2.4 Pupils' levels of mastery in enquiry skills * (% pupils classified into mastery bands, averaged over nine tasks at each level)
Stage | Level | No. pupils | < Basic | Basic | Secure | Strengths |
S2 | Level E | 2288 | 38 | 25 | 24 | 13 |
S2 | Level D | 2273 | 20 | 22 | 31 | 28 |
P7 | Level D | 2302 | 28 | 26 | 30 | 16 |
P7 | Level C | 2279 | 11 | 19 | 27 | 43 |
P5 | Level C | 2628 | 34 | 27 | 22 | 17 |
P5 | Level B | 2640 | 9 | 14 | 19 | 58 |
P3 | Level B | 2487 | 32 | 22 | 20 | 27 |
P3 | Level A | 2481 | 12 | 13 | 29 | 46 |
* '< basic' means fewer than 50% of questions answered correctly, 'basic' is between 50% and 64%, 'secure' is 65% to 79%, and 'strengths' is 80%+
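As a quick consistency check, the 'secure' and 'strengths' columns of Table 2.4 can be summed and compared with the attainment percentages in Table 2.3. The sketch below is illustrative only: the dictionary layout and the name `attained` are assumptions made for the example, while the figures are copied from the two tables.

```python
# Table 2.4: (stage, level) -> (< basic, basic, secure, strengths), in % of pupils.
TABLE_2_4 = {
    ("S2", "E"): (38, 25, 24, 13),
    ("S2", "D"): (20, 22, 31, 28),
    ("P7", "D"): (28, 26, 30, 16),
    ("P7", "C"): (11, 19, 27, 43),
    ("P5", "C"): (34, 27, 22, 17),
    ("P5", "B"): (9, 14, 19, 58),
    ("P3", "B"): (32, 22, 20, 27),
    ("P3", "A"): (12, 13, 29, 46),
}

# Table 2.3: (stage, level) -> % of pupils meeting the 65% criterion.
TABLE_2_3 = {
    ("S2", "E"): 37, ("S2", "D"): 59,
    ("P7", "D"): 46, ("P7", "C"): 70,
    ("P5", "C"): 39, ("P5", "B"): 77,
    ("P3", "B"): 47, ("P3", "A"): 75,
}


def attained(bands):
    """Percentage at 'secure' or better: the last two mastery bands."""
    _below, _basic, secure, strengths = bands
    return secure + strengths


mismatches = {
    key: (attained(bands), TABLE_2_3[key])
    for key, bands in TABLE_2_4.items()
    if attained(bands) != TABLE_2_3[key]
}
print(mismatches)
```

With the published figures the two tables agree for every stage and level, so `mismatches` is empty. Two rows of Table 2.4 sum to 101 rather than 100, which is consistent with rounding of the individual band percentages.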
The P5 pupils showed particularly good performance at Level B, with just under 60% of them showing considerable strengths at this level on their tasks. On the other hand, this is perhaps not surprising given that Level B is the target level for P4, so that we would expect most P5 pupils to have progressed beyond it. A less positive feature is the rather low proportion of S2 pupils showing considerable strengths at Level D, along with the finding that only just over a third of the S2 pupils demonstrated secure attainment or considerable strengths at Level E, the target level for this stage.

Around 25% of the pupils at each stage showed 'basic' skills at the higher of their two assessed levels. However, around 10% of the primary pupils failed to demonstrate even 'basic' skills at the lower of the two levels assessed at their stage.
2.2.2 Gender comparisons
Table 2.5 presents the attainment results for boys and girls separately, averaged over the nine tasks at each level. The table shows little evidence of any systematic differences in attainment, other than a small difference of seven percentage points in favour of the girls for the Level D tasks administered at P7 and S2.
If we look at the picture for individual tasks, we find sample differences in favour of the girls for 29 of the 45 tasks, and in favour of the boys for 15 of the tasks, at one or both stages at which the task concerned was used. The sample differences vary from a single percentage point to 20 percentage points, most being too small to reach statistical significance. The general pattern of difference is illustrated in Figure 2.3.
There were just 11 tasks for which statistically significant gender differences in attainment emerged. Three of these showed attainment differences in favour of the boys and eight showed attainment differences in favour of the girls.
At P3 we have the Level A task 'Travel Then and Now' (comparing stagecoaches with modern types of transport) and the Level B task 'The Romans' (featuring Roman engineering), both relating to People in the past, with differences of 10 percentage points in favour of the girls for the first and 11 percentage points in favour of the boys for the second.
Table 2.5 The enquiry skills attainments of boys and girls (% pupils correctly answering 65% or more of the test questions, averaged over the tasks at each level)
Stage | Gender | Level A | Level B | Level C | Level D | Level E |
S2 | Boys | | | | 55 | 37 |
S2 | Girls | | | | 62 | 39 |
S2 | B-G | | | | -7 | -2 |
P7 | Boys | | | 69 | 43 | |
P7 | Girls | | | 71 | 50 | |
P7 | B-G | | | -2 | -7 | |
P5 | Boys | | 77 | 39 | | |
P5 | Girls | | 76 | 40 | | |
P5 | B-G | | 1 | -1 | | |
P3 | Boys | 74 | 47 | | | |
P3 | Girls | 77 | 46 | | | |
P3 | B-G | -3 | 1 | | | |
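The B-G rows in Table 2.5 are the boys' percentage minus the girls' percentage, so a negative value indicates higher attainment among the girls. A minimal sketch that reproduces them is given below. The dictionary layout and the name `gender_gap` are assumptions made for the example, and the figures are copied from the table.

```python
# Table 2.5: (stage, level) -> (boys %, girls %) meeting the 65% criterion.
TABLE_2_5 = {
    ("S2", "D"): (55, 62), ("S2", "E"): (37, 39),
    ("P7", "C"): (69, 71), ("P7", "D"): (43, 50),
    ("P5", "B"): (77, 76), ("P5", "C"): (39, 40),
    ("P3", "A"): (74, 77), ("P3", "B"): (47, 46),
}


def gender_gap(boys, girls):
    """Boys minus girls, in percentage points (negative favours the girls)."""
    return boys - girls


gaps = {key: gender_gap(*pair) for key, pair in TABLE_2_5.items()}

# Stage/level combinations with the largest absolute gap.
widest = max(abs(gap) for gap in gaps.values())
largest = [key for key, gap in gaps.items() if abs(gap) == widest]
print(gaps)
print(largest)
```

The largest gaps are the two seven-point differences at Level D (P7 and S2); every other difference is three points or fewer.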
At P7 there were three tasks, all three from People in society and all with differences in favour of the girls: the Level C task 'School Council' (11 point difference), and the Level D tasks 'Library Closure' (13 point difference) and 'Counting the Cost' (18 point difference).
At S2 there were four tasks, all in favour of the girls: two People in the past tasks - the Level D task 'The Victorians' (20 point difference) and the Level E task 'The Great Plague' (14 point difference) - and two People in society tasks - the Level D task 'Counting the Cost' (20 point difference) and the Level E task 'Advertising' (16 point difference).
There are clearly topic effects at work here, in expected directions. But there are just two examples of tasks for which the topic effect was strong enough to emerge to the same degree at both stages at which the task concerned was administered: the Level C task 'School Council' and the Level D task 'Counting the Cost', both from People in society and both with attainment differences in favour of the girls (11 percentage points difference at P5 and at P7 in the first case, and differences of 18 and 20 points, respectively, at P7 and S2 in the second).
'School Council' features the election of a P5 pupil to the school council. Its three information sources comprise a set of drawings, illustrating the issues that a school council would legitimately concern itself with, a set of four mini-bios for pupil candidates in the imminent election, and a table giving the election results. The winner of the election is not only a girl but, as one of the bios suggests, is the first girl in some time to represent P5 on the council. 'Counting the Cost' is essentially about the problem of how to supply a family of children with Christmas presents within a given family budget. Its sources are a fictional letter from a mother to her sister, outlining the problem she faces, a set of pictures of toys along with brief descriptions and costs, and two line charts showing family spending and family earning, with a slight increase in earnings in December but a larger increase in spending.

At P5 there were four tasks showing significant gender differences: the Level B tasks 'The Wright Brothers' and 'Columbus', and the Level C task 'The Vikings', all People in the past and all three with differences in favour of the boys (11 percentage points in the first case, 16 percentage points in the second, 12 percentage points in the third); and the Level C task 'School Council', People in society, with a difference in favour of the girls (11 percentage points).
Two other tasks with large gender differences, in favour of the girls, were administered only at S2. These are 'Advertising', which addresses the issue of sexism in advertising, and 'The Great Plague'.
2.3 Summary
The kinds of information retrieval, interpretation and evaluation skills typically required when Carrying out tasks were assessed in this survey using relatively lengthy written tasks, in which three different paper-based information sources served as the basis for a series of test items. A total of 45 tasks were newly developed for use in the survey, spread evenly across five levels and the three social subjects outcomes.
Just under 10000 pupils participated in the written assessment, that is around 2500 at each of P3, P5, P7 and S2. Every pupil was intended to attempt two different tasks at two different levels and from two different social subjects outcomes, and 95% or more did so at each stage. On the basis of their assessment results the pupils were classified as having attained the level (of the task), using the criterion of 65% or more of the test questions correctly answered, or as having shown 'basic' skills at this level (at least 50% of test questions correct, but not as many as 65%), or as having shown 'considerable strengths' at the level (80% or more of the test questions correct).
The findings show that 40-50% of the pupils at each stage can be considered to be working at the higher of the two levels assessed for that stage: around 50% of P3 pupils for Level B, 40% of P5 pupils for Level C, 50% of P7 pupils for Level D (target level for the stage) and 40% of S2 pupils for Level E (target level for the stage). About 25% of the P3 pupils showed 'considerable strengths' at Level B, with around 15% doing so for the higher level assessed at the other three stages. Around 25% of the pupils at each stage showed 'basic' skills at the higher of the two levels. On a less positive note, around 10% of the primary pupils failed to demonstrate even 'basic' skills at the lower of the two levels assessed at their stage (Level A at P3, Level B at P5, Level C at P7), rising to 20% for S2 (Level D).
While no overall gender differences were apparent in the attainment data, the girls did tend to produce better performances than the boys on most of the individual assessment tasks. In addition, there is clear evidence of topic effects at work, with statistically significant gender differences in favour of the girls for tasks featuring girls or women and social issues, and in favour of boys for tasks featuring men, work, exploration and technology.