Assessment of Achievement Programme: Report
of the Sixth AAP Survey of Science (2003)
2. Knowledge and understanding
2.1 The assessment process
2.1.1 The assessment tasks
The 2000 revision of the
National Guidelines for Environmental Studies [6], which
replaced the 'stage band' classification of
science content (P1-P3, P4-P6, P7-S2) with a framework of
strands and level-based attainment targets, has had a decisive
influence on the assessment and reporting of pupils'
Knowledge and understanding attainment within the AAP.
This 2003 survey is in consequence the first national survey to
report pupils'
Knowledge and understanding attainment with reference
to the 5-14 levels.
But this welcome change in practice did not come without
challenges. For the new intention to report attainment by
level, along with the decision to report attainment at more
than one level at each stage, demanded an increase in the scale
of the survey in this area. While just over 200 pencil and
paper
Knowledge and understanding tasks had been
administered in the 1999 Science survey, the 2003 survey needed
almost double this number. A new task development exercise on
this scale was not an option, given the timescale available for
survey preparation, even had all 200+ 'old' tasks been
re-usable. In the event, rather few of the 200+ existing tasks
proved to be re-usable, which complicated matters further. A
task review, carried out during the autumn of 2002, resulted in
the identification of 75 tasks as being suitable for re-use.
Most of the other tasks were judged not to be relevant to
the new Guidelines, either because their content was
no longer current or because they could not be classified
unambiguously into 5-14 levels. In some cases tasks were
rejected because their content duplicated that of others.
Fortunately, as noted in the previous chapter, the programme
was able to benefit from a related initiative, in the form of a
SEED-funded project based at the Universities of Aberdeen and
Strathclyde, whose general remit had been to produce a set of
exemplification materials to familiarise teachers with the
intentions behind the new Guidelines for science [7]. Among
the materials produced by the project team was a
large set of pencil and paper
Knowledge and understanding tasks. The tasks covered
all six 5-14 levels and all three outcomes fairly evenly, and
by design were already level-classified. Although the tasks had
not been pre-tested in schools as part of the project, and
there was no time for pre-testing prior to survey use, they
were reviewed for their suitability. Many of the tasks were
adopted unchanged, save for presentational modifications to
conform to the current AAP 'house style', while others were
revised, sometimes being split to produce two or more smaller
tasks. In this way, the whole set of 360 tasks needed for the
survey was produced.
The 360 tasks administered in the survey comprised 60 tasks
at each 5-14 level (A to F), with 20 at each level from each of
the three outcomes: 'Understanding Earth and Space',
'Understanding Energy and Forces' and 'Understanding Living
Things and the Processes of Life'. Around one-sixth of the
tasks had been used in the same or similar form in the 1999 AAP
Science survey, while the remainder, the majority, were drawn
from the set of 'Aberdeen' exemplification tasks. Figures 2.1a,
2.1b, 2.1c, 2.1d, 2.1e and 2.1f reproduce six exemplar tasks
(in reduced size), one at each level and two from each
outcome.
While most tasks would have taken no more than two or three
minutes of pupil time to answer, tasks varied in length, format
and general structure, as Figures 2.1a to 2.1f show, and also
in their mark allocations (see section 2.1.5).
Figure 2.1a Level A - Understanding Living Things
and the Processes of Life

Figure 2.1b Level B - Understanding Earth and
Space

Figure 2.1c Level C - Understanding Living Things
and the Processes of Life

Figure 2.1d Level D - Understanding Energy and
Forces

Figure 2.1e Level E - Understanding Earth and
Space

Figure 2.1f Level F - Understanding Energy and
Forces

2.1.2 The test items within tasks
As Table 2.1 shows, just 44% of the tasks were 'single-item'
tasks, such as that shown in Figure 2.1b. Almost a quarter
(23%) were 2-item tasks, such as the one shown in Figure 2.1f.
Just under a fifth (19%) were 3-item tasks, such as the one
shown in Figure 2.1d. The remainder (14%) were tasks having
between four and seven items/parts each, such as that shown in
Figure 2.1c. Between them, the 360 tasks comprised 759 test
items.
Table 2.1 Single-item and multi-item tasks
No. items | No. tasks | % tasks |
7 | 1 | <1 |
6 | 5 | 1 |
5 | 12 | 3 |
4 | 33 | 9 |
3 | 69 | 19 |
2 | 83 | 23 |
1 | 157 | 44 |
Total | 360 | 100 |
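As an arithmetic check, the following minimal Python sketch recomputes the task and item totals quoted above from the row counts in Table 2.1. The variable names are illustrative only.

```python
# Number of tasks having each number of items, as given in Table 2.1.
tasks_by_items = {7: 1, 6: 5, 5: 12, 4: 33, 3: 69, 2: 83, 1: 157}

total_tasks = sum(tasks_by_items.values())
# Each task contributes as many items as it has parts.
total_items = sum(n_items * n_tasks for n_items, n_tasks in tasks_by_items.items())

# Percentage of all tasks in each row, rounded to whole numbers as in the table.
pct_tasks = {k: round(100 * v / total_tasks) for k, v in tasks_by_items.items()}

print(total_tasks, total_items)  # 360 759
print(pct_tasks)  # {7: 0, 6: 1, 5: 3, 4: 9, 3: 19, 2: 23, 1: 44}
```

The 4-to-7-item rows together account for 51 tasks, which is the 14% quoted in the text; the single 7-item task rounds to 0%, shown as "<1" in the table.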
Several different item formats featured in the tasks,
including:
- Multiple-choice
A variety of multiple-choice
techniques was used, including selecting one answer option
from two or more given options (see Figure 2.1b and the
first item, or gap, in Figure 2.1e), selecting two or more
'correct' statements from five or six, etc. Classification
activities can be included here (see Figure 2.1a). In all,
30% of the 759 items were multiple-choice in format.
- Matching
Occasionally, pupils were asked to match
attributes - see, for example, Figures 2.1c and 2.1d. These
resemble multiple-choice items in some ways, the essential
difference being that the inter-dependence among the
matching items within any one task is high. Around a
quarter of all items were matching items.
- Sequencing
Typically, pupils were presented with a set of
pictures or statements, and required to put these into a
correct sequence, for example to complete a food chain or
to correctly order the stages in a life cycle. Just 6% of
the items were sequencing items.
- Short response
These would typically take the form "Name …", "How
much …?" or "Which …?", with pupils responding with a single word or
number. The first item in the task in Figure 2.1f is an
example. One-fifth of the items were short-response
items.
- Open ended
These tended to appear in higher-level tasks,
and generally took the form "Describe…." or "Explain …".
Figures 2.1e and 2.1f (second item) are examples. Some
open-ended items required quite extended responses from
pupils, running to several sentences - for example, "What
is the 'big bang' theory of the origin of the universe"
(Level F). Just under a fifth (18%) of the items were
open-ended, and about a quarter of these required extended
responses.
The ratio of closed-format items (multiple choice, matching,
sequencing) to open-format items (short response, open ended,
extended response) changed in favour of open formats as 5-14
levels increased (see Table 2.2).
Table 2.2 Ratio of closed-format to open-format
items at different 5-14 levels
Level | Total items | % closed-format | % open-format | Approx. ratio closed:open |
F | 129 | 25 | 75 | 1:3 |
E | 142 | 51 | 49 | 1:1 |
D | 126 | 57 | 43 | 4:3 |
C | 125 | 65 | 35 | 2:1 |
B | 115 | 85 | 15 | 6:1 |
A | 122 | 97 | 3 | 30:1 |
Total | 759 | 62 | 38 | 3:2 |
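The overall figure of 62% closed-format items is consistent with an item-weighted average of the level percentages (a simple mean of the six rows would give about 63%). A minimal Python sketch using only the figures in Table 2.2; since those percentages are already rounded, small discrepancies are to be expected.

```python
# (level, total items, % closed-format) taken from Table 2.2.
rows = [("F", 129, 25), ("E", 142, 51), ("D", 126, 57),
        ("C", 125, 65), ("B", 115, 85), ("A", 122, 97)]

total_items = sum(n for _, n, _ in rows)
# Weight each level's closed-format percentage by its number of items.
closed_share = sum(n * pct for _, n, pct in rows) / total_items

for level, n, pct in rows:
    # Closed-to-open ratio expressed as "x to 1".
    print(f"Level {level}: closed/open = {pct / (100 - pct):.2f}")

print(total_items, round(closed_share))  # 759 62
```

The computed ratios (about 0.33 at Level F rising to about 32 at Level A) match the approximate ratios in the last column of the table.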
This changing ratio of closed-format to open-format as
levels increased is an interesting phenomenon to note,
reflecting as it presumably does science educators', or at
least task developers', views about what content-based
assessment in science should properly look like at the
different stages. But it is also very relevant to bear in mind
when the attainment findings presented later in this chapter
are reviewed. This is because there was a very strong
association between format and success rates, with open formats
producing lower attainment on average than closed formats,
partly because of the impact of often substantially higher
non-response rates. Pupils' responses to open-format items
were also the most vulnerable to 'transcriber error' (see
section 2.1.4).
2.1.3 Task administration
The ways that the 360 tasks were packaged for administration
in the schools have been described in Chapter 1 (see also
Appendix B). Here we add a little more detail.
The 60 tasks at each level were randomly distributed into 10
sets of six tasks, ensuring only that every set of six
comprised two tasks from each of the three outcomes.
Level-specific task sets were then combined to produce 10 Level
A/B booklets for use at P3, 10 Level B/C booklets for use at
P5, 10 Level C/D/E booklets for use at P7, and 10 Level D/E/F
booklets for use at S2. By design, every booklet contained an
equal number of tasks from each of the three outcomes, and
these were kept in 'outcome blocks'. Within outcome blocks
tasks were presented in order of increasing level.
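One straightforward way to carry out a random distribution subject to this constraint is to shuffle each outcome's 20 tasks independently and deal them out two at a time into the 10 sets. The Python sketch below assumes this approach; the exact procedure used in the survey is not described here, and the task identifiers and outcome codes (ES, EF, LT) are hypothetical.

```python
import random


def make_task_sets(tasks_by_outcome, n_sets=10, per_outcome=2, seed=None):
    """Deal each outcome's tasks at random into n_sets sets, per_outcome per set."""
    rng = random.Random(seed)
    sets = [[] for _ in range(n_sets)]
    for outcome, tasks in tasks_by_outcome.items():
        if len(tasks) != n_sets * per_outcome:
            raise ValueError(f"{outcome}: expected {n_sets * per_outcome} tasks")
        shuffled = tasks[:]
        rng.shuffle(shuffled)
        # Set i receives the i-th consecutive slice of the shuffled list.
        for i in range(n_sets):
            sets[i].extend(shuffled[i * per_outcome:(i + 1) * per_outcome])
    return sets


# Hypothetical identifiers: 20 Level A tasks for each of the three outcomes.
level_a = {o: [f"A-{o}-{k:02d}" for k in range(1, 21)] for o in ("ES", "EF", "LT")}
task_sets = make_task_sets(level_a, seed=2003)
```

Shuffling within each outcome, rather than across all 60 tasks, guarantees the two-per-outcome balance while leaving the assignment of tasks to sets otherwise random.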
Every booklet was printed in three different versions,
varying the order of presentation of tasks, so that no specific
task suffered from possible test fatigue effects by being
placed towards or at the end of booklets. Each task would then
have appeared an equal number of times at or near the beginning
of the booklet, towards the middle of the booklet, or at or
near the end of the booklet. A single numeracy task was
included in every booklet (see Chapter 4 for the results of the
numeracy assessment).
Thus, in each P3 booklet a pupil would be faced first with
two Level A tasks followed by two Level B tasks, all four
relating to one of the three outcomes, then two Level A tasks
followed by two Level B tasks relating to a second of the three
outcomes, then, perhaps, the numeracy task, and then, finally,
two Level A tasks followed by two Level B tasks relating to the
third outcome.
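How the order of presentation was varied across the three versions is not specified above. One design that gives the balance described, assumed here purely for illustration, is a cyclic rotation of the three outcome blocks, so that each block appears once at the start, once in the middle and once at the end of a booklet. The sketch keeps the order within each block (increasing level) unchanged and omits the numeracy task, whose position is left open in the description; the task labels are hypothetical.

```python
def booklet_versions(blocks):
    """Return one version per block, each a cyclic rotation of the block order."""
    n = len(blocks)
    return [blocks[i:] + blocks[:i] for i in range(n)]


# Hypothetical P3 booklet: each outcome block holds two Level A then two Level B tasks.
blocks = [
    ["ES-A1", "ES-A2", "ES-B1", "ES-B2"],
    ["EF-A1", "EF-A2", "EF-B1", "EF-B2"],
    ["LT-A1", "LT-A2", "LT-B1", "LT-B2"],
]

for v, version in enumerate(booklet_versions(blocks), start=1):
    # Show the outcome order of each version, e.g. 1 ['ES', 'EF', 'LT'].
    print(v, [block[0][:2] for block in version])
```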
To assist in attainment comparisons across stages, the Level
B tasks which featured in a particular test booklet at P3, here
mixed with Level A tasks, were transferred into one of the test
booklets at P5, to be mixed with Level C tasks, and so on.
At each stage, up to 10 pupils in each school took part in
the written science assessment (in very small primary schools
fewer than 10 would be available). Each sampled pupil was
to attempt two different test booklets, and booklet
pairs were allocated at random to pupils before the survey took
place. Provided a school could supply 10 pupils for assessment,
that school was sent ten pairs of booklets for the appropriate
stage, with every booklet appearing twice in the set. Thus, a
maximum of two pupils in any school would attempt any
particular test booklet.
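Many pairing schemes would satisfy the allocation rule described above. The sketch below assumes, hypothetically, a cyclic scheme in which booklet b is paired with booklet b+1 (and booklet 10 with booklet 1), so that every booklet appears in exactly two pairs; the ten pairs are then shuffled across a school's pupils. Where a school has fewer than ten pupils, some pairs simply go unused, so no booklet is used more than twice.

```python
import random


def allocate_booklet_pairs(pupils, n_booklets=10, seed=None):
    """Assign each pupil a hypothetical cyclic booklet pair (b, b+1), at random."""
    pairs = [(b, b % n_booklets + 1) for b in range(1, n_booklets + 1)]
    random.Random(seed).shuffle(pairs)
    # zip stops at the shorter sequence, so surplus pairs are left unused.
    return dict(zip(pupils, pairs))


alloc = allocate_booklet_pairs([f"pupil{i}" for i in range(1, 11)], seed=1)
for pupil, pair in alloc.items():
    print(pupil, pair)
```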
The schools organised their own assessment sessions within
the timescale they were given,
viz. mid-May to mid-June, and they were advised to
organise two separate assessment sessions for their pupils,
with a break between. They had the freedom to organise the two
sessions to take place on the same day or on different days
within the given period. The assessment sessions were
supervised by the pupils' own class teachers, or by another
teacher chosen by the head teacher of the school. The
supervising teacher could explain what had to be done, but was
not allowed to provide answers or confirm that a pupil's
answers were correct. The sessions were not necessarily timed,
but it was expected that they would vary from about 30-40
minutes at P3 to 50-60 minutes at S2. It was assumed that
schools would organise the core skills reading and writing
assessment at the same time - see Chapter 4 for details and
results. Once a school's scripts were completed they were sent
to SEED for processing (see below).
In the event, almost 6000 pupils in around 600 schools
participated in the written survey of
Knowledge and understanding: data were analysed for
1405 P3 pupils in 155 schools, 1463 P5 pupils in 156 schools,
1483 P7 pupils in 156 schools, and 1306 S2 pupils in 130
schools.
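Summing the stage figures reproduces the headline totals. Note that the school total counts school participations by stage, so a school that took part at more than one stage would be counted more than once; whether this occurred is not stated here.

```python
# (pupils, schools) for which data were analysed, by stage.
analysed = {"P3": (1405, 155), "P5": (1463, 156), "P7": (1483, 156), "S2": (1306, 130)}

total_pupils = sum(p for p, _ in analysed.values())
total_schools = sum(s for _, s in analysed.values())
print(total_pupils, total_schools)  # 5657 597
```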
2.1.4 Script processing
The pupils' scripts were processed centrally by a team of
undergraduates, during transcription meetings held in June and
July 2003. The transcribers were not required to award marks to
pupils' responses. They simply noted the response options
selected by the pupils, transferring these onto specially
designed response transcription forms, by circling response
options. Where alternative responses were offered to pupils, as
in multiple choice items or 'matching' items, the options were
reproduced on the transcription form. Where items were
open-ended, a set of keyterms that adequately encapsulated the
range of pupil responses was identified whenever possible.
Where short keyterms could not be identified, letter codes were
offered to transcribers instead, each letter
code being associated with a particular type of extended
written response; in these cases transcribers were supplied
with accompanying explanatory notes for use during
transcription.
Random checks for consistency were carried out during the
response transcription exercise. In general terms the procedure
was as follows. Typically, 20-30 copies of each transcribed
assessment booklet were newly transcribed "blind" by a second
transcriber, i.e. the second transcriber did not have sight of
the original transcription. The original transcription and the
second independent transcription were then compared and
discrepancies noted. The results are presented in Table
2.3.
Table 2.3 Response transcription
consistency
(Discrepancy rates across 10 booklets per stage*, 20-30
scripts per booklet)
Stage | Task levels | No. item-pupil codes checked | % discrepancy |
S2 | D/E/F | 10610 | 4.0 |
P7 | C/D/E | 4551 | 2.3 |
P5 | B/C | 7000 | 1.6 |
P3 | A/B | 6954 | 2.2 |
* 5 booklets only at P7
Table 2.3 shows that the discrepancy rate was lowest at the
primary stages, at around 2%, and highest at S2, at 4%. Just
over half the discrepancies were associated with one or two
'problem' items in each booklet, where booklets typically
contained 20-30 items at P3 and P5 (12 tasks) and 30-40 items
at P7 and S2 (18 tasks). As might be expected, in every case
these 'problem' items were of open-ended format, requiring a
degree of subjective interpretation on the part of the
transcribers as they decided which keyterm or code corresponded
most closely with a pupil's written response.
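The consistency check amounts to comparing two independent transcriptions code by code. A minimal Python sketch, assuming each transcription is held as a mapping from (pupil, item) to the circled option, keyterm or letter code; the identifiers and codes in the example are hypothetical.

```python
def discrepancy_rate(first, second):
    """Percentage of item-pupil codes on which two blind transcriptions disagree.

    first, second: dicts mapping (pupil_id, item_id) -> transcribed code.
    """
    if first.keys() != second.keys():
        raise ValueError("transcriptions must cover the same item-pupil codes")
    disagreements = sum(first[k] != second[k] for k in first)
    return 100 * disagreements / len(first)


original = {("p1", "q1"): "b", ("p1", "q2"): "K3", ("p2", "q1"): "b", ("p2", "q2"): "K1"}
blind = {("p1", "q1"): "b", ("p1", "q2"): "K3", ("p2", "q1"): "b", ("p2", "q2"): "K2"}
print(discrepancy_rate(original, blind))  # 25.0
```

In this toy example the single disagreement is on an open-ended item, mirroring the pattern described above where discrepancies clustered on items needing subjective interpretation.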
The completed transcription forms were keyboarded by a
professional data processing company, for later machine marking
and analysis.
2.1.5 Marking
With very rare exceptions, test items were allocated a
single mark, and item marking was a relatively straightforward
automated process. Pupils' item responses, as indicated by the
response options, keyterms or letter codes circled by the
transcribers, were matched against the correct answers as
recorded in the system for the items concerned, and marks were
allocated accordingly.
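The automated item-marking step can be sketched as a lookup against an answer key. For illustration, the sketch assumes that the key may list more than one acceptable code for an item (for example, two acceptable keyterms) and that a missing response scores zero; the item names and codes are hypothetical.

```python
def mark_items(responses, answer_key):
    """Return 1/0 item marks: 1 where the transcribed code is an acceptable answer.

    responses: item_id -> transcribed code (items left blank are simply absent).
    answer_key: item_id -> set of acceptable codes.
    """
    return {item: int(responses.get(item) in acceptable)
            for item, acceptable in answer_key.items()}


answer_key = {"q1": {"C"}, "q2": {"K2", "K4"}, "q3": {"B"}}
responses = {"q1": "C", "q2": "K3"}  # q3 left blank (non-response)
print(mark_items(responses, answer_key))  # {'q1': 1, 'q2': 0, 'q3': 0}
```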
Task marking, however, proved much less straightforward. Had
task marks been the simple sum of item marks, then the maximum
achievable mark for a task would have equalled the number of items in
the task. In other words, maximum task marks would in principle range from
one to seven, depending on the task. But this wide range of
achievable task marks could pose problems for interpreting
attainment results. This is because attainment results were to
be based on the application of cut-off scores to pupils' mark
totals for tasks at a level. Clearly, tasks with the highest
maximum marks would have more influence on the results than
tasks with the lowest maximum marks, without necessarily any
real justification for this greater importance.
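A small hypothetical example shows the problem. Suppose the 12 tasks at a level comprised one 7-item task scored as the simple sum of its item marks and eleven single-mark tasks, giving 18 marks in all. The sketch compares two invented pupils against the 65% cut-off.

```python
# Hypothetical level: one 7-item task scored as the sum of its item marks,
# plus eleven single-mark tasks.
max_marks = [7] + [1] * 11
total = sum(max_marks)  # 18 marks available

# Pupil A: the 7-item task wholly right, but only 5 of the 11 other tasks.
pupil_a = (7 + 5) / total
# Pupil B: the 7-item task wholly wrong, but 10 of the 11 other tasks right.
pupil_b = (0 + 10) / total

print(f"A: {pupil_a:.0%}, B: {pupil_b:.0%}")  # A: 67%, B: 56%
```

Pupil A completes 6 of the 12 tasks and clears the 65% cut-off, while pupil B completes 10 of the 12 and does not, purely because of the weight carried by one task.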
To impose a degree of control on this situation,
task-specific criteria were agreed that, when applied, reduced
the possible mark range to between one and three marks. This
was a compromise policy, intended to accommodate the nature and
variety of the tasks used in the survey.
But how to achieve rational scale mappings? This was the
difficult challenge for subject specialists. The ways that the
six tasks shown earlier were handled will serve to illustrate
how this particular challenge was met.
The 'orbit' task shown in Figure 2.1b, which comprises a
single classical multiple-choice item, was relatively
unproblematic. This task was allocated a single mark.
The classification task shown in Figure 2.1a was processed
as a single-item task, and was also allocated a single mark,
even though three out of six pictured creatures were to be
identified as birds. The reasoning here was that the pupil
would need to be able to identify all three bird members
correctly to have shown the required knowledge and
understanding of bird characteristics, i.e. the concept being
tested.
The matching task shown in Figure 2.1d has three items, and
therefore has a three-mark total in principle. In practice the
task was dichotomously scored. Like the task in Figure 2.1a,
the decision here was that a pupil would need to match all
three circuit components to their correct circuit diagram
symbols in order to have demonstrated the relevant knowledge
and understanding. Thus, three correct matches merited one mark
while fewer than three merited none.
The task in Figure 2.1c produced a different decision. This
task has four items, and therefore four marks in principle. The
mapping decision was that a pupil matching all four functions
to the correct body organs deserved three marks, one who
correctly matched three of the four deserved two marks, while a
pupil successfully matching one or two of the four gained one
mark. This was therefore a three-mark task.
The chemical reaction task in Figure 2.1e has two items, the
first a multiple choice item (select the setup which would
produce the fastest reaction) and the second an open-ended
response item (explain your choice). The decision here was that
a pupil would need to make the correct choice of setup
and give a correct explanation for the choice to
deserve a mark: in other words, pupils needed to answer both
items correctly to merit the single mark for the task.
The spring balance task shown in Figure 2.1f proved
different again. This task also has two items, so that the task
mark is in principle also two, as for the reaction task.
However, there is much less dependence between the two items in
this task - indeed the two items could have been presented
quite independently of one another, as two separate single-item
tasks. Therefore, a pupil answering both items correctly
merited two marks, while one correct item merited one mark.
This, then, retained the status of a two-mark task.
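The mapping rules described for the six exemplar tasks can be restated as three small functions from a list of item marks (1 = correct, 0 = not) to a task mark. This is an illustrative Python restatement of the rules above, not the survey's marking software; the function names are hypothetical.

```python
def all_or_nothing(item_marks):
    """Figures 2.1a, 2.1b, 2.1d, 2.1e: one mark only if every item is correct."""
    return int(all(item_marks))


def organ_matching(item_marks):
    """Figure 2.1c: 4 correct -> 3 marks, 3 -> 2, 1 or 2 -> 1, none -> 0."""
    correct = sum(item_marks)
    if correct == 4:
        return 3
    if correct == 3:
        return 2
    return 1 if correct >= 1 else 0


def independent_items(item_marks):
    """Figure 2.1f: items are scored independently, so the task mark is their sum."""
    return sum(item_marks)


print(all_or_nothing([1, 1, 0]))      # circuit symbols, one mismatch -> 0
print(organ_matching([1, 1, 0, 0]))   # two of four organs matched -> 1
print(independent_items([1, 0]))      # spring balance, one item right -> 1
```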
2.1.6 Reporting Knowledge and understanding attainment
Across their two science booklets, each pupil would have
attempted 12 tasks at each of the levels included. Performances
on the 12 tasks at the same level determined pupils' attainment
classifications at that level. The cut-off score criteria
previously identified by English specialists as appropriate for
the purpose, which were used in the 2001 English Language
Survey and again in the 2002 Social Subjects Survey, were
applied here.
Pupils achieving 65% or more of the marks for the 12 tasks
at a particular level in their two booklets are classified as
'secure' at the level concerned [8], i.e. as having attained the level. Pupils achieving 80%
or more of the marks are classified as demonstrating
'considerable strengths' at this level. Pupils
achieving at least 50% of the marks but not as many as 65% are
classified as having demonstrated
'basic' attainment at the level concerned.
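The cut-off rules translate into a simple banding function. Because tasks carried between one and three marks, the maximum available for a pupil's 12 tasks at a level could vary, so the sketch works from the percentage of available marks. The band labels follow Table 2.5; 'secure' in Table 2.4 corresponds to the secure and considerable strengths bands combined.

```python
def classify(marks, max_marks):
    """Band a pupil's total on the 12 tasks at one level by percentage of marks."""
    pct = 100 * marks / max_marks
    if pct >= 80:
        return "considerable strengths"
    if pct >= 65:
        return "secure"
    if pct >= 50:
        return "basic"
    return "below basic"


# Hypothetical pupil with 23 of a possible 36 marks (about 64%).
print(classify(23, 36))  # basic
```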
The proportions of pupils in each classification group were
calculated for each booklet pair separately, i.e. for each set
of tasks at the same level across a pair of booklets (the data
were weighted during this process, to adjust for imbalance in
sample representation - see Appendix B for details). The
separate booklet results were then averaged to produce the
national estimates of attainment presented in this chapter
(Levels A and B at P3, Levels B and C at P5, Levels C, D and E
at P7, Levels D, E and F at S2).
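The two-stage estimation (weighted proportions within each booklet pair, then an average across pairs) can be sketched as follows. The weights in the example are hypothetical stand-ins for the sampling weights described in Appendix B, and the sketch assumes a simple unweighted mean across booklet pairs.

```python
SECURE_OR_BETTER = {"secure", "considerable strengths"}


def weighted_share(bands, weights, target):
    """Weighted percentage of pupils in one booklet pair whose band is in target."""
    total = sum(weights)
    hit = sum(w for b, w in zip(bands, weights) if b in target)
    return 100 * hit / total


def national_estimate(pairs, target):
    """Mean over booklet pairs of each pair's weighted percentage."""
    shares = [weighted_share(bands, weights, target) for bands, weights in pairs]
    return sum(shares) / len(shares)


# Two hypothetical booklet pairs: (bands of pupils, sampling weights).
pairs = [
    (["secure", "basic", "considerable strengths", "below basic"], [1.0, 1.0, 1.0, 1.0]),
    (["secure", "basic"], [1.5, 0.5]),
]
print(national_estimate(pairs, SECURE_OR_BETTER))  # 62.5
```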
2.2 Overview of pupils' attainments
2.2.1 The attainment picture across the stages
Table 2.4 provides an overview of attainment at all four
stages, in terms of the proportions of pupils meeting the 65%
success criterion on the tasks they attempted at particular
levels, averaged over all booklet pairs. Figure 2.2 illustrates
the picture.
Table 2.4 'Secure'
Knowledge and understanding attainment*: P3 to S2
(% pupils achieving 65% or more of the marks for 12
tasks at a level, averaged over booklet pairs: 1300-1500
pupils at each stage)
| Level A | Level B | Level C | Level D | Level E | Level F |
S2 | | | | 20 | 10 | <1 |
P7 | | | 37 | 7 | <1 | |
P5 | | 75 | 26 | | | |
P3 | 76 | 54 | | | | |
* Margins of error for the estimated proportions vary
between 1½ and 2 percentage points.
As Table 2.4 shows, three-quarters of the P3 pupils were
deemed to be working at Level A or higher, with just over half
of all the pupils working at Level B or higher. At P5,
three-quarters of the pupils were working at Level B or higher,
and a quarter of all the pupils were working at Level C or
higher.
Just over a third of the P7 pupils were classified as
working at Level C or higher, fewer than 10% of all the pupils
were classified as working at Level D or higher, and a mere
handful of pupils (fewer than 1%) showed sufficiently good
performances to be classified as working at Level E. Just
one-fifth of the S2 pupils were classified as working at Level
D, 10% at Level E, and a handful only (fewer than 1%) at Level
F.
Let us look now at the finer classification of pupils, which
distinguishes 'basic'
Knowledge and understanding (50% or more of the marks
achieved), 'secure'
Knowledge and understanding (65% or more of the marks
achieved) and demonstration of 'considerable strengths' (80% or
more of the marks achieved). Table 2.5 presents the findings,
and Figure 2.3 illustrates the picture of attainment.
Figure 2.2 'Secure'
Knowledge and understanding attainment*: P3 to S2

* Each bar shows the percentage of pupils demonstrating
attainment at the level indicated or higher: 1300-1500
pupils at each stage.
Table 2.5 Pupils'
Knowledge and understanding attainment
(% pupils classified into each band*, averaged over the
task sets at each level)
Stage | Pupils | Level | < Basic | Basic | Secure | Strengths |
S2 | 1306 | F | 95 | 5 | 0 | 0 |
| | E | 75 | 15 | 8 | 2 |
| | D | 56 | 24 | 15 | 5 |
P7 | 1483 | E | 95 | 5 | 0 | 0 |
| | D | 77 | 16 | 6 | 1 |
| | C | 36 | 27 | 25 | 12 |
P5 | 1463 | C | 49 | 25 | 19 | 7 |
| | B | 10 | 15 | 32 | 43 |
P3 | 1405 | B | 25 | 21 | 31 | 23 |
| | A | 10 | 14 | 29 | 47 |
* '< basic' means fewer than 50% of marks achieved,
'basic' is between 50% and 64%, 'secure' is 65% to 79%, and
'strengths' is 80%+
Noteworthy features in the attainment data shown in Table
2.5 are the high proportions of pupils at P3 and at P5 who
performed sufficiently well on their task sets (80% or more of
marks achieved) to be classified as having considerable
strengths at the lower of the two levels at which they were
assessed - Level A at P3, Level B at P5. Almost a quarter of
the P3 pupils also showed considerable strengths at Level B.
This is in contrast to the situation for Levels D and E at P7
and S2, where at most 5% of the pupils performed so
well.
When reflecting on these results, readers should remember
that the proportions of open-format items increased with
increasing task level (see Table 2.2), from 3% at Level A and
15% at Level B, through 35% at Level C, 43% at Level D and 49%
at Level E, to fully 75% at Level F. The older the pupils,
therefore, the less benefit they received from information
support in their tasks, the more frequently they had to show
evidence of their knowledge and understanding of science
through the medium of writing (and from the evidence given in
Chapter 4, pupils' writing skills were not well developed in
general), and the more likely they were not to respond at all
to the questions asked (non-response rates rose to 80-90% for
some of the open-ended tasks). All these factors will have
contributed to the apparently lower attainments at P7 and S2
compared with P3 and P5.
Figure 2.3 Pupils'
Knowledge and understanding attainment
(% pupils classified into bands*, averaged over task sets at each level)

Since each individual pupil attempted just four tasks from
each outcome at any specific level (two in each booklet), it
would be of very questionable value to produce level attainment
figures by outcome using the usual cut-off score strategy. We
look therefore at average task scores for evidence of
similarity or difference.
On the evidence of average task scores, there were no
significant performance differences between the three outcomes.
The average percentage scores across the 120 tasks representing
each outcome in the survey (all levels and stages combined)
were: 44% for 'Understanding Energy and Forces', 42% for
'Understanding Earth and Space', and 43% for 'Understanding
Living Things and the Processes of Life'.
2.2.2 Gender comparisons
Table 2.6 presents the level-based attainment results for
boys and girls separately, averaged over the task sets at each
level. While the table shows some small sample differences in
one direction or the other, these are not statistically
significant. The general picture is one of gender similarity.
If we look at the three outcomes, however, we do find that
although the sample differences are extremely small, they are
in expected directions (see Table 2.7).
Table 2.6 'Secure' Knowledge and understanding
attainment*: by gender
(% pupils achieving 65% or more marks for 12 tasks at a
level, averaged over booklet pairs)
| | Level A | Level B | Level C | Level D | Level E | Level F |
S2 | Boys | | | | 22 | 10 | <1 |
| Girls | | | | 17 | 9 | <1 |
| B-G | | | | 5 | 1 | 0 |
P7 | Boys | | | 38 | 7 | <1 | |
| Girls | | | 36 | 6 | <1 | |
| B-G | | | 2 | 1 | 0 | |
P5 | Boys | | 75 | 29 | | | |
| Girls | | 75 | 23 | | | |
| B-G | | 0 | 6 | | | |
P3 | Boys | 74 | 52 | | | | |
| Girls | 77 | 56 | | | | |
| B-G | -3 | -4 | | | | |
*Figures show the percentages of pupils demonstrating
attainment at the indicated level or higher: 650-750 pupils per gender
at each stage.
Table 2.7 Gender and outcome: average task
scores
(average percentage task score over 120 tasks in each
outcome - all levels and stages combined)
| Boys | Girls |
Energy and Forces | 45 | 43 |
Earth and Space | 43 | 42 |
Living Things and the Processes of Life | 43 | 44 |
2.2.3 Change over time
The continuing AAP strategy for exploring the issue of
change over time is to rely on comparisons of pupils'
attainments on 'common' tasks, i.e. on tasks used in identical
form in two or more surveys, as the basis for comment. This
same strategy was used again on this occasion, to compare
pupils' attainment in 2003 with their attainment in 1999. But
the strategy could only usefully be implemented at P7 and S2,
given that this was the first time that P3 and P5 pupils had
been assessed in an AAP Science survey. Moreover, the new
requirement to report attainment with reference to the 5-14 levels
meant that the 'common task' exercise at P7 and S2 could only be on a
very modest scale.
The level-based attainment framework for
Knowledge and understanding in science was introduced
in the 2000 revision of the National Guidelines for
Environmental Studies, too late for implementation in the 1999
AAP Science survey. In that survey, therefore, tasks had been
classified by stage band (P1-P3, P4-P6, P7-S2) rather than
level. It has been noted earlier that when the tasks were
reviewed in preparation for the 2003 survey rather few - just
60 - were found to have continuing content relevance and to be
uniquely classifiable into appropriate 5-14 levels. In the
event, just 50 tasks at Levels C, D or E were considered
appropriate for re-use in unchanged form. Table 2.8 records the
performances of P7 and S2 pupils on these 'common' tasks in
1999 and in 2003.
Table 2.8 Average facility values for re-used tasks
at P7 and S2
Stage | Year | Level C (17 tasks) | Level D (12 tasks) | Level E (18 tasks) |
S2 | 2003 | | 46 | 43 |
| 1999 | | 50 | 42 |
P7 | 2003 | 49 | 35 | 25 |
| 1999 | 53 | 35 | 24 |
The attainment comparisons in Table 2.8 are based on a total
of 47 tasks that were administered in identical form
and marked in identical ways in both surveys: 17 tasks
at Level C (P7 only), 12 at Level D and 18 at Level E. For the
purpose of the comparison all the tasks were dichotomously
marked, and pupils needed to answer the task completely
correctly to achieve the mark.
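With dichotomous marking, a task's facility value is the percentage of pupils gaining its single mark, and each figure in Table 2.8 is an average of these over the tasks at a level. A minimal, unweighted Python sketch with hypothetical mark data:

```python
def facility_value(task_marks):
    """Percentage of pupils gaining the single (dichotomous) task mark."""
    return 100 * sum(task_marks) / len(task_marks)


def average_facility(marks_by_task):
    """Mean facility value over a set of common tasks."""
    values = [facility_value(m) for m in marks_by_task.values()]
    return sum(values) / len(values)


# Hypothetical 0/1 task marks for four pupils on two common tasks.
marks_by_task = {"task1": [1, 1, 0, 0], "task2": [1, 0, 0, 0]}
print(average_facility(marks_by_task))  # 37.5
```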
On the basis of these small and rather arbitrary sets of
assessment tasks, we can say that there is no evidence of
attainment change over the period. The slight differences in
average task facilities at Level C for P7 and Level D for S2
are not statistically significant.
2.3 Summary
Just under 6000 pupils in around 600 schools participated in
the written science assessment, that is around 1300-1500 pupils
at each stage. In total, 360
Knowledge and understanding tasks were administered to
these pupils, 60 per level (A to F) and 120 from each of the
three outcomes. The majority of pupils attempted two different
test booklets, between them containing 12 tasks from each of
two or three levels.
On the basis of their assessment results on the 12 tasks at
a level, pupils were classified as being 'secure' at the level
(using the criterion of 65% or more of the marks achieved on
tasks at the same level), or as having shown 'basic' knowledge
and understanding at the level (at least 50% of marks achieved,
but not as many as 65%), or as having shown 'considerable
strengths' at the level (80% or more of the marks
achieved).
Three-quarters of the P3 and P5 pupils were classified as
being secure or showing considerable strengths at Levels A and
B, respectively. Just over half the P3 pupils were similarly
classified at Level B, compared with a quarter of the P5 pupils
for Level C. Just over a third of the P7 pupils were classified
as secure or showing considerable strengths at Level C, while a
fifth of the S2 pupils were similarly classified at Level D. At
most 10% of the P7 and S2 pupils were secure at the next level
up, i.e. Level D for P7, Level E for S2 - the target levels for
these stages. Virtually no P7 or S2 pupils produced evidence of
'secure' attainment or considerable strengths at Levels E and
F, respectively.
Looking at 'basic' levels of attainment and 'considerable
strengths', we see a similar picture. While almost half of the
P3 and the P5 pupils showed considerable strengths at Levels A
and B, respectively, few, if any, of the P7 and S2 pupils
showed such high performance at their target levels (Level D at P7,
Level E at S2), and indeed high proportions failed to show even 'basic'
attainment at these levels (75-80% achieved fewer than
half the marks).
Contributory factors to this picture of lower achievement at
the higher stages are the markedly higher proportions of
open-format items that featured in the tasks at Levels D, E
and, particularly, F. Such formats make additional demands on
pupils' writing ability as a means of showing evidence of science
knowledge and understanding, they often lead also to high
non-response rates, and response assessment is vulnerable to
varying degrees of marker, or transcriber, subjectivity.
There was no evidence in the survey data of important,
consistent gender differences in attainment in science overall
- on the contrary, the general picture is one of similarity.
But there were small sample differences in expected directions
for the three outcomes: marginally in favour of the boys for
'Energy and Forces' and for 'Earth and Space' and marginally in
favour of the girls for 'Living Things and the Processes of
Life' (none of the very small differences reached statistical
significance).
On the basis of a rather arbitrary and small set of 'common'
tasks, i.e. tasks used in the same form and marked in the same
way in 1999 and 2003, the survey has produced no evidence of
any change in P7 or S2 attainment since 1999 (P3 and P5 were
assessed for the first time in 2003).