Assessment of Achievement Programme:
Report of the Sixth AAP Survey of Science (2003)
Appendix B: Sampling, task distribution and attainment estimation
B.1 School and pupil sampling
The 2003 science survey was designed to assess the
science and core skills attainment of pupils at P3, P5, P7 and
S2 in mainstream schools in Scotland - including
education authority, self-governing, grant-aided and
independent schools. Special schools and Gaelic medium
schools were excluded from the sampling frame.
Representative pupil samples were selected for testing
using two-stage proportionate stratified sampling, with an
overall sampling fraction of just over 5% of the pupil
population. Separate school samples were drawn for the four
pupil stages. Before school sampling began, the population
of maintained schools was stratified by education
authority grouping, roll size and percentage free school
meals entitlement. The 32 education authorities were
classified into four groups for this purpose, based on
their general population densities (see Table B.1). Schools
were grouped into two size bands (primary schools: under
280 pupils on roll, and 280 or more on roll; secondary
schools: under 150 S2 pupils on roll, and 150 or more S2
pupils on roll) and two bands for free school meals
entitlement (primary schools: <10%, and 10% or more;
secondary schools: <15%, and 15% or more). School size
and free school meals entitlement classifications were
based on the most recent school census data available at
the time, viz. census data at September 2001 and at January
2002, respectively. Independent schools formed a separate
national stratum.
Table B.1 Education authority groupings
(Based on general population density)
Group 1 | Group 2 | Group 3 | Group 4 |
Aberdeen City | East Dunbartonshire | Clackmannanshire | Aberdeenshire |
Dundee City | East Renfrewshire | East Ayrshire | Angus |
Edinburgh City | Falkirk | East Lothian | Argyll & Bute |
Glasgow City | Inverclyde | Fife | Dumfries & Galloway |
| North Lanarkshire | Midlothian | Eilean Siar |
| Renfrewshire | North Ayrshire | Highland |
| West Dunbartonshire | South Ayrshire | Moray |
| West Lothian | South Lanarkshire | Orkney Islands |
| | | Perth & Kinross |
| | | Scottish Borders |
| | | Shetland Islands |
| | | Stirling |
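The stratification rules described above can be sketched as a small classification function. This is an illustration only: the function name, the band labels ("small", "large", "low", "high") and the tuple layout are hypothetical, while the thresholds are the ones quoted in the text.

```python
def stratum(sector, ea_group, roll, fsm_pct, independent=False):
    """Return a stratum label for one school (illustrative sketch).

    sector:   "primary" or "secondary"
    ea_group: education authority group, 1 to 4 (Table B.1)
    roll:     whole-school roll (primary) or S2 roll (secondary)
    fsm_pct:  percentage free school meals entitlement
    """
    if independent:
        # Independent schools formed a separate national stratum.
        return ("independent",)
    # Thresholds quoted in the text: 280 pupils / 10% for primary schools,
    # 150 S2 pupils / 15% for secondary schools.
    size_cut, fsm_cut = (280, 10) if sector == "primary" else (150, 15)
    size_band = "small" if roll < size_cut else "large"
    fsm_band = "low" if fsm_pct < fsm_cut else "high"
    return (ea_group, size_band, fsm_band)


example_primary = stratum("primary", 1, 279, 9.9)
example_secondary = stratum("secondary", 2, 150, 15.0)
```

With four authority groups, two size bands and two entitlement bands, this gives up to 16 strata of maintained schools per sector, plus the independent stratum.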
Schools were selected from within strata, without
replacement, with probability proportional to stage size.
At each stage around 200 schools were selected and invited
to participate in the survey, and 70-80% agreed to do so
(see Table B.2).
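The report does not say which algorithm was used to draw schools with probability proportional to stage size. As one possible illustration, the sketch below uses systematic PPS selection, a common technique that selects without replacement provided no school is larger than the sampling interval. The function name and the stage rolls are made up for the example.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n indices with probability proportional to size.

    Systematic PPS: lay the schools end to end on a line whose length is
    the total roll, then take n equally spaced points from a random start.
    Assumes every size is smaller than the sampling interval, so that no
    school can be hit twice.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    chosen, cumulative, k = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Select this school for each selection point inside its interval.
        while k < n and start + k * step < cumulative:
            chosen.append(index)
            k += 1
    return chosen


# Hypothetical stage rolls for eight schools in one stratum.
stage_rolls = [30, 60, 25, 90, 45, 50, 20, 80]
sample = pps_systematic(stage_rolls, 4, random.Random(2003))
```

Here the interval is 400 / 4 = 100 pupils, larger than any single roll, so the four selected schools are always distinct.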
The sample pupils were selected in a second stage of
sampling, from within those schools that had agreed to
participate in the survey. Wherever possible, i.e. in those
schools with sufficient numbers of pupils available in the
stage concerned, 22 pupils were randomly selected within
each survey school - 10 for involvement in the assessment
of Knowledge and Understanding in science (and in the
assessment of numeracy), six for involvement in the
assessment of reading (and writing at P5, P7 and S2), and
six reserve pupils, to act as substitutes for pupils absent
on the assessment days. In composite classes, only pupils
at the relevant stage were selected.
Table B.2 School participation
| P3 | P5 | P7 | S2 |
Schools invited to participate | 214 | 212 | 213 | 191 |
Schools agreeing to participate | 164 | 162 | 167 | 139 |
Schools returning completed test booklets (science) | 150 | 156 | 156 | 130 |
% participation rate for science among invited schools | 70 | 74 | 73 | 68 |
Schools eligible for reading assessments* | 125 | 131 | 136 | 130 |
Schools returning completed test booklets (reading) | 99 | 93 | 114 | 130 |
% participation rate for reading among eligible agreeing schools | 79 | 71 | 84 | 100 |
*These were schools that had sufficient sample
pupils to participate in both the science assessment
and the reading assessment
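The percentage rows in Table B.2 can be checked against the counts. The short sketch below recomputes them, taking the science rate as schools returning booklets over schools invited, and the reading rate as schools returning booklets over schools eligible for reading. The variable and function names are illustrative.

```python
invited = {"P3": 214, "P5": 212, "P7": 213, "S2": 191}
returned_science = {"P3": 150, "P5": 156, "P7": 156, "S2": 130}
eligible_reading = {"P3": 125, "P5": 131, "P7": 136, "S2": 130}
returned_reading = {"P3": 99, "P5": 93, "P7": 114, "S2": 130}


def rate(numerator, denominator):
    """Participation rate as a whole-number percentage."""
    return round(100 * numerator / denominator)


science_rates = {s: rate(returned_science[s], invited[s]) for s in invited}
reading_rates = {
    s: rate(returned_reading[s], eligible_reading[s]) for s in eligible_reading
}
```

Both sets of recomputed rates agree with the published rows.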
Where schools had too few pupils available at the
relevant stage to supply 10 for science assessment and six
for reading assessment then science took priority. In other
words, where schools had fewer than 16 pupils available in
the relevant stage, 10 were identified at random for
science assessment and the remainder, fewer than six, would
then take reading tasks. Where schools had 10 or fewer
pupils available at the relevant stage all of these were
identified for involvement in science assessment, and none
would do reading. There were thus some schools in the
survey sample that would take part in science assessment
only.
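The priority rule can be expressed compactly: shuffle the available pupils, fill the science group first, then reading, then reserves. The sketch below is an illustration under that reading of the text. The function name is hypothetical, and the treatment of schools with between 16 and 21 pupils (full science and reading groups, fewer reserves) is an assumption, since the report does not spell that case out.

```python
import random


def allocate_pupils(pupils, rng):
    """Split one school's pupils at a stage into science, reading and
    reserve groups, with science taking priority when numbers are short."""
    pool = list(pupils)
    rng.shuffle(pool)          # random selection within the school
    science = pool[:10]        # up to 10 pupils for science (and numeracy)
    reading = pool[10:16]      # up to 6 for reading, only after science is full
    reserve = pool[16:22]      # up to 6 reserves (assumed lowest priority)
    return science, reading, reserve


rng = random.Random(0)
large = allocate_pupils(range(30), rng)   # 10 science, 6 reading, 6 reserve
medium = allocate_pupils(range(13), rng)  # 10 science, 3 reading, no reserve
small = allocate_pupils(range(8), rng)    # 8 science, no reading
```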
Where pupils with special educational needs were
selected in school samples, these were included in the test
sessions at the head teacher's discretion.
In a subset of the schools the 'science' pupils also
took part in the assessment of practical investigation
skills, while the 'reading' pupils took part in the
assessment of ICT skills or participated in focus group
discussions exploring their informed attitudes in science.
Although the 'practical' schools were drawn from across the
country, they were not selected entirely at random: two
important criteria for involvement were (i) that the school
should have sufficient pupils at the stage concerned to
justify a day visit by two field officers, and (ii) that it
should be within easy travelling distance of the field
officers' home bases. In the event, just over half of the
primary schools that agreed to participate in the survey
were involved in the practical assessments (87 at P3, 85 at
P5 and 94 at P7), as were around two-thirds of the
secondary schools (90 schools at S2).
B.2 Task distribution and achieved sample sizes
Science and numeracy
In order to assess pupils' Knowledge and Understanding in
science and to report attainment in terms of the 5-14
levels, 360
different pencil and paper single-level tasks were
administered in this survey. These comprised 60 tasks at
each of Levels A to F, with 20 from each outcome (Earth and
space, Energy and forces, Living things and the processes
of life). Task administration followed a multiple matrix
sampling strategy.
At each stage 10 different Knowledge and Understanding
booklets were prepared
for survey administration, by randomly allocating tasks to
booklets to meet a given booklet specification. The ten
booklets were paired into ten different booklet pairs, and
booklet pairs were allocated randomly to the sample pupils
in each school. Thus every 'science' pupil attempted, or
was intended to attempt, two different test booklets, with
every booklet eventually attempted by similar numbers of
pupils in similarly representative pupil subsamples. In any
one school at most two pupils would attempt the same
booklet.
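The report does not say how the ten booklets were combined into ten pairs. One arrangement consistent with the description is a cyclic design, in which booklet 1 is paired with 2, 2 with 3, and so on round to 10 with 1. Every booklet then appears in exactly two pairs, so if the ten science pupils in a school each receive a different pair, no booklet is attempted by more than two of them. The sketch below illustrates this assumed design; all names are hypothetical.

```python
import random
from collections import Counter

BOOKLETS = list(range(1, 11))
# Assumed cyclic pairing: (1, 2), (2, 3), ..., (9, 10), (10, 1).
PAIRS = [(b, b % 10 + 1) for b in BOOKLETS]


def allocate_pairs(pupils, rng):
    """Give each science pupil in a school (at most 10) a different
    booklet pair, in random order."""
    pairs = PAIRS[:]
    rng.shuffle(pairs)
    return dict(zip(pupils, pairs))


school = allocate_pairs([f"pupil{i}" for i in range(10)], random.Random(1))
# Number of pupils in the school attempting each booklet.
uses = Counter(booklet for pair in school.values() for booklet in pair)
```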
Since there was no information available beforehand
about the likely level that each pupil was currently
working at in science, it was not considered appropriate to
create single-level test booklets and to place these in
front of randomly selected pupils. Every booklet therefore
contained tasks from at least two different levels: A and B
at P3, B and C at P5, C, D and E at P7, D, E and F at S2.
Every booklet also contained a balanced spread of tasks
across the three outcomes. To facilitate attainment
comparisons across stages, the Level B tasks which featured
in a particular test booklet at P3, mixed at this stage
with Level A tasks, were transferred into one of the test
booklets at P5, to be mixed with Level C tasks, and so on.
Within booklets, tasks were grouped by outcome, and within
outcome blocks lower level tasks were presented before
higher level tasks. A single numeracy task was placed in
every booklet, at the end of one of the outcome blocks.
Every booklet was printed in three different versions,
simply by varying the order of presentation of outcome
blocks, to minimise any possible fatigue effects on any
particular tasks.
At P3 and P5, each booklet contained 12 different
science tasks, two per level from each outcome, plus a
numeracy task, and was expected to take 30-40 minutes to
complete. At P7 and S2, each booklet contained 18 science
tasks, again two per level from each outcome, plus a
numeracy task, and was expected to take 50-60 minutes to
complete.
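The booklet sizes follow directly from the design: two tasks per level from each of the three outcomes, over two levels at P3 and P5 and three levels at P7 and S2. The sketch below works through that arithmetic. It also shows that ten booklets carrying six tasks per level account for 60 tasks per level, which matches the size of the task pool at each level if no task is repeated across booklets within a stage (an assumption, not something the report states outright).

```python
LEVELS = {"P3": "AB", "P5": "BC", "P7": "CDE", "S2": "DEF"}
OUTCOMES = 3                    # Earth and space; Energy and forces; Living things
TASKS_PER_LEVEL_AND_OUTCOME = 2
BOOKLETS_PER_STAGE = 10


def science_tasks_per_booklet(stage):
    """Science tasks in one booklet (the single numeracy task is extra)."""
    return len(LEVELS[stage]) * OUTCOMES * TASKS_PER_LEVEL_AND_OUTCOME


booklet_sizes = {stage: science_tasks_per_booklet(stage) for stage in LEVELS}

# Tasks at any one level across all ten booklets at a stage.
tasks_per_level_per_stage = (
    BOOKLETS_PER_STAGE * OUTCOMES * TASKS_PER_LEVEL_AND_OUTCOME
)

# A pupil attempts two booklets, so meets this many tasks at each level.
tasks_per_level_per_pupil = 2 * OUTCOMES * TASKS_PER_LEVEL_AND_OUTCOME
```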
Almost 6000 pupils in around 600 schools participated in
the written assessment of science Knowledge and
Understanding: response data were
analysed for 1405 P3 pupils, 1463 P5 pupils, 1483 P7 pupils
and 1306 S2 pupils. At each stage these figures represent
between 2% and 2½% of the pupil population. Each test
booklet, and therefore every assessment task, was attempted
by around 270 pupils at P3, by around 290 pupils at P5 and
P7, and by around 250 pupils at S2.
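As a rough consistency check, the number of pupils per booklet can be bounded from the pupil totals: if every pupil had completed both of their booklets, each of the ten booklets at a stage would have been attempted by one fifth of the pupils. The sketch below computes that upper bound. The reported figures sit a little below it, which would fit with some pupils completing only one booklet, though the report does not give the reason.

```python
pupils = {"P3": 1405, "P5": 1463, "P7": 1483, "S2": 1306}
reported_per_booklet = {"P3": 270, "P5": 290, "P7": 290, "S2": 250}

BOOKLETS_PER_PUPIL = 2
BOOKLETS_PER_STAGE = 10

total_pupils = sum(pupils.values())

# Upper bound on attempts per booklet if every pupil completed both booklets.
upper_bound = {
    stage: n * BOOKLETS_PER_PUPIL / BOOKLETS_PER_STAGE
    for stage, n in pupils.items()
}
```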
Reading and writing
There were 15 reading tasks in total, three at each of
Levels A to E. Each task comprised a text and associated
test questions, and was expected to take the same time to
complete as a science booklet at the stage concerned. At
Levels C, D and E reading tasks were accompanied by
associated writing tasks. An individual reading task, where
relevant in company with its linked writing task, was
presented to pupils as a single test booklet.
Again, a multiple matrix sampling scheme was employed to
allocate tasks to pupils. The tasks to be administered at a
particular stage were paired into six different pairs in
such a way that every pair comprised tasks from two
adjacent levels. Task pairs were then randomly allocated to
the pupils in each school that had agreed to participate in
the survey and that had pupils available for reading
assessment. In this way every task would have been
attempted by similar numbers of pupils across the survey,
in similarly representative subsamples, and no more than
two pupils would attempt the same task in any particular
school.
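The report does not list the six pairs used at each stage. The sketch below shows one construction that fits the description, assuming a stage draws on three tasks at each of two adjacent levels: each lower-level task is paired with two of the three upper-level tasks in rotation. This gives six distinct pairs in which every task appears exactly twice, so six reading pupils in a school, each with a different pair, would never put the same task in front of more than two pupils. The task labels are hypothetical.

```python
from collections import Counter


def adjacent_level_pairs(lower, upper):
    """Pair each lower-level task with two upper-level tasks in rotation."""
    n = len(upper)
    pairs = []
    for i, task in enumerate(lower):
        pairs.append((task, upper[i]))
        pairs.append((task, upper[(i + 1) % n]))
    return pairs


# Hypothetical labels for three tasks at each of two adjacent levels.
reading_pairs = adjacent_level_pairs(["A1", "A2", "A3"], ["B1", "B2", "B3"])
task_uses = Counter(task for pair in reading_pairs for task in pair)
```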
In total, reading assessment data were analysed for 2564
pupils (586 at P3, 541 at P5, 665 at P7 and 772 at S2) and
writing assessment data were analysed for a total of 1957
pupils (521 at P5, 680 at P7, 756 at S2). At each stage
these figures represent 1% to 1½% of the pupil population.
Each task was attempted by around 185 pupils at P3, 175 at
P5, 215 at P7 and 245 at S2.
Investigative skills in science, ICT skills and informed attitudes
Nine investigation tasks were administered in the
survey, along with six ICT tasks. In addition, the 148
field officers who carried out the practical assessments
also led a total of 647 focus group discussions with
pupils. As usual, tasks and focus group places were
allocated to pupils at random.
In the majority of the schools that participated in
assessment in this area, eight pupils were assessed for
their science investigation skills and a further eight for
ICT skills. Performance data were analysed for a total of
2635 pupils for science investigations and 2611 pupils for
ICT: 609 and 615, respectively, at P3; 619 and 625,
respectively, at P5; 710 and 697, respectively, at P7; 697
and 674, respectively, at S2. The number of pupils who
undertook any particular task varied between 150 and 450,
depending on the subject, stage and task. Among the 647
focus groups that were rated for informed attitudes, 80-90
groups discussed one or other of the two topics that
featured at P5, P7 and S2, while 171 groups discussed the
single topic that featured at P3.
B.3 Attainment estimation
In science and reading, total scores were first computed
for pupils for each of their level-based 'tests': 12 tasks
at a level in science, offering total maximum marks of
12-14 at P3 and P5 and 20-30 at P7 and S2, and one task at
a level in reading, offering between 18 and 35 marks in
total, depending on the level. Cut-off scores were then
applied, and pupils classified into one or other of three
attainment groups on the basis of these: 'basic skills',
'secure attainment' or 'considerable strengths'.
The proportions of pupils classified into the three
groups at relevant levels were calculated separately for
every booklet pair in science and for every reading task,
with the attainment data weighted appropriately to adjust
for imbalances in sample representation caused by the
non-participation of some schools. The resulting
proportions were then simply averaged over pairs of science
booklets (ten pairs per stage) or reading tasks (three per
level) to produce the population attainment estimates
reported in Chapters 2 and 4, respectively.
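The estimation steps described above (classify by cut-off score, compute weighted proportions within each booklet pair or reading task, then take a simple average) can be sketched as follows. The cut-off values, weights and scores are invented for illustration, and the sketch assumes that the three groups are ordered from 'basic skills' up to 'considerable strengths' with one cut-off between each. The actual cut-offs and weighting scheme are not given in this appendix.

```python
GROUPS = ("basic skills", "secure attainment", "considerable strengths")


def classify(score, secure_cut, strengths_cut):
    """Assign a pupil to an attainment group using two cut-off scores."""
    if score >= strengths_cut:
        return "considerable strengths"
    if score >= secure_cut:
        return "secure attainment"
    return "basic skills"


def weighted_proportions(records, secure_cut, strengths_cut):
    """records: (score, weight) tuples for one booklet pair at one level."""
    totals = dict.fromkeys(GROUPS, 0.0)
    for score, weight in records:
        totals[classify(score, secure_cut, strengths_cut)] += weight
    grand_total = sum(totals.values())
    return {group: totals[group] / grand_total for group in GROUPS}


def population_estimate(per_pair):
    """Simple (unweighted) average of the per-pair proportions."""
    return {g: sum(p[g] for p in per_pair) / len(per_pair) for g in GROUPS}


# Two made-up booklet pairs with hypothetical cut-offs of 8 and 11 marks.
pair_a = weighted_proportions([(5, 1.0), (9, 1.0), (12, 2.0)], 8, 11)
pair_b = weighted_proportions([(3, 1.0), (10, 3.0)], 8, 11)
estimate = population_estimate([pair_a, pair_b])
```

In the survey itself the average would run over ten booklet pairs per stage in science, or three tasks per level in reading.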
Margins of error for the attainment estimates arising
from a single booklet pair in science would be a maximum of
around six percentage points, reducing to a maximum of
around two percentage points for the final averaged
population estimates at a level. Margins of error for the
attainment estimates deriving from a single reading task
would be a maximum of around seven percentage points,
reducing to a maximum of around four percentage points for
the final population estimates at a level. It should be
noted that these figures cannot take account of any
measurement error that will have arisen from the possible
incorrect classification of individual pupils, for some of
whom the decisions made might have been different had the
pupils concerned been assessed on a different day or on the
same day with a different reading task or pair of science
booklets (test reliabilities - alpha values - are typically
in the range 0.7-0.8 for the 12-task science 'tests', and
0.7-0.9 for each reading task). Neither do they take
account of the measurement error that will have arisen from
the fact that the tasks used in this survey are merely
representative of all the similar tasks that might have
been developed and used in their place.
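The report does not state how these margins were calculated. For orientation only, the sketch below applies the textbook simple-random-sampling approximation for a proportion at 95% confidence, taking the worst case p = 0.5, and assumes that averaging k independent estimates shrinks the margin by a factor of the square root of k. The sample sizes of 270 and 185 are the approximate per-booklet and per-task figures quoted in B.2, and choosing them here is an assumption. The approximation ignores clustering within schools and the weighting, so it is a rough guide rather than a reconstruction of the report's method.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a
    proportion p estimated from a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def averaged_margin(single_margin, k):
    """Margin for the mean of k independent estimates of equal precision."""
    return single_margin / math.sqrt(k)


science_single = margin_of_error(270)                    # about 6 points
science_averaged = averaged_margin(science_single, 10)   # about 2 points
reading_single = margin_of_error(185)                    # about 7 points
reading_averaged = averaged_margin(reading_single, 3)    # about 4 points
```

Under these assumptions the results come out close to the maxima quoted in the text, although that agreement does not confirm which formula the report actually used.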
In the case of writing, practising teachers evaluated
pupils' scripts and allocated level judgments. As always
with extended writing, judgments of quality were subjective
to some extent, as the inter-rater agreement study
described in Chapter 4 confirms: the average inter-rater
agreement rate when applying a 'best fit' evaluation scheme
was 40%. With this in mind, the resulting writing
attainment data have been presented in Chapter 4 as sample
statistics only.
Given the nature of the practical assessment tasks -
which were novel and did not lend
themselves to pupil classification by level - no attempt
has been made to produce weighted estimates of practical
skills attainment on this occasion. School and pupil
questionnaire findings are also presented in this report as
sample statistics rather than formal population
estimates.