Section 3: Evaluation
In this section we first consider some general issues
regarding the relationship between policy making and
evaluation (3.1) then go on to describe the approaches
adopted by the external evaluations of the demonstration
projects (3.2), the internal monitoring and evaluation work
(3.3) and finally consider the relationship between the two
(3.4).
3.1 Approaches to evaluation
Evaluation is a form of applied research concerned with
assessing the results, impacts and/or outcomes achieved by
some form of intervention (whether this be a project, a
programme, an institution or a policy) in order to inform
judgements about that intervention. While there are many
different types of evaluation and methodologies,
essentially there are two broad approaches:
- Evaluations that are concerned with
proving effectiveness; they are concerned with
the achievement of aims/objectives and impact/outcomes
and with explaining success/failure; these are
analysis-oriented and framed within agendas concerned
with accountability and knowledge-building
- Evaluations that are concerned with
improving the implementation of a programme or
policy, or strengthening institutions, communities or
networks; these are action-oriented and often framed
within agendas concerned with development and
empowerment.
(Chelimsky, 2001; Stern, 2004)
Combining these two approaches within a single
evaluation can lead to tensions since not only will the
basic purpose of the evaluation differ, but there are also
differences in the epistemological and methodological
approaches adopted, and the relationships between the
evaluators and those involved in programme implementation
(independent observation and analysis versus active
engagement).
One major consideration in the selection of appropriate
evaluation approaches and designs is the stage of the
programme or policy cycle - planning, development,
implementation. The alignment of the evaluation focus with
the stage in the policy/programme cycle is formalised
within many evaluation frameworks. For example, the
European Commission differentiates between ex ante
evaluation (sometimes called 'appraisal') and impact
assessment, which are undertaken at the development stage,
and interim and ex post evaluation, which are undertaken
once the intervention has been implemented (European
Commission, 2003). Within the current
UK (evidence-based) policy context, and
certainly within public health/health improvement, there is
also an emphasis on the use of systematic reviews of
results from evaluation studies to inform the planning
stage in the programme/policy cycle and to ensure that
proposals are evidence-based. This is formalised within the
HEBS (now
NHS Health Scotland) evaluation
framework for health promotion (HEBS, 1999; Wimbush & Watson,
2000).
Implementing this kind of systematic, staged approach to
evaluation can be difficult. Many government programmes are
funded with the expectation of immediate implementation and
delivery within a fixed period (e.g. three years), with
little allowance for an initial developmental/design phase.
One implication is that evaluations of programme
effectiveness are commissioned at too early a stage. There
is seldom consideration of the need for an initial phase of
formative evaluation while the programme is being developed
and set up so that interventions can be piloted for their
feasibility, acceptability and likely effectiveness within
a particular local context, and thereafter adjusted to
improve their chances of effectiveness. This exactly
describes the situation in which the
DPs, and their evaluations,
developed.
3.2 External evaluation
Among the bids for external evaluation, two contrasting
models were proposed. One of these, stemming from the
clinical trial and health services research tradition,
employed an objective, 'hands-off' approach to evaluate the
outcomes from
HR, while the other employed a much more
collaborative developmental and 'hands-on' approach, using
Theories of Change (ToC), to first clarify the objectives
of the
SW and
HaHP projects and then monitor progress
towards these objectives. All evaluations used a
quasi-experimental research design as one component.
The adoption of contrasting approaches to evaluation
permits an assessment of the strengths and weaknesses of
each approach and some of the problems from the viewpoints
of both the
DPs and the evaluation teams in adopting
one model rather than another.
3.2.1 Healthy Respect: evaluation design
In preparing their evaluation of
HR, the external team (HR/E) included the top line aims and
objectives of
HR to improve sexual health for young
people in Lothian, each generating a number of
'pre-specified' hypotheses. The evaluation focussed on (a)
sexual health outcomes of young people in Lothian, by
reference both to routine sexual health data (e.g. teenage
conception and termination rates) and to survey data on
secondary school pupils' sexual health knowledge, behaviour
and uptake of services; (b) the organisation and performance
of interagency partners in the provision of sexual health
services; and (c) the implementation and process of each
HR component project. The evaluation
would test the following hypotheses relating to the first
two objectives: (a)
HR would impact on attitudes and
behavioural change (better communication with
parents/teachers on sexual health issues, reduction in
the proportion having underage sex, and increased knowledge
and (reported) use of condoms), and on service access,
acceptability and uptake; it would reduce conception/abortion
rates and increase rates of Chlamydia testing; (b)
HR would increase interaction and
networking between service providers to the perceived
benefit of clients. The third objective (c) focussed on the
processes by which these changes were hypothesised to
occur.
To address these objectives/hypotheses,
HR/E proposed the following evaluation
to be conducted over 4 years from November 2000 to November
2004: (a) a quasi-experimental 'before-after design'
comparing young people in the intervention area (Lothian)
with a comparison area (Grampian), selected both for
practical reasons and to represent another East coast
region (thus avoiding west of Scotland cultural and
religious complications), to be supplemented by qualitative
data derived from focus groups. In the case of school
surveys, representative samples of S3/S4 pupils in each
region would be selected and power calculations (based on
responses to the Scottish
WHO HBSC [Todd et al., 1999]) indicated that
around 2000 boys and 2000 girls were required in both
intervention and comparison areas to demonstrate an effect
(e.g. on underage sex); (b) a mapping exercise of sexual
health and related services combined with interviews with
agency personnel. To address objective (c),
HR/E proposed qualitative methods (e.g.
interviews and focus groups with clients and service
providers), each designed to identify process
measures and implementation of the individual projects.
HR/E were clear from the outset that
with multiple concurrent initiatives, it would not be
possible to attribute an intervention effect on
population-level outcomes to individual components of
HR.
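One common way to summarise a before-after design with a comparison area is a difference-in-differences estimate: the change in an outcome in the intervention area minus the change in the comparison area. The sketch below is illustrative only. It is not drawn from the HR/E proposal, and the function name and prevalence figures are hypothetical placeholders, not survey results.

```python
def difference_in_differences(before_int, after_int, before_cmp, after_cmp):
    """Change in the intervention area minus change in the comparison area.

    All arguments are outcome prevalences expressed as proportions (0-1).
    """
    change_int = after_int - before_int
    change_cmp = after_cmp - before_cmp
    return change_int - change_cmp


# Hypothetical prevalences of a reported behaviour (NOT results from HR/E).
estimate = difference_in_differences(
    before_int=0.20, after_int=0.16,   # intervention area (e.g. Lothian)
    before_cmp=0.20, after_cmp=0.19,   # comparison area (e.g. Grampian)
)
print(f"Difference-in-differences estimate: {estimate:+.2%}")
```

A negative estimate here would indicate a larger fall in the intervention area than in the comparison area. As noted above, any such difference could not be attributed to an individual component of HR.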
3.2.2 Healthy Respect: a 'hands-off' approach to evaluation
The underlying model of the
HR evaluation was therefore based on the
assumption of an intervention with fixed overall
HR aims and objectives, from which
specific hypotheses could be formulated and which were
testable by reference to a design combining a
quasi-experimental method to identify an intervention
effect on specified outcomes, with various qualitative
methods used to illuminate processes which might bring this
about. In this model, the independence of the evaluators is
imperative, with contact between the evaluators and
implementers kept to the minimum needed for data
collection. The evaluation team do not seek to influence the
direction or development of the intervention, nor is the
evaluation open to influence by the project team. While
consensual and collaborative working with the
HR team is necessary to obtain data, the
relationship is otherwise non-interactive. In the revised
proposal (August 2000),
HR/E specifically identified that one of
their goals was to 'avoid contamination of
HR' (p.6). Thus,
HR/E have not fed back interim results
to
HR on the grounds that this might
influence the direction of the project. An alternative
approach such as theory of change was described as being
outwith the approach to
HR evaluation.
The adoption of the 'hands-off' approach to evaluation
used by
HR/E seeks to provide an unbiased test
of the initial hypotheses. However, there are potential
disadvantages:
- It assumes that the demonstration and component
projects' objectives are stable. Any change in
objectives, or failure to implement them, makes the
project less evaluable. Although identification of such
changes is one of the reasons for the qualitative
process evaluation and provides evidence about the
extent to which the intervention was implemented
according to plan, the shifting emphasis of
HR towards process rather than
outcome measures was identified as a major problem for
HR/E. It was perceived as a
departure from their original understanding of
HR's preparedness to be tested for
impact and effectiveness (4.11.03).
- It assumes objectives are clearly articulated and
communicated. This was not the case in the early stages
of
HR, both the project objectives and
management continuity being unstable for at least 8
months from August 2000. Against this background,
HR/E drew up a Memorandum of
Agreement to agree objectives and mechanisms for
process evaluation, roles and responsibilities and
respective intellectual property right areas. The
Memorandum was consistent with the commissioned
evaluation (August 2000) in not offering interim
feedback.
- The lack of feedback from the evaluation to
HR was experienced as frustrating by
project staff, leaving them feeling they lacked
guidance and any sense of whether or not they were
achieving their objectives.
- The 'hands-off' model of evaluation also runs the
risk of generating a gap in expectations between
internal and external evaluation teams. In the early
stages of
HR, the perception that
HR/E had not given sufficient
attention to self-esteem led the project manager to
develop a separate research proposal to address the
issue, which was not subsequently funded. There were
many other examples of
HR component projects undertaking
research (often of dubious quality) to fit in with a
culture of evaluation internal to
HR. One consequence of this is that
internal and external evaluations may produce disparate
process findings.
3.2.3 Healthy Respect: assessment
The external evaluation involves several components
including health outcomes of teenage pregnancy, compliance
with national recommendations for the detection and
management of Chlamydia in the context of the National
Chlamydia
SIGN Guideline Audit and service
provision for
STIs. Central to it is a repeat
cross-sectional survey of sexual health knowledge,
attitudes and (reported) behaviours among S3/4 secondary
pupils in intervention and control schools (Lothian and
Grampian respectively). The first survey, involving 10
Lothian schools (2760 pupils, 80%) and 5 Grampian schools
(1501 pupils, 83%), was completed between September and
December 2001; the second, in 9 of the 10 Lothian schools
and all 5 Grampian schools, was completed during the same
period in 2003.
In principle, this design should be able to address the
key hypotheses of the evaluation even allowing for the risk
of contamination between areas, and the difficulty of
causal attribution of effects to the intervention given the
multi-faceted nature of
HR. The surveys appear to have been
conducted efficiently with good response rates within
schools (although absentees were not followed up). However,
in several respects the design was less than optimal:
- The original
HR/E objective (August 2000) to
compare representative samples of schools (and pupils)
was not achieved, and in the case of Lothian was not
achievable because
HR was only operating in selected
Lothian schools. The ten intervention schools that
'signed up' to
HR were self-selected, i.e. were
'volunteers'. They may represent those most committed
to sexual health education, potentially biasing
estimates of the intervention effect.
- The original aim for the control sample was to
identify ten schools in Grampian matched by size,
rurality and level of deprivation. In the event, only
five of the 17 selected and invited secondary schools
in Grampian agreed to take part. These schools may not
be representative of Grampian schools, nor were they
well-matched controls for Lothian schools. This may
compensate for the bias in the Lothian sample, but the
degree of under- or over-compensation is impossible to
estimate, making interpretation of results difficult
and generalisation risky. The potential selection bias
and hence representativeness of the schools is an issue
that will be addressed explicitly in the final
report.
- The reduction in sample size caused by school
recruitment problems necessitated a re-assessment of
the power of the sample to detect specified effect
sizes. The original estimates (above) were 2000 pupils
of each sex in both intervention and control areas; the
response to the first survey indicated that this was not
achieved, especially in Grampian. The Progress Report
(October 2002) notes, however, that a revised sample
estimate (using a new 2:1 Lothian/Grampian ratio) had
been made before the first survey, and that the achieved
sample sizes 'continued to allow detection of an effect
size of 4% to 5%' (p.3). This was based on the revised
power calculation agreed with the
CSO, assuming a 4-5% difference in
prevalence of an outcome such as reported experience of
sexual intercourse given a baseline prevalence of
approximately 20%.
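The revised power calculation itself is not reproduced here. As a rough illustration of the arithmetic involved, the sketch below applies the standard normal-approximation formula for comparing two proportions with unequal group sizes. The two-sided 5% significance level, 80% power, the direction of the difference and the function name are all assumptions rather than details taken from HR/E or the CSO, and no allowance is made for clustering of pupils within schools.

```python
from math import ceil
from statistics import NormalDist


def comparison_group_size(p_comparison, p_intervention, ratio=1.0,
                          alpha=0.05, power=0.80):
    """Approximate comparison-group size for a two-proportion comparison.

    ratio is the intervention:comparison allocation (2.0 means a 2:1 split).
    Uses the normal approximation and assumes simple random sampling, so
    clustering of pupils within schools is NOT accounted for.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = (p_intervention * (1 - p_intervention) / ratio
                + p_comparison * (1 - p_comparison))
    difference = p_intervention - p_comparison
    return ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Assumed scenario: 20% baseline prevalence, reduced by 4 or 5 points.
for diff in (0.04, 0.05):
    n_cmp = comparison_group_size(0.20, 0.20 - diff, ratio=2.0)
    print(f"{diff:.0%} difference: comparison n = {n_cmp}, "
          f"intervention n = {ceil(2.0 * n_cmp)}")
```

Because pupils are sampled within schools, a calculation that allowed for clustering would generally require larger samples than this simple version suggests.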
We do not yet have any results from
HR/E on the school-based intervention.
It seems likely, however, that the problems of school
recruitment may make these results difficult to interpret.
If there is no difference between Lothian and Grampian
pupils (controlling for confounders), it could be
attributable to a number of factors. If there is, it may
not be generalisable to a wider population because of
potential biases introduced by the schools selected.
3.2.4 Healthy Respect: process evaluation
Process evaluation was a limited part of the original
research proposal from
HR/E, but was extended in the revised
version at the request of the Scottish Executive. It
includes:
- Assessing the effectiveness of interagency working
using descriptive before-and-after inventories and
mapping of service provision, partnerships and networks
and through observation of professionals' contacts and
activities through diary keeping.
- Describing the implementation process of
HR's 12 component projects by
identifying key process indicators of implementation
for each of the projects. The views of projects'
clients and providers are sought to identify best
practice, perceived impact and acceptability.
The process evaluation was intended to be used to
identify, understand and interpret any observed changes in
outcome measures, making associations between process and
outcome where strict attribution is not possible. While the
process evaluation and context mapping helped keep the
evaluation team up to date with the evolution of the
demonstration project, the team sought to maintain their
independent position by avoiding feedback from the process
evaluation.
At the time of writing, there have been no
reports from the process evaluation of
HR. The contribution of this element of
the Healthy Respect evaluation to understanding causal
attribution is likely to be weak without an overall
programme theory to make the links between goals,
individual project activities and outcomes. The process
evaluation will only aid in assessing the effectiveness of
each component project in terms of their own objectives,
against a set of 7 criteria derived from literature on good
practice on health promotion.
3.2.5 Starting Well: evaluation design
3.2.5.1 Theory of change - a 'hands-on' approach to evaluation
In the (revised) bid to evaluate
SW, the external team (SW/E) made a distinction between the
criteria and methodology used to evaluate an intervention
trial and those appropriate for a 'demonstration project',
the rationale for the latter being as much about 'improving
the intervention as proving that it works' (p.7). The idea
that the evaluation should shape the direction of the
intervention contrasts with the approach described above.
The rationale for this more interactive approach is a
recognition that 'real-life' interventions rarely stand
still and often depart from their initial objectives, as a
consequence of external events, such as policy changes or
service reorganisation, or internal changes in direction
initiated by those implementing the intervention. Indeed,
initial objectives themselves may not be clear. From this
perspective, involvement of the evaluator in the
development and course of the
DP is desirable since it provides a
means of identifying what is being evaluated. The method
proposed to address these issues in the
SW (and
HaHP - see below) evaluations was the
'theory of change' (ToC) (Fulbright-Anderson, 1998; Judge
& Bauld, 2001).
ToC seeks a better understanding of the processes in an
intervention that might produce predicted change. The first
step is to identify the connections made by key
stakeholders between
DP inputs and desired outcomes to make
an assessment of the likelihood that the goals can be
achieved. Thus, in this first stage, through interaction
between
SW and
SW/E (interviews/focus groups with
steering group members), the aim was to clarify objectives.
After the initial stage, the focus of ToC switches to the
documentation of processes designed to assess whether
intended actions take place and whether predicted changes
are observed.
3.2.5.2 Process Evaluation
In the case of
SW/E, the process evaluation identified
three key issues that were an integral part of
SW's
ToC: the extent to which intensive home
visiting led to the development of therapeutic alliances
between families and their home visitors; the
implementation issues involved in developing a skill mix
approach to home visiting; and the degree to which
intensive home visiting at an individual family level led
to improved community and strategic responses to child and
family health problems. This took the form of detailed case
studies with 59
SW families and associated Health
Visitors (HVs) and lay support workers in order to
evaluate the extent to which specified components of the
home intervention (e.g. the Family Health Plan) were being
achieved. It also involved detailed documentation of what
actually happened during the implementation of
SW in respect of all its components. In
a later document (Shute & Judge, in press), these were
reduced to 3 central components: (a) case studies (as
before); (b) formation and development of a staff team of
professionals and paraprofessionals; and (c) influence of
identified health needs on local and higher-level planning.
While these methods and measures would normally be part of
the process evaluation in any intervention, the 'theory of
change' involves the systematic documentation of change in
intervention components as they relate to intended
outcomes. It is claimed that this facilitates a more
sensitive analysis of 'causal' attribution than is often
the case, enabling better identification of both the
reasons for intervention success and failure.
3.2.5.3 Quasi-experimental study
The use of ToC, however, was intended to complement
rather than replace a traditional quasi-experimental
approach.
SW/E proposed a comparison (not unlike
that of
HR/E) of two cohorts, one in the
SW (intervention) area, the other in a
comparison area with broadly comparable demographic and
socio-economic characteristics. The
SW/E cohort is a time-specified subset
of all
SW participants, defined in this (and
the comparison cohort) as all babies born between June 2001
and June 2002. It was initially proposed to recruit
families via
HVs at the point at which they first
made contact (assumed to be in the antenatal period) and
follow them up for a period of 30 months (contacts at
birth, 6 weeks, 6 months, 18 months and 30 months). However,
the last of these follow-ups (at 30 months) was abandoned.
Thus, in addition to
implementing the
SW programme, both recruitment to and
administration of
SW/E instruments would be conducted by
HVs, supplemented by
SW/E interviewers in the comparison
area. In general, this was what happened: recruitment was
carried out, and some instruments (e.g. postal
questionnaires) were administered, by
HVs, with trained research nurses
additionally conducting home interviews at 6 and 18 months,
including the administration of the
HOME measure.
A range of outcome measures was proposed for each point
of contact including maternal and family characteristics
(antenatal), birthweight/gestation (birth), postnatal
depression (6 weeks), maternal diet, breastfeeding,
immunisations,
HOME score (6 and 18 months) etc.
Reflecting the emphasis on parenting as a key outcome, the
HOME score (with 6 sub-scores) was seen
as the core outcome measure. The projected number of
families in the
SW area was 600-700 per year (1500-2000
over 3 years). Two options were given for the comparison
area involving (a) a one year or (b) three year cohort
sample respectively. Power calculations were given in
relation to predicted effect sizes on
HOME and child accidents, the latter
being used to demonstrate that in option (a) the sample
size would not be adequate.
3.2.6 Starting Well: assessment
3.2.6.1 Theory of Change
ToC is an interesting development in evaluation
methodology and seems particularly appropriate when the
initial objectives of a project are unclear. However, the
following points are worth noting:
- ToC has as one of its overall aims the improvement
of interventions. It does not start with objectives as
defined by a project (as with
HR/E) but seeks via interaction with
DPs to clarify objectives and
possibly redefine them. This is a departure from the
usual scientific principle of independence and might in
various ways compromise the implementation of the
DP. For example,
DPs might become over-dependent on
evaluators for direction and guidance. This is unlikely
to be a problem in the formative stages of a project
when objectives are being formulated, but if objectives
(and related processes) continue to change in a
'mature' project, it becomes more difficult to know
what is being evaluated. ToC would, however, facilitate
an understanding of how and why such change occurred
(e.g. the
DP became committed to a new
approach), but what then is the 'project' that is being
evaluated overall?
- Inasmuch as ToC leads to a change in objectives,
there is a risk that the
DP shifts away from the evidence
base it rested on in the first place, thereby
inadvertently reducing the likelihood of obtaining an
intervention effect.
- Any change in project objectives resulting from ToC
that occurred after an evaluation (e.g. before/after
survey) had been commenced would potentially render
that evaluation flawed. This highlights the fact that
ToC is of particular value in the formative stages of a
project and should not intervene after a project is up
and running. Beyond the initial stages, ToC functions
much like any other process measure (a point made by
HR/E).
- ToC may place demands on
DPs to become involved in procedures
they may not have expected and might not want. The view
from
HaHP would have been that this was
another set of controls, while
SW did not regard the ToC feedback
as particularly useful.
Since both
SW and
HaHP process evaluations used ToC
methods, a combined discussion of how well ToC worked in
practice is included in section 3.2.8 below.
3.2.6.2 Quasi-experimental study
As in the case of
HR/E, the choice of a quasi-experimental
design was an appropriate methodology to test for
differences in outcomes between intervention and comparison
areas. There were some modifications to the design
resulting both from delays in implementing
SW and from the practicalities of
conducting all the proposed follow-ups within the
designated timeframe. Thus, the number of
SW/E contacts has been limited to three
points (10-14 days after birth, 6 months and 18 months) and
the availability of data has been limited by the way
SW was itself implemented (most notably
in the lack of contact with families in the ante-natal
period). However, while these problems restrict the
capacity of the quasi-experimental design to deliver, they
do not invalidate it. The extent to which it has delivered
depends on the following considerations:
- Recruitment to the survey was more problematic than
anticipated. Although 98% of all families participated
in
SW,
HVs were less successful in
recruiting them to
SW/E. In the
SW areas, of 604 births in the
specified time period, 375 (62.5%) mothers agreed to
take part; in the comparison area, of (an estimated)
600 births, 262 (43.7%) consented to participate (total
n=637), possibly reflecting a lower level of commitment
among non-SW HVs. Consequently, recruitment of
controls was later extended to health visiting teams in
the West of Glasgow. It is not clear how this affected
the representativeness of the control sample.
- Response rates to each of the
SW/E contacts have been lower than
expected. In the first (baseline) contact, involving a
postal questionnaire, data were only available on
447/637 (70%) of cases (71%
SW area, 69% comparison area); at
six months (for interview) the comparable figure was
better: 493/637 (77%) of cases (80%
SW, 73% comparison). However, an
analysis of six month outcomes (Shute and Judge, in
press) was based on all those with baseline and six
month data, reducing the sample to 359/637 (57%) or 30%
of the population of families in both areas. The number
of cases available for analysis (SW 213,
comparison 146) was below
the earlier estimate of the numbers (220 families per
area) needed to detect an intervention effect, raising
the possibility that relevant effects would fail to be
detected.
- These problems raise the question whether each
sample is representative of its respective population,
previously regarded as 'a prerequisite for this
evaluation aim' (First Annual report, p.5). This issue
was not addressed in the paper analysing six month
outcomes (Shute et al., submitted) where both areas
were reported to have similar proportions of lone
mothers and non home-owners to the Glasgow population.
This does suggest the possibility that the
SW/E samples were biased towards
less deprived families in both the
SW and comparison areas. The issue
was highlighted in the 18 month progress report (e.g.
within area comparisons of opt-ins and opt-outs) and
should also be addressed in the final report.
- It was also evident from the six month data that
outcomes for ethnic minorities differ (higher maternal
depression, poorer
HOME scores), which raises the
question of the applicability of the
SW intervention to this particular
sub-group or the validity of these instruments in a
transcultural context. Although this can be controlled
for in multivariate analysis, a sensitivity analysis to
determine how it impacts on the overall intervention
effect would be useful.
- Throughout its development,
SW has evolved, to the extent that
it is no longer regarded as a project so much as an
approach (Ross & de Caestecker, submitted). This
has, as acknowledged, made it extremely difficult to
evaluate since families have not been exposed to a
constant intervention but rather to different types and
levels of intervention. Thus, in the preliminary phase,
the early families may not have received the full
intervention, while in later phases the intervention may
have become diluted. Unless this is taken into account,
it may not be possible to detect the effect of
SW at its optimum. This issue
was alluded to by Shute & Judge who used the
concept of the 'mature'
DP to indicate the point at which a
project is fully up and running and to some extent
constant.
In summary, although
SW/E have conducted this part of the
evaluation to a high standard, problems of recruitment have
limited its capacity to deliver. It is not yet clear how
representative the samples are of their respective
populations, nor whether the numbers are adequate to
demonstrate an intervention effect on parenting. In
retrospect, it may have been better to seek consent from
mothers in the ante-natal period (with additional data
collection benefits) and to collect baseline data via
interview rather than postal questionnaire. It is also
possible that the full effect of
SW has been obscured by changes in the
DP over time which might be revealed via
identification, and related analysis, of a 'mature'
phase.
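The concern about sample adequacy can be illustrated by asking how large a difference in a continuous outcome, such as the HOME score, the available cases could detect. The sketch below is not the SW/E power calculation: it assumes a two-sided 5% significance level, 80% power, a normal approximation and no adjustment for covariates, and the function name is hypothetical. The group sizes are those quoted above (213 and 146 cases with baseline and six month data, against the earlier estimate of 220 families per area).

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_intervention, n_comparison,
                              alpha=0.05, power=0.80):
    """Smallest standardised mean difference detectable with given power.

    Normal approximation for a two-sided, two-sample comparison of means,
    expressed in standard deviation (SD) units.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(1 / n_intervention + 1 / n_comparison)


planned = minimum_detectable_effect(220, 220)    # earlier estimate per area
achieved = minimum_detectable_effect(213, 146)   # cases with complete data
print(f"Planned:  {planned:.2f} SD")
print(f"Achieved: {achieved:.2f} SD")
```

Under these assumptions, the smaller achieved sample raises the smallest detectable difference, which is the sense in which relevant effects might go undetected.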
3.2.7 HaHP: evaluation design
The external evaluation of
HaHP involves 4 separate but linked
approaches that are intended to give a balanced perspective
on the overall processes, impacts and outcomes of the
demonstration project.
These are:
- Theory of Change
- A quasi-experimental survey
- Contextual analysis
- A range of interrelated studies of key settings and
organisations (the community, primary care and the
local authority)
For a variety of reasons, planned and unplanned, both
the
HaHP project itself and the detailed
format of the evaluation have developed and altered over
the period of the project. For example, the interrelated
studies were introduced to strengthen the evaluation once
it became clear that problems with the surveys would limit
their usefulness. The integrated case studies focused on
two settings (primary care and community) and one
organisation (local authority), and looked at the extent of
service development and the impact of
HaHP on professionals and/or agenda
change, at both strategic and operational levels.
Many of the general points made in relation to
SW/E also apply to the evaluation of
HaHP (HaHP/E). Without the final report we
cannot judge the success of the overall approach. Here we
consider issues related to the quasi-experimental
surveys.
3.2.7.1 Quasi-experimental study
This component of the evaluation aimed to assess the
impact of the overall intervention in Paisley. A comparison
area, Inverclyde, was identified, with similar population
characteristics and geographically adjacent to the study
area.
Randomly selected adults aged 20-70, within age and sex
quotas, were to be assessed for
CHD risk factors and health related
behaviours at the beginning and reassessed at the end of
the intervention period in both Paisley and Inverclyde. The
assessments consisted of a questionnaire and attendance at
a nurse-led clinical assessment.
However, despite major efforts by the evaluators, the
response rate for the baseline survey was a disappointing
28% in Paisley and 27% in Inverclyde. Changes in data
protection regulations and inaccuracies in the addresses on
the community health index caused particular difficulties
which could not necessarily have been anticipated. The low
response reflects recent experience in health and lifestyle
surveys throughout Scotland where many areas have suffered
falling levels of response in the past few years.
The poor response rate and an over-representation of
older people and less deprived areas meant that the survey
population was not representative of the Paisley and
Inverclyde populations. Several options to tackle this
problem were considered and the evaluation proposal was
revised as a result. The revised approach aims to use
secondary data, including monitoring information from
within the project.
This is likely to cause different problems: it may prove
difficult to get good quality, comprehensive data on trends
in risk factors and health-related behaviour in the general
population and in various subgroups such as young people.
This has obvious implications for HaHP as a national
demonstration project for CHD prevention, where any impact
of the interventions on mortality would not be seen for some
years while some effect on intermediate measures such as
risk factors would be expected.
The design of this part of the evaluation was
problematic from the start, with most other community-based
CHD prevention studies being unable to
demonstrate an attributable reduction in risk factors in
the study areas compared to control areas. In summary, the
complex nature of
HaHP, the unrealistic timescales for
planning and several unforeseen constraints have led to
real problems both for the implementation and the
evaluation.
3.2.8 Theory of Change in practice
The process evaluation element of the external
evaluations of Starting Well and Have a Heart Paisley
consisted of two distinct strands:
- A Theory of Change strand which attempted to
explore, surface and develop the 'programme logic'
(i.e. the logical connections between the programme's
aims, the programme's activities and the intended
outcomes) through interviews with key strategic and
operational staff and the analysis of key
documents
- A formative evaluation element that examined and
described the implementation process of key elements of
the
DPs using qualitative research
methods
Both these elements were intended to provide feedback
and learning for the
DPs themselves and for the wider policy
and practice communities. Interim reports were produced in
2003 based on the early findings from these process
evaluation elements (Mackenzie, 2003; Blamey, 2001 and
2003). The contribution of the ToC approach is said to be
threefold (Weiss, 1995):
- Sharpening planning and implementation
- Facilitating the development of an evaluation
framework
- Reducing problems associated with causal
attribution
The extent to which these objectives were realised within
HaHP and Starting Well was explicitly addressed in a paper
(Mackenzie, M. and Blamey, A., 2004).
a) Sharpening planning and implementation. From the
perspective of the HaHP and SW demonstration projects, ToC
was 'a dominant element' of the external evaluation but it had
mixed results. It was seen as an interesting and helpful
tool in terms of project development and providing
formative feedback to project management, giving them a
mandate to make proposals for future developments. On the
other hand it was also seen as burdensome and the feedback
was sometimes too late to inform decisions. An alternative
view was expressed by HaHP/E, indicating that the early
work in HaHP (Blamey 2001) commented clearly on the lack of
evidence-based practice in key areas of HaHP as well as the
over-ambitious nature of the plans, yet these issues were
not fully addressed by the project.
The ToC approach proved particularly useful for
HaHP because of the major difficulties
the
DP faced in its early stages. The
initial plans for
HaHP were overly ambitious, and, as a
result, timescales for delivery were lengthened and
expectations of outputs and outcomes were reduced. There
was a lack of initial planning time for such a complex
initiative as well as operational problems such as staff
recruitment and retention (Blamey 2003). As a result, the
theory-based approach was very well received by the HaHP
project team, who appreciated the support and direction it
enabled them to develop.
For
SW, the process of reflection involved
was seen as the most useful aspect of the ToC approach. For
both
SW and
HaHP, the main shortcoming was that it
was a tool more appropriate to use at the planning stage,
before the project started, rather than while the project
was in development and maturing. From the perspective of
the external evaluation teams, ToC was a helpful tool for
surfacing conflicts in project goals, priorities and
approaches among key stakeholders, but offered no help in
resolving them.
b) Facilitating the development of an evaluation
framework. The ToC was intended to provide a framework of
expectations for the evaluation to test. The value of ToC
before the start of a project lies in clarifying project
components and linking them to projected outcomes, thus
clarifying the fit between the overall project objectives and
the overall project 'package'. From the projects'
perspective, ToC could and did play a role in designing the
projects, clarifying their elements and the package as a
whole, and guiding the evaluation. However, the implied
sequential process was not apparent; ToC remained stuck in
the project design phase rather than informing the
evaluation focus. The main barrier to making the process work
optimally for the projects was timing: it is desirable to
apply ToC at the design stage rather than once a project is
up and running.
From the evaluators' perspective, on the other hand, it was
seen as useful in identifying key questions for the
internal evaluation and to help the external evaluation
team to further specify evaluation questions and prioritise
the focus of the evaluation (eg roles within the
workforce).
c) Attribution - understanding what brought about the
observed effects.
In order to fully realise the potential of the ToC
approach in helping to unravel problems of causal
attribution, it has to be carried out in a very intensive
way to create a well specified, detailed theory of change.
This was not possible to achieve in the context of
SW and
HaHP which are both complex
multi-stranded interventions, developed and implemented in
a fast-moving climate where detailed, highly specified
planning does not exist and large areas of the projects are
not grounded within an evidence base. A further reservation
was the linearity of causal effect implied by the ToC
approach, whereas in complex systems the synergistic effect
of interaction between the individual components needs to
be allowed for - a project may be ineffective on its own
but be effective in the wider context of the overall
programme.
3.3 Internal Monitoring and Evaluation
It was widely acknowledged by the
DP teams that the establishment of an
effective internal monitoring programme was given
insufficient priority within the early stages of developing
and implementing the projects. This led to delays in
establishing proper performance monitoring systems,
compounded by difficulties in recruiting evaluation staff
with the appropriate skills and experience.
3.3.1 Healthy Respect
After a difficult beginning with several project
management changes, it was only once the current project
manager was in post that an internal evaluation function
was developed. Standard project management methods have
been adopted with an observational, process-focused
approach. This involves using pro-formas to collect
quarterly 'audit' information from all the constituent
projects and providing feedback reports to projects for
discussion. The data are analysed by the
NHS Board's Health Information Unit and
have been used to inform decision-making about continued
funding of the projects. The continual demand for
monitoring data was seen as imposing a heavy burden on the
smaller sub-projects. Schools have been reluctant to
provide information on
SHARE training coverage due to lack of
capacity. Some additional research (eg media evaluation)
has been commissioned by the internal evaluation
officer.
After initial difficulties, the relationships between
Healthy Respect's internal team and
HR/E were described as 'good' in terms
of collaboration and information sharing. The Memorandum of
Agreement between
HR and
HR/E was seen as very important in
delineating the respective roles of internal and external
evaluation teams and avoiding duplication. The external
evaluation team drafted a pro forma for the individual
projects to use for their quarterly 'audit' returns (in
lieu of diaries), but the external team have been reluctant
to get involved in the projects and interim analyses of
their survey data were strictly proscribed.
3.3.2 Starting Well
At the outset, there was little appreciation of the need
for and role of the internal evaluation function. It was
not until March 2002 (2 years from the start) that Starting
Well was able to recruit someone with the appropriate
skills for the internal evaluation post. The internal
evaluation has produced a monthly management report using
Family Health Plan data that is captured and collated
centrally on a database in each area. Feeding the
information back to the teams has served as a focus for
team discussions. There has been an increasing demand for
this information and ad hoc analyses have been conducted.
The external evaluation team uses this internal monitoring
data for the families in their cohort. Issues relating to
data protection had to be resolved before data sharing was
possible. Other research was also commissioned by the
internal evaluation officer once she was in post, such as
action research on nursing practice, the use of practice
guidelines and further work on community development.
3.3.3 Have A Heart Paisley
Similarly, the internal evaluation was severely delayed
with
HaHP. The project had expected the
external evaluation to fulfil all requirements. Thus, the
internal evaluation post was created at a junior officer
level. Problems mounted due to limited baseline data
collected at the start of the project, a low response rate
to the baseline survey, problems in recruiting and
retaining the internal evaluation post holder and a
realisation that the external evaluation role did not
encompass internal monitoring and evaluation functions. As
the project evolved the interface between internal and
external evaluation became more blurred, and a more
integrated approach developed. Relationships between the
HaHP project and the external evaluation
team were described as good in terms of useful feedback and
a constant presence at management team meetings.
In addition, some key aspects of HaHP were not covered by
the external evaluation, and the project struggled to get
funded evaluation programmes in place for them. Examples of
this were the cardiac rehabilitation programme and the
development of a
CHD disease register which was intended
to provide the basis of a systematic approach to secondary
prevention.
HaHP submitted unsuccessful bids to the
CSO and to the British Heart Foundation
for funding for these evaluations. Subsequently, they
developed an ambitious internal evaluation that is
currently underway.
It is likely that the effectiveness of the cardiac
rehabilitation programme and of the disease register in
ensuring the systematic implementation of evidence-based
interventions will be of particular interest to the
NHS throughout Scotland. In retrospect, the effective
evaluation of these key elements of HaHP should have been
given a higher priority from the start and resourced
adequately.
3.4 Relationship between internal and external
evaluations
The experience of the three internal evaluations
highlights the following three issues:
Evaluation and implementation role
At the outset, the focus of the DPs was defined as
implementation/action and disseminating good practice;
research and evaluation were not seen as part of these
roles and there were
expectations that the external evaluation team would do the
research/evaluation. In particular, the requirements for
internal monitoring and evaluation were unclear at the
outset and lacked designated and clear strategic
leadership.
The relationship between internal and external
evaluation
There was a lack of clarity at the outset about the need
for, and the nature and level of the internal evaluation
function on the part of the
DPs and the Scottish Executive. This
meant that internal and external roles were not planned as
complementary, integrated functions but kept quite
separate. From
HR's perspective, there had always been
doubts about how the internal and external evaluation would
fit together, and concerns about what would happen if there
were a mismatch between the two. However, by commissioning
the external evaluations first and independently from the
projects, the internal evaluations were by default left
doing everything else.
HaHP felt that a clear understanding of
what was required from the internal evaluation, and what
resources were needed, should have been established at the
outset. The lack of integration between internal and
external evaluation roles also meant that there was no
mechanism for bringing together and synthesising
information from the internal and external evaluations. The
HaHP team believed that this information must be integrated
much better than it has been in the past.
Starting Well's view was that it was important for all the
learning to be synthesised rather than focusing
disproportionately on the results from the external
evaluation.
Evaluation capacity and culture within implementing
organisations
The uncertainty and lack of clarity and leadership
surrounding the internal evaluation function were
compounded by the lack of capacity of some groups of staff
to be 'critical practitioners' and to understand and build
an internal evaluation role into the planning, development
and review cycle.
NHS secondary care was one exception, in
that they had a strong culture of reflective practice and
were developing information systems to support this.
Variations in the evaluation culture across sectors were
also noted by
HR - the voluntary sector were seen to
be very familiar with critical reflection and producing
reports for funders; local authorities were seen to have an
'inspection culture' and produce committee reports when
necessary; while the
NHS were seen to have a more 'academic'
approach.