We take the risk of bias seriously, so we recently carried out a review of existing third-party research on bias in teacher assessment. This has given us a greater understanding and helped us to improve the guidance we published for teachers making judgements on students’ performance this academic year.
Last autumn, we analysed and evaluated the centre assessment grades that were awarded in 2020. Our equalities analysis for GCSE, AS and A level students and for students taking vocational and technical qualifications found no evidence that students were systemically disadvantaged on the basis of particular protected characteristics or socioeconomic status. This is good news.
And, while the review we are publishing today identifies a risk of bias in teacher assessment, it doesn’t necessarily follow that there will be bias in teacher assessments this academic year. It does, however, highlight the importance of having safeguards in place, including quality assurance arrangements.
What is teacher assessment bias?
To understand what bias is, we need to introduce the idea of error in assessment. In simple terms, error is the difference between the result that a student ought to have been awarded from an assessment – given their level of attainment in the subject being assessed – and the result that they ended up being awarded.
Error is a feature of all measurement, not just educational assessment. We design our measuring procedures to minimise its likelihood, but we cannot eliminate it entirely. Academic assessment experts often distinguish between 2 types of measurement error. The first is error that seems to operate randomly – known as unreliability – such as the luck of the draw that means you end up having to answer an exam question on a topic you forgot to revise. The second is error that seems to operate systematically – known as bias.
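The distinction between the 2 types of error can be illustrated with a small simulation. This is a hypothetical sketch to show the concept only; the numbers and the grading scale are invented and are not drawn from the review.

```python
import random

random.seed(42)

true_score = 70  # a student's "true" level of attainment (invented for illustration)

# Random error (unreliability): each assessment deviates unpredictably,
# but over many repeated assessments the deviations average out to
# roughly zero.
unbiased_results = [true_score + random.gauss(0, 5) for _ in range(10_000)]

# Systematic error (bias): every assessment is shifted in the same
# direction, so no amount of repetition averages it away.
biased_results = [true_score - 3 + random.gauss(0, 5) for _ in range(10_000)]

mean_unbiased = sum(unbiased_results) / len(unbiased_results)
mean_biased = sum(biased_results) / len(biased_results)

print(round(mean_unbiased, 1))  # close to 70: random error cancels out
print(round(mean_biased, 1))    # close to 67: the systematic shift persists
```

This is why bias is the more worrying kind of error: averaging over more evidence reduces the effect of unreliability, but leaves a systematic shift untouched.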
No errors are good, and we do all that we can to minimise them. But some errors feel worse than others. Bias that systematically affects one group of students more than others (for example, girls more than boys) feels especially pernicious. Educational assessment needs to be fair for all, especially when the stakes are high.
What do we know about teacher assessment bias?
Our brains make sense of the world by simplifying it. This process is natural, and necessary. Yet, sometimes these simplifications inappropriately distort our beliefs about the world, resulting in unconscious biases that can compromise any judgements that we then make. All judgements that we make as humans are susceptible to biases of this sort, without us necessarily even being aware of them. This includes the judgements that teachers make when they assess students.
Last year, teachers were asked to assess students based on a prediction of the grade that would have been awarded if exams had gone ahead. As part of an equalities impact analysis, in April 2020 we published a review of research into bias in teacher judgement, with a particular focus on teacher predictions (for example, predictions used for A level grades for university admissions). We concluded that there was some evidence of bias related to ethnicity and socioeconomic status.
This year, we’ve asked teachers to do something different. They have not been asked to predict the grade that would have been awarded, but to assess each student on their current level of learning, based on a range of evidence that demonstrates their standard of performance. This is clearly teacher assessment rather than prediction, and the review of research we have published today (17 May) focused squarely upon the possibility of bias in teacher assessments.
The assessment arrangements covered by the literature we’ve reviewed aren’t directly comparable to this year’s arrangements for teacher-assessed grades in England, but they are all examples of teacher assessment. We reviewed the available evidence to consider what we might learn from it.
Both the April 2020 review that looked at A level grades predicted for university and the review we are publishing today (17 May) looked specifically for evidence of systematic divergence between teacher-based results and test-based results, linked to gender, ethnicity, or suchlike. Because there is more opportunity for bias to creep into teacher-based results than test-based results, divergence of this sort is more likely to represent bias in the teacher-based results.
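The kind of check described above can be sketched in a few lines: compare teacher-based grades with test-based grades for the same students, and look at whether the average divergence differs between groups. All names and numbers here are invented for illustration; the studies in the review used their own data and more sophisticated methods.

```python
# Hypothetical records: (group, teacher-based grade, test-based grade).
records = [
    ("group_a", 6, 6), ("group_a", 5, 4), ("group_a", 7, 7), ("group_a", 6, 5),
    ("group_b", 5, 6), ("group_b", 4, 5), ("group_b", 6, 6), ("group_b", 5, 5),
]

def mean_divergence(group: str) -> float:
    """Average of (teacher-based grade - test-based grade) for one group."""
    diffs = [teacher - test for g, teacher, test in records if g == group]
    return sum(diffs) / len(diffs)

for group in ("group_a", "group_b"):
    print(group, mean_divergence(group))
# A positive value means teacher-based grades tended to exceed test-based
# grades for that group; a consistent gap between groups is the kind of
# systematic divergence that may indicate bias.
```

With this invented data, group_a averages +0.5 and group_b averages -0.5, the sort of pattern that would prompt further investigation rather than prove bias on its own.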
Our review reached several conclusions. We found evidence that:
- gender bias was mixed – but a slight bias in favour of girls (or against boys) was a common finding
- ethnicity bias was mixed – there were findings of bias against as well as in favour of each minority group (relative to the majority group) as well as findings of no bias
- disadvantage bias was less mixed – bias against the more disadvantaged (or in favour of the less disadvantaged) was a common finding
- SEN bias was less mixed – bias against pupils with special educational needs (or in favour of those without) was a common finding
The literature that we drew upon was fairly limited in size, and it may have been skewed to some extent by publication bias, whereby evidence of an effect occurring is more likely to get published than evidence of no effect. So, it doesn’t necessarily follow that teacher-assessed grades will be biased in these ways this year. However, the literature does tell us that there is a risk of bias in teacher assessment, and that is why it is so important that arrangements are in place this year to mitigate this risk.
How can we mitigate the risk of teacher assessment bias?
To support teachers making judgements this year, we revised our Information for centres about making objective judgements. This document makes practical suggestions, including to:
- make sure each judgement is based purely upon evidence of how a student has performed, putting other factors to one side (for example, their attitude, or behaviour)
- make yourself aware of the different kinds of unconscious cognitive biases that can compromise judgements, and think about strategies for minimising them
- shed light on the factors that are influencing you, by discussing each judgement in detail with colleagues, including SENCos or SEND experts
- see if it is possible to generate evidence that has the potential to indicate the presence or absence of bias in your judgements
JCQ have also provided training material for teachers on maintaining objectivity.
It is particularly important for judgements not to be made by a single teacher in isolation, which is why we have said that each grade must be signed off by at least 2 teachers. Involving more than one teacher in every teacher-assessed grade that is submitted is one way the risk of bias will be managed this year.
Our equalities analyses from last autumn found no systemic disadvantage for students on the basis of particular protected characteristics or socioeconomic status. So, notwithstanding the risk evident in the review we published today, it would be wrong to assume that bias is necessarily a pervasive feature of teacher judgement.
Because grades will not be moderated statistically this year, steps have been put in place to support teachers’ judgements, and to quality assure teacher-assessed grades before they are confirmed. And, as we did last year, we will carry out equality analyses of this summer’s results to look for patterns in results once they have been confirmed, including patterns across different groups of students.
Bearing in mind the impact of the pandemic on learning over the past year or so and the reality that students will be assessed only on content they have been taught, it will be very hard to interpret the patterns of results that we will see this summer. We will soon publish a set of reports on Learning During the Pandemic – a compilation of detailed analytical reviews of the literature that has been published in England and overseas since March 2020 – which will help to explain in more detail the broader context for assessments this summer. This will provide a backdrop for the work that we will then undertake to make as much sense as possible of the patterns of results that emerge over the summer.