In February and March, year 11 students from over 300 schools in England took part in the third National Reference Test (NRT) in English and maths. This is the first year that the results, which are still being analysed, will be used as a source of evidence when the exam boards set grade boundaries in GCSE English language and maths.
As we explained in an earlier blog, the NRT is designed to measure small changes in students’ performance over time in GCSE English and maths, which might otherwise be difficult to detect. The NRT uses the same questions year-on-year and so we can be confident that any changes in performance are not due to a particular year’s paper being more or less demanding.
What do NRT results look like?
Results are reported to us at 3 key grade boundaries – 7/6, 5/4 and 4/3 – and show the percentage of students achieving at or above each of those boundaries, compared with the baseline cohort in 2017. The results are reported with confidence intervals, which reflect the possibility that we might have got slightly different results if a different sample of students had taken the NRT. If the difference in results between years is larger than the confidence interval, we can be more confident that it reflects a real difference in performance between one year and the next. If the difference falls within the confidence interval, we will not make an adjustment.
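To make the confidence-interval idea concrete, here is a minimal Python sketch of checking whether a change from the baseline is larger than its margin of error. It assumes, purely for illustration, that each year's result comes from a simple random sample and uses a normal approximation for the difference between two proportions. The real NRT samples students within schools and uses its own statistical model, so its intervals will not match this simple formula. The class and function names and all the numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class YearResult:
    """Share of sampled students at or above one grade boundary (hypothetical data)."""
    pct_at_or_above: float  # percentage, e.g. 22.0 for 22%
    n_students: int         # number of students sampled


def difference_with_ci(base: YearResult, current: YearResult, z: float = 1.96):
    """Return (difference, lower, upper) in percentage points.

    Assumes independent simple random samples and a normal approximation;
    this ignores the clustering of students within schools in the real NRT.
    """
    p1 = base.pct_at_or_above / 100
    p2 = current.pct_at_or_above / 100
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / base.n_students + p2 * (1 - p2) / current.n_students)
    return diff * 100, (diff - z * se) * 100, (diff + z * se) * 100


def is_significant(base: YearResult, current: YearResult) -> bool:
    """True when the interval for the change excludes zero, i.e. the change
    is larger than its margin of error."""
    _, lower, upper = difference_with_ci(base, current)
    return lower > 0 or upper < 0


if __name__ == "__main__":
    baseline = YearResult(pct_at_or_above=20.0, n_students=6000)
    print(is_significant(baseline, YearResult(22.5, 6000)))  # a 2.5pp rise
    print(is_significant(baseline, YearResult(21.0, 6000)))  # a 1pp rise
```

With these invented sample sizes, a 2.5pp rise clears the margin of error but a 1pp rise does not, so under the approach described above only the first would be considered for an adjustment.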
How will Ofqual use NRT results?
We will share the NRT results with exam boards as part of our usual discussions about maintaining standards. The key question will be whether the NRT results indicate a change in student performance that should be reflected in the overall grades awarded.
Our job is to ensure standards are maintained over time. We will, therefore, be cautious in deciding whether to adjust grade standards, especially where we believe there is still evidence of a sawtooth effect – that is, where we believe that overall performance has increased slightly in the early years of a new qualification due to increasing familiarity with the requirements of the exam, and more past papers being available, rather than due to genuine improvements.
We will also make sure that our approach is consistent between years, subjects and grade boundaries. And, all other things being equal, we would give the same weight to evidence that performance had declined as we would to evidence that performance had improved. Where we do decide to make an adjustment, we will generally seek to make a relatively small adjustment, to reduce the risk of a large shift in the proportions of students achieving each grade from one year to the next, given that students might be competing in future with those in adjacent cohorts.
We will publish results from the 2019 NRT, together with our decisions, on 22 August, to coincide with GCSE results.
How will any adjustment be made?
Exam boards currently use statistical predictions about the expected proportion of students achieving the key grades, to guide where they set their grade boundaries. If we believe that the NRT evidence is sufficiently compelling to make an adjustment at one or more grades, we will do this by making an adjustment to the predictions for all boards.
Here’s an example. If the NRT results showed a statistically significant change of +2.5 percentage points at grade 7 in English language, we might decide to make an adjustment. In deciding the size of that adjustment, we will consider additional contextual evidence, such as the survey completed by all students sitting the NRT, which assesses their motivation in the test, or any bias in the sample. We might require exam boards to adjust predictions by the full 2.5pp, or by a smaller amount, perhaps 1.5pp. Exam boards will implement that change by adjusting their predictions in a way that reflects the relative differences in the prior attainment profile of their own cohorts.
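This blog does not spell out exactly how each board turns a national adjustment into its own prediction. The following Python sketch shows one purely hypothetical way it could work. It assumes each board's prediction is the share of its matched students in each Key Stage 2 prior-attainment band multiplied by a shared "reference" percentage achieving the grade in that band. It also assumes the adjustment is applied by scaling every band's reference percentage by one common factor, so that the national prediction moves by the chosen amount. The band names, board names, figures and the scaling rule are all illustrative assumptions, not Ofqual's or any exam board's actual method.

```python
from typing import Dict

Bands = Dict[str, float]

# Hypothetical national "reference" % achieving grade 7 or above in each
# Key Stage 2 prior-attainment band (illustrative numbers only).
REFERENCE_PCT: Bands = {"low": 2.0, "middle": 15.0, "high": 55.0}

# Hypothetical share of each board's matched cohort in each band, and entries.
BOARDS: Dict[str, Bands] = {
    "Board A": {"low": 0.3, "middle": 0.4, "high": 0.3},
    "Board B": {"low": 0.2, "middle": 0.4, "high": 0.4},
}
ENTRIES: Dict[str, int] = {"Board A": 100_000, "Board B": 50_000}


def predict(band_shares: Bands, reference: Bands) -> float:
    """Prediction (%) = sum over bands of (share of cohort in band x reference %)."""
    return sum(share * reference[band] for band, share in band_shares.items())


def national_shares(boards: Dict[str, Bands], entries: Dict[str, int]) -> Bands:
    """Entry-weighted share of all matched students in each band."""
    total = sum(entries.values())
    bands = next(iter(boards.values())).keys()
    return {b: sum(boards[name][b] * entries[name] for name in boards) / total for b in bands}


def adjust_reference(reference: Bands, shares: Bands, adjustment_pp: float) -> Bands:
    """Scale every band's reference % by one common factor so the national
    prediction moves by adjustment_pp percentage points (hypothetical rule)."""
    base = predict(shares, reference)
    factor = (base + adjustment_pp) / base
    return {band: pct * factor for band, pct in reference.items()}


if __name__ == "__main__":
    shares = national_shares(BOARDS, ENTRIES)
    adjusted = adjust_reference(REFERENCE_PCT, shares, 1.5)
    for name, profile in BOARDS.items():
        before = predict(profile, REFERENCE_PCT)
        after = predict(profile, adjusted)
        print(f"{name}: {before:.2f}% -> {after:.2f}% ({after - before:+.2f}pp)")
```

Under these assumptions the entry-weighted national prediction rises by exactly 1.5pp. Board B, whose hypothetical cohort has stronger prior attainment, receives a slightly larger shift than Board A. A uniform percentage-point shift in every band would instead move every board by the same amount, which is why the sketch scales the reference percentages.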
It is worth noting that an adjustment of, for example, +2pp might not mean a change of +2pp on the results published in August, for two reasons.
First, grade boundaries can only be set on whole marks: 74 or 75 but not 74.5, for example. Since there are many hundreds (or even thousands) of students on each mark, the cumulative percentage of students at or above a given mark (for example, 74) won’t exactly match the prediction. Exam boards will select a grade boundary mark which might mean a slightly higher or lower percentage than the prediction itself. This happens all the time. In the case of an NRT adjustment, it means that, for example, a +2pp adjustment might end up being +2.2pp or +1.9pp.
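A short Python sketch illustrates why the outcome lands near, rather than exactly on, the prediction. It assumes a board simply picks the whole mark whose cumulative percentage of students at or above that mark is closest to the prediction. In practice boundaries are also informed by senior examiners' review of scripts, so this shows only the effect of whole-mark granularity. The function names and the mark distribution are hypothetical.

```python
from typing import Dict, Tuple


def cumulative_pct_at_or_above(counts: Dict[int, int]) -> Dict[int, float]:
    """Map each mark to the percentage of students scoring that mark or higher."""
    total = sum(counts.values())
    running = 0
    result: Dict[int, float] = {}
    for mark in sorted(counts, reverse=True):
        running += counts[mark]
        result[mark] = 100 * running / total
    return result


def closest_boundary(counts: Dict[int, int], target_pct: float) -> Tuple[int, float]:
    """Pick the whole mark whose cumulative % is nearest the target prediction.

    Returns (boundary mark, % of students actually at or above it).
    """
    cumulative = cumulative_pct_at_or_above(counts)
    mark = min(cumulative, key=lambda m: abs(cumulative[m] - target_pct))
    return mark, cumulative[mark]


if __name__ == "__main__":
    # Deliberately simple, flat distribution: 200 students on every mark 0-80.
    counts = {mark: 200 for mark in range(81)}
    before = closest_boundary(counts, 22.0)
    after = closest_boundary(counts, 24.0)  # prediction raised by +2pp
    print(before, after, f"realised change: {after[1] - before[1]:+.2f}pp")
```

With this flat distribution each mark holds about 1.2% of students, so raising the prediction from 22% to 24% moves the boundary from mark 63 to mark 62 and the realised change is about +1.23pp rather than +2pp. How far the realised figure drifts from the adjustment depends on how many students sit on each mark near the boundary.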
The other reason that national results might not reflect the exact adjustment is that the adjustment will be made using the cohort of 16-year-old students who have been matched to their Key Stage 2 scores, but the same grade boundaries will apply to all students, regardless of their age. The overall results released in August include not just 16-year-olds but also those students in year 10 and below, as well as post-16 students. Changes in other age groups (post-16 students doing better or worse than previous years, for example) could ‘cancel out’ any change due to NRT, when results for all age groups are reported.
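A few lines of Python show how this offsetting can happen. The headline figure is an entry-weighted average across all groups of students, so an improvement among matched 16-year-olds can be diluted, or cancelled out, by a change in the other groups. The function name and every cohort size and percentage below are invented for illustration.

```python
from typing import List, Tuple

# (number of entries, % achieving grade 4 or above) for one group of students.
Group = Tuple[int, float]


def overall_pct(groups: List[Group]) -> float:
    """Entry-weighted percentage achieving the grade across all groups."""
    total = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total


if __name__ == "__main__":
    # Hypothetical: matched 16-year-olds, then all other entries combined.
    last_year = [(500_000, 70.0), (100_000, 40.0)]
    # Matched 16-year-olds up 1pp, other entries down 5pp.
    this_year = [(500_000, 71.0), (100_000, 35.0)]
    print(overall_pct(last_year), overall_pct(this_year))
```

In this made-up case the overall figure is 65% in both years: the 1pp rise among matched 16-year-olds is exactly offset by a fall among the smaller group of other entries.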
What about all the other subjects?
We will only use the NRT evidence to consider an adjustment in GCSE English language or maths, as the NRT items are based on the sorts of questions used in GCSE English language and maths. However, we have a programme of work in place to consider how we might strengthen the use of senior examiner judgement in setting grade boundaries, so that exam boards are better able to detect and take account of small changes in performance over time. We piloted a comparative judgement approach in a small number of subjects last year, and we will be working with exam boards to carry out further pilots this summer.
The NRT is a useful tool to help us maintain standards over time, and ensure fairness for students taking their exams this summer.
If you would like to talk to Ofqual about any of the issues raised in this blog, please contact us at public.enquiries@ofqual.gov.uk.
2 comments
Comment by V Everett posted on
My daughter's school did these. Word got around that if they did badly it would make grade boundaries lower. So they wrote as little as possible.
Comment by Kate Keating posted on
The NRT is designed to detect small changes in student performance from one year to the next. If students perform less well on the NRT, perhaps by writing less as you suggest, this would indicate that this year's year 11 students are less proficient in English or maths than previous years. We know that student motivation is important, and we guard against the impact of this sort of behaviour by carrying out a survey after the NRT to capture students' motivation and attitudes to the test. We will consider the results of that survey alongside the results of the NRT.