We often talk about creating a level playing field for students taking their GCSEs, AS and A levels, regardless of which exam board they enter with. That’s our aim this summer, in both reformed and unreformed qualifications.
Our priority is to make sure that it is no easier to get a particular grade with one board than another; in other words, to align the grade standards between boards (and between tiers for each board in tiered subjects such as GCSE maths). This is particularly important in the reformed qualifications, as exam boards this summer will set the standard for future years.
We judge the comparability of grade standards using statistical predictions of how many students are expected to be awarded each grade. We compare each board's results to its prediction; if each board is reasonably close to its prediction, then we judge that the grade standards across all boards are aligned.
Here’s a recap on how predictions are generated, and an example that I hope will bring this to life.
Predictions step 1
First, select an appropriate ‘reference year’ in the past (for example, 2016). Then, match as many of the students in that reference year as possible to their prior attainment (for A level, that means matching those students to their GCSE results two years earlier and calculating a ‘mean GCSE score’). Next, separate those students into groups according to their mean GCSE score and look at how many in each group achieved an A*, A, B and so on. If you put all that in a table, you have an ‘outcome matrix’. This will tell you, for example, the probability of a student in the top group for mean GCSE achieving a grade A, or an A*.
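To make step 1 concrete, here is a minimal sketch of building an outcome matrix from reference-year data. The column names, and the split into ten prior-attainment groups, are illustrative assumptions rather than Ofqual's actual methodology.

```python
import pandas as pd

def outcome_matrix(ref_year: pd.DataFrame) -> pd.DataFrame:
    """Build an outcome matrix from reference-year data.

    ref_year has one row per matched student, with columns
    'mean_gcse' (prior attainment) and 'grade' (grade awarded).
    Returns P(grade | ability group) as a table.
    """
    ref = ref_year.copy()
    # Split students into ten prior-attainment groups (illustrative choice).
    ref["ability_group"] = pd.qcut(ref["mean_gcse"], 10, labels=False)
    # Count grades within each group, then convert counts to probabilities.
    counts = pd.crosstab(ref["ability_group"], ref["grade"])
    return counts.div(counts.sum(axis=1), axis=0)
```

Each row of the returned table sums to 1: it is the grade distribution for one prior-attainment group in the reference year.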
Predictions step 2
Once you have that outcome matrix, you can use it with the current year’s students. Each board will match its current students to their GCSE prior attainment, and use the outcome matrix to generate a prediction for the key grades (for A level, that’s A*, A and E). The predictions will reflect the ability profile of the students entering for each board, so one board might have a higher prediction at A* if it has more students with higher mean GCSE scores.
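Step 2 can be sketched as a weighted average: weight each ability group's grade probabilities by the share of this year's matched entry in that group. The function name and inputs are illustrative assumptions.

```python
import pandas as pd

def predict_outcomes(matrix: pd.DataFrame, current: pd.Series) -> pd.Series:
    """Predict the percentage of the current cohort at each grade.

    matrix: P(grade | ability group) from the reference year.
    current: number of matched students in each ability group this year.
    """
    # Share of this year's matched entry in each ability group.
    weights = current / current.sum()
    # Weighted average of each group's grade probabilities gives the
    # predicted percentage of the whole cohort at each grade.
    return matrix.mul(weights, axis=0).sum() * 100
```

A board with more students in the higher ability groups gets a higher prediction at the top grades, which is why predictions legitimately differ between boards.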
All of this is done at cohort level, so we are not predicting individual students’ results.
We set ‘reporting tolerances’ around those predictions, at 1, 2 or 3% depending on the number of matched students. The more students included, the more reliable the prediction, so the larger the entry, the closer to prediction the outcomes should be (and the tighter our tolerance).
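The tolerance rule can be sketched as a simple lookup. The 3,000-student cut-off for the 1% tolerance comes from the media studies example below; the other entry-size thresholds here are illustrative assumptions, not Ofqual's published figures.

```python
def reporting_tolerance(matched_entry: int) -> float:
    """Return the reporting tolerance in percentage points.

    Larger matched entries give more reliable predictions,
    so they get tighter tolerances.
    """
    if matched_entry >= 3000:   # large entries: tightest tolerance
        return 1.0
    if matched_entry >= 1000:   # medium entries (illustrative cut-off)
        return 2.0
    return 3.0                  # small entries: widest tolerance
```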
When the boards set their grade boundaries, they report results to us, and they report how far they are from the prediction for each specification.
A real-life example
Here’s a real example from 2016, taken from the data we publish each year on inter-board comparability.
In A level media studies there are three specifications, offered by AQA, OCR and WJEC. All three have more than 3,000 students matched to their mean GCSE score, so the reporting tolerance at grade A for all three boards is plus or minus 1%. That means if each board is within 1% of its prediction at grade A, it does not have to provide any additional evidence to us to support its decision on where to set the grade boundaries. If a board wants to set grade boundaries that mean results fall outside that 1% tolerance, we consider its evidence, but we also consider whether that might affect inter-board comparability.
In 2016, all three boards were within that 1% tolerance. AQA had a prediction of 11% and its results for matched students were 0.4% above prediction. OCR’s prediction was 10.2% and its results were 0.4% below prediction. WJEC had a prediction of 10.6% and its results were 0.8% below that prediction. We judge that if all awards are within tolerance, then the grade standards between boards are aligned.

Some people describe our approach as norm referencing. It’s not, because results can go up and down depending on the ability profile of the cohort. We don’t expect the percentage of students at each grade to be the same across all boards, because the ability profile of each board’s entry will be different.
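The tolerance check on those 2016 figures is simple arithmetic: each board's deviation from its prediction, in percentage points, must not exceed the tolerance. A quick sketch using the deviations quoted above:

```python
def within_tolerance(deviation_pp: float, tolerance_pp: float = 1.0) -> bool:
    """True if a board's result is within tolerance of its prediction."""
    return abs(deviation_pp) <= tolerance_pp

# 2016 A level media studies deviations from prediction, in percentage points.
deviations = {"AQA": +0.4, "OCR": -0.4, "WJEC": -0.8}
aligned = all(within_tolerance(d) for d in deviations.values())
```

Here all three deviations are within the 1 percentage point tolerance, so the awards count as aligned.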
This summer, we’ll be comparing exam board results for matched students (those for whom the exam boards have prior-attainment data) against the predictions for each board, and using that to judge the alignment between them.
If you want more information about how boards will align grade standards between tiers in GCSE maths, we have published some slides and commentary.
And there is also more information about inter-board comparability in 2016.
Associate Director, Standards and Comparability
In the case of reporting tolerances, the percentages refer to percentage points. For example, if the prediction is 54% and there is a reporting tolerance of 1%, the exam board can set boundaries that produce results between 53% and 55% without needing to provide additional evidence to support those decisions.