The UK exam debacle reminds us that algorithms can’t fix broken systems | MIT Technology Review
Nearly 40% of students ended up receiving exam scores downgraded from their teachers’ predictions, threatening to cost them their university spots. Analysis of the algorithm also revealed that it had disproportionately hurt students from working-class and disadvantaged communities and inflated the scores of students from private schools. On August 16, hundreds chanted “Fuck the algorithm” in front of the UK’s Department for Education building in London to protest the results. By the next day, Ofqual had reversed its decision. Students will now be awarded either their teacher’s predicted scores or the algorithm’s, whichever is higher.
The debacle feels like a textbook example of algorithmic discrimination. Those who have since dissected the algorithm have pointed out how predictable it was that things would go awry; it was trained, in part, not just on each student’s past academic performance but also on the past entrance-exam performance of the student’s school. The approach could only have led to punishment of outstanding outliers in favor of a consistent average.
But the root of the problem runs deeper than bad data or poor algorithmic design. The more fundamental errors were made before Ofqual even chose to pursue an algorithm. At bottom, the regulator lost sight of the ultimate goal: to help students transition into university during anxiety-ridden times. In this unprecedented situation, the exam system should have been completely rethought.
“There was just a spectacular failure of imagination,” says Hye Jung Han, a researcher at Human Rights Watch in the US, who focuses on children’s rights and technology. “They just didn’t question the very premise of so many of their processes even when they should have.”
The objective completely shaped the way Ofqual approached the problem. The need for standardization overruled everything else. The regulator then logically chose one of the best standardization tools, a statistical model, for predicting a distribution of entrance-exam scores for 2020 that would match the distribution from 2019.
Had Ofqual chosen the other objective, things would have gone quite differently. It likely would have scrapped the algorithm and worked with universities to change how the exam grades are weighted in their admissions processes. “If they just looked one step past their immediate problem and looked at what are the purpose of grades—to go to university, to be able to get jobs—they could have flexibly worked with universities and with workplaces to say, ‘Hey, this year grades are going to look different, which means that any important decisions that traditionally were made based off of grades also need to be flexible and need to be changed,’” says Han.
Ofqual’s failures are not unique. In a report published last week by the Oxford Internet Institute, researchers found that one of the most common traps organizations fall into when implementing algorithms is the belief that they will fix really complex structural issues. These projects “lend themselves to a kind of magical thinking,” says Gina Neff, an associate professor at the institute, who coauthored the report. “Somehow the algorithm will simply wash away any teacher bias, wash away any attempt at cheating or gaming the system.”
But the truth is, algorithms cannot fix broken systems. They inherit the flaws of the systems in which they’re placed. In this case, the students and their futures ultimately bore the brunt of the harm. “I think it’s the first time that an entire nation has felt the injustice of an algorithm simultaneously,” says Fry.