Using Large Language Models in the History Classroom: practical perspectives

History: The Journal of the Historical Association has a long tradition of addressing questions of pedagogic practice in its pages. Most recently, this has included an article on school-university collaborations in our June 2025 issue. Moreover, our December 2025 issue is set to feature a series of contributions on 'Creative History in the Classroom'. The present piece extends the discussion to our journal's blog.
Since the release of an early version of ChatGPT, a chatbot built on a Large Language Model (LLM), in November 2022, excitement and disquiet have accompanied the proliferation of LLMs. One might almost talk of a ‘moral panic’.
Perhaps unsurprisingly, the discipline of History has been particularly active in debating the merits and pitfalls of generative AI. Students enrolled on History degrees have traditionally been expected to read large amounts of text, synthesise this information in combination with their own insight and judgment, and form arguments about it in long-form writing. It is clear that LLMs offer a significant and, in the end, perhaps insuperable challenge to the standard take-home essay format.
With this challenge in mind, the authors of this article decided to see how LLMs might be creatively harnessed for use in the History classroom. These pedagogical experiments took place in the academic year 2023/24. Subsequent developments in what is, by academic standards, a very fast-moving field will certainly have a bearing on our findings.
Two approaches for using AI in the History classroom
Edinburgh
In autumn 2023, West taught a 20-credit course at the University of Edinburgh on the divorce scandal of King Lothar and Queen Theutberga, two rulers in early medieval Europe, to c. 45 undergraduate students in the third year of their four-year degree. The course was assessed by a summative invigilated exam at the end of term, but it also included a coursework component worth 15% of the overall course grade.
This component was intended to help the students prepare for the exam. However, instead of asking the students to write a conventional coursework essay, West asked them individually to use an LLM to generate a short (500-word) essay in response to a standard essay question. Each student’s task was then to write a 1,500-word critique of that AI-generated essay, drawing on wider reading, classroom discussion and their own historical judgment. West then graded these critiques. Most of the students used ChatGPT, though a few used Google Bard.
Results
The AI-generated essays were detailed and of reasonable quality, despite the relatively specialised nature of the historical material. Nevertheless, the students found it straightforward to develop a critique, and overall the student work was of good quality. As far as one could tell, no students generated their critiques using AI.
Some critiques provided a conceptually focused analysis of the AI’s characteristically vague contextualisation. Others concentrated on specific inaccuracies and omissions in the AI-generated essays, proceeding line by line or paragraph by paragraph, which gave them a slightly mechanical feel. One feature that often escaped analysis was the AI’s marked tendency to offer its own moral judgments, for instance its repeated condemnation of the medieval trial by ordeal as superstitious and unfair. With this in mind, future iterations of this assessment might benefit from clearer guidance, encouraging students to focus not just on matters of detail but also on the wider picture.
Sheffield
In spring 2024, Moses taught a 15-credit option module at the University of Sheffield to eight MA students on the history of welfare, sexuality and the family in late nineteenth- and twentieth-century Britain. Although the module was run through the Department of History, the cohort was evenly split between History and Sociology MA students. The course was assessed through one summative essay at the end of term.
Moses divided the class into four pairs, each consisting of one Sociology student and one History student. These pairs were asked to prepare a 500-word response in advance of each weekly seminar. Students were encouraged to use Google Bard/Gemini or another form of generative AI to generate answers to the seminar questions, and then to critique the AI-generated answers in their responses. These responses were then shared on the module’s virtual learning environment so that they could be consulted by the whole cohort. This was a formative exercise and did not receive a mark.
Results
The weekly responses to the AI-generated answers proved fruitful for student thinking about individual seminar topics, as well as about how to write history well. Student responses highlighted how the AI-generated answers outlined the contours of a weekly seminar topic but failed to capture some of the nuances pointed out in the recommended secondary literature. The AI-generated answers therefore provided a point of departure for students to demonstrate what they had learned from their reading. They also gave students a useful way into discussing weekly seminar topics, both inside and outside the classroom.
In a module feedback survey for Moses’ MA option, only 20% of respondents considered generative AI useful for learning directly about the past. Nevertheless, 80% reported that they found the AI exercise useful for their learning. It was the process of thinking critically about the AI-generated answers, rather than the answers themselves, that students found helpful for their studies.
Reflections
At one level, widespread academic suspicion of AI is understandable. Yet AI is not going away, at least as far as we can foresee, and indeed the latest models are already much more capable than those available in 2023/24. Even if it were practically possible to police the exclusion of generative AI at university (which seems unlikely), we suggest that this route would not be desirable. It would be fundamentally inauthentic to ask students to deprive themselves completely of a near-universally available tool throughout their university studies, not least when it is now embedded in most new computers and software.
This blog, therefore, is a call for creative play. We do not argue that our experiments should be generalised; if every course asked students to critique an AI-generated essay, or to use AI to prepare for their classes, the exercise would become stale. We are also clear that there remains room for assessment free from direct AI use, such as oral presentations, class tests and exams, not to mention long-form essays for which generative AI may have been consulted only as a preparatory aid rather than as a means of completing the final product.
All this is to say that we think History needs to address AI head-on, and not only through stern admonitions about ‘cheating’. Critical literacy in AI, for students and staff alike, is key. That way, students will not only learn about the capabilities and limitations of AI in ways that will stand them in good stead in their future careers (and lives more generally); they can also, we suggest, learn more about their own capabilities as thinkers, writers and original creators.