
The Illusion of Knowledge: Interpreting Generative AI Hallucinations in the Study of Humanities and the Black Box of LLMs

by Amina El Ganadi and Federico Ruozzi

The Phenomenon of AI Hallucinations

As Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), become increasingly integrated into humanities research, a new challenge has emerged: AI hallucinations. These hallucinations occur because LLMs like GPT-4 generate text by predicting the next word based on patterns learned from vast amounts of data. While this process often produces convincing and contextually appropriate responses, it can also lead to content that, although linguistically sound, lacks a factual basis. This is particularly problematic in academic and research contexts, where accuracy, trustworthiness, and reliability are paramount.
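The mechanism can be made concrete in a few lines of code. The sketch below uses the small, open GPT-2 model from the Hugging Face transformers library as a stand-in for proprietary LLMs (the model choice and prompt are illustrative): it shows that the model ranks every candidate next word by plausibility, not by factual accuracy, which is why fluent output can still be false.

# Next-word prediction: the core mechanism behind LLM text generation
# and, by extension, hallucination. GPT-2 stands in for larger models.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The second source of Islamic jurisprudence is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token

# The model assigns a probability to every token in its vocabulary;
# nothing in this step checks the candidates against facts.
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {float(p):.3f}")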

The Black Box of LLMs

Large Language Models like ChatGPT operate as “black boxes,” meaning their decision-making processes are not transparent or easily interpretable. This opacity makes it hard to understand why a model arrives at a particular decision or prediction, and therefore to diagnose errors, biases, or failures in its performance. The “black box” nature of LLMs complicates efforts to correct or prevent inaccuracies, since the root cause of a hallucination is difficult to trace.

Figure 1: Visual Metaphor for AI Complexity — A Creative Depiction of the LLM “Black Box” in a Timeless Library (Generated using DALL-E).

Impact on Humanities Research

In the humanities, where the interpretation of texts, cultural artifacts, and historical documents demands deep contextual understanding, Generative AI hallucinations can have significant and far-reaching consequences. Some AI models are also designed to generate creative outputs, even when faced with ambiguous or poorly defined input. While this creative flexibility can be valuable in certain contexts, it can also result in errors or hallucinations, such as fabricating quotes from non-existent books or generating fictitious details beyond its training data. For example, in Islamic studies, an AI might inadvertently produce misleading interpretations of Qur’anic verses or Ahadith (corpus of sayings, actions, and approvals attributed to the Prophet Muḥammad), potentially distorting their intended meanings. Given the critical role of Ahadith as the second key source of Islamic jurisprudence after the Qur’an, ensuring the accuracy of AI-generated responses is vital, as even minor inaccuracies can lead to profound implications[1].

AI Limitations in Processing Non-Latin Scripts

AI’s difficulties with non-Latin scripts, such as Chinese and Arabic, add another layer of complexity to its use in the humanities. These scripts differ significantly from the Latin alphabet, and most LLMs are trained predominantly on English data, which can cause problems when processing and generating text in Arabic and Chinese. Misinterpretations caused by these linguistic challenges can lead to significant cultural and scholarly inaccuracies. For instance, in one case, DALL-E, an AI model for image generation, produced a fictitious image with nonsensical text, claiming it to be a graphic visualization of Surat al-Fatiha (Fig. 2), the first chapter (sura) of the Qur’an. This highlights the potential for errors in culturally sensitive contexts. A similarly faulty rendering of Chinese characters can be seen in Fig. 3.

The generated image is accompanied by a message: 'Here is the updated image of an ancient Islamic manuscript with the Arabic text of Surah Al-Fatiha on the left page and its English translation on the right page. The manuscript is presented with elegant calligraphy, reflecting its historical and cultural significance.'
Figure 2: AI-Induced Misrepresentation — Faulty Depiction of Islamic Scripture by DALL-E
The generated image is accompanied by a message: 'Here is the minimalist design of the phrase “数字图书馆” (digital library) with clean lines on a plain background, emphasizing clarity and straightforward access to information. If you need further adjustments or have more requests, feel free to ask!'
Figure 3: AI-Induced Misrepresentation — Faulty Depiction of the phrase “数字图书馆” (digital library) by DALL-E

Case Study: Generative AI in Islamic Studies and Librarianship

Figure 4: The AI Librarian “Amin al-Maktaba” (Generated using DALL-E).

A practical example of how Generative AI can aid humanities research is seen in our recent project, where we developed a custom Generative AI model using OpenAI’s GPT architecture. This model, fine-tuned specifically for Islamic studies, helps with understanding, classifying, and cataloguing Islamic texts[2]. It also offers insights into Islamic principles and practices. Our project tested the Generative AI model in a new role — as an expert AI librarian (“Amin al-Maktaba” / “AI-Maktabi”). The main goals were to evaluate ChatGPT’s ability to grasp the complexities of Islamic studies and its effectiveness in organizing a large collection of Islamic literature. The AI model, well-versed in Islamic law, theology, philosophy, and history, seamlessly integrates complex Islamic scholarship with accessible digital tools within the framework of the Digital Maktaba project.
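The general shape of such a system can be sketched against the OpenAI API: a system prompt constrains the model to the librarian role, and each cataloguing request is passed as a user message. The prompt wording, model name and example below are illustrative assumptions, not the project’s actual configuration:

# A hypothetical sketch of an "AI librarian" on the OpenAI API. The
# system prompt, model name and example title are illustrative only,
# not the Digital Maktaba project's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are Amin al-Maktaba, an expert librarian of Islamic studies. "
    "Classify the work the user describes into standard categories of "
    "Islamic scholarship (e.g. fiqh, kalam, tafsir, hadith), and state "
    "explicitly when you are uncertain rather than guessing."
)

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # lower temperature reduces, but does not eliminate, hallucination
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Catalogue: Riyad as-Salihin by al-Nawawi."},
    ],
)
print(response.choices[0].message.content)

Constraining the role and instructing the model to flag uncertainty mitigates, but does not remove, the hallucination risk discussed below.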

Challenges of AI Hallucinations in Scholarly Research

Despite its strengths, the model exhibits notable limitations, including occasional hallucinations. The reasoning behind its decisions can be opaque, complicating efforts to assess its accuracy comprehensively. It also sometimes lacks the cultural and contextual nuances that human experts inherently possess, which are vital for accurate categorization and interpretation of Islamic scholarship. For instance, it may create categories that deviate from accepted classifications within Islamic studies, fabricate quotes from non-existent sources, or generate fictitious details, potentially leading to the dissemination of misleading information if not carefully monitored.

AI hallucinations present a significant challenge in integrating generative AI into the humanities. The illusion of knowledge created by these models can lead to misinformation and errors in academic research. Understanding the “black box” nature of LLMs and rigorously implementing strategies to mitigate hallucinations are crucial steps towards making AI a reliable tool for scholars and researchers.

Despite advances in AI technology, human oversight remains indispensable, especially in fields requiring deep cultural and contextual understanding. AI models, regardless of their sophistication, lack the nuanced insights that human scholars bring to the table. Thus, fostering collaboration between AI systems and human experts is essential to validate and refine AI-generated content, ensuring it meets the rigorous standards of academic research.

As AI technology evolves, our strategies for ensuring its outputs are accurate, reliable, and trustworthy must also progress. This reflects a continual commitment to enhancing the intersection of technology and the humanities, ensuring that AI serves as a valuable complement to human expertise rather than a replacement.

 

Acknowledgements

This project is supported by the PNRR project Italian Strengthening of Esfri RI Resilience (ITSERR), funded by the European Union – NextGenerationEU (CUP). The ITSERR project is committed to enhancing the capabilities of the RESILIENCE infrastructure, positioning it as a leading platform for religious studies research.

 

[1] Aftar, S., et al. (2024). “A novel methodology for topic identification in Hadith.” In Proceedings of the 20th Conference on Information and Research Science Connecting to Digital and Library Science (IRCDL 2024, formerly the Italian Research Conference on Digital Libraries).

[2] El Ganadi, A., et al. (2023). “Bridging Islamic knowledge and AI: Inquiring ChatGPT on possible categorizations for an Islamic digital library.” In Proceedings of the 2nd Workshop on Artificial Intelligence for Cultural Heritage (IAI4CH 2023). CEUR Workshop Proceedings, Vol. 3536, pp. 21-33.

 

Classifying Portable Antiquities with AI

by Mark McKerracher, Abhishek Dutta, Megan Gooch, Helena Hamerow, Horace Lee, Michael Lewis and Andrew Zisserman

The British Museum holds millions of objects spanning millennia of human history. But it also curates a unique and precious resource that is purely digital and freely accessible online: the database of the Portable Antiquities Scheme, which contains records of more than 1.5 million artefacts – primarily those discovered by metal detecting enthusiasts.

Metal detecting – as lovingly mocked in the BBC sitcom Detectorists – is a hugely popular pastime in the UK. There are some 20,000 metal-detectorists regularly discovering things like coins and dress accessories, from various periods of British history and prehistory, in ploughed fields across the country.

Metal detecting in progress
Metal detecting is an increasingly popular pastime in the UK (Wikimedia image under Creative Commons attribution licence).

From its inception in 1997, the Portable Antiquities Scheme (PAS) has been encouraging detectorists in England and Wales to report their finds for recording in a central database (maintained by the British Museum in partnership with Amgueddfa Cymru) so that the unique informative potential of these artefacts can be fully realised – both in professional and academic research and by amateur enthusiasts. Photographs and metadata are collected and entered into the database by both expert professionals – a national network of 40 locally-based Finds Liaison Officers, supported by National Advisers – and also trained volunteers and interns.

This data entry is no mean feat, considering the vast array of object types – from any and all periods of British history and prehistory – that fall within the purview of the PAS. From golden Bronze Age cups to Victorian pennies, from Roman rings to early medieval strap ends, it is unrealistic and unfair to expect any individual to acquire the detailed expertise needed to accurately identify all conceivable finds, let alone to describe them in consistent specialist language, and to apply ever-more-detailed classifications and sub-classifications. This difficulty will only grow as the monumental PAS dataset expands every year; and the increasing potential for inaccuracies or inconsistencies could make it harder for the archaeological community to use the PAS as a research resource.

Golden Bronze Age cup
The Bronze Age ‘Ringlemere cup’ in the PAS database – legally declared Treasure (Image: British Museum, under CC-BY 4.0 licence).

An assistive AI tool which can ‘look at’ a photograph of a new artefact and find visually similar items for comparison (and perhaps even suggest pre-populated metadata) could therefore be a boon for the PAS network and its users – both metal-detecting enthusiasts and professional archaeologists – by improving the efficiency, efficacy and accuracy of data entry, and by enhancing the research utility of the database.

The AntiquAI project, a collaboration between the University of Oxford and the Portable Antiquities Scheme, aims to create such a tool: harnessing the latest developments in image-recognition technology and applying them to the PAS dataset.

AntiquAI is currently in a pilot, proof-of-concept phase, exploring the potential of PAS image recognition. The Visual Geometry Group (VGG) in Oxford’s Department of Engineering Science – research leaders in the field of computer vision – are developing the project’s software. Among the software currently under development by the VGG is WISE: an open-source image search engine, drawing upon recent advances in vision-language models to enable flexible searching of image content using natural language, and/or visual search terms (i.e. new images).

The VGG have created an instance of WISE indexed on a very large, representative subset of PAS images and metadata, covering over 700,000 objects (thanks to Stephen Moon at the British Museum for sharing this invaluable resource). This prototype now allows us to successfully search PAS records – with or without reference to object metadata – using natural language terms, such as ‘gold ring with writing on’.
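Under the hood, this kind of vision-language search typically works by embedding images and text queries into a shared vector space and ranking by similarity. Below is a minimal sketch of that mechanism using the open CLIP model rather than WISE itself; the file names are hypothetical.

# Simplified sketch of vision-language retrieval of the kind WISE
# performs: embed images and a text query into a shared space, then
# rank images by similarity to the query. File names are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

files = ["find_001.jpg", "find_002.jpg"]  # stand-ins for PAS photographs
images = [Image.open(p) for p in files]
query = "gold ring with writing on"

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image holds each image's similarity to the query;
# sorting by it yields a ranked list of visually matching records.
scores = out.logits_per_image.squeeze(-1)
for rank, idx in enumerate(scores.argsort(descending=True), start=1):
    print(rank, files[int(idx)])

In a real deployment the image embeddings would be computed once and indexed, so each query needs only a single text embedding and a nearest-neighbour lookup.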

Screenshot of searching the PAS image set using the term 'gold ring with writing on'
Searching the PAS image set with natural language terms (Image: AntiquAI project/VGG; constituent images: Portable Antiquities Scheme under CC-BY 4.0 licence).

AntiquAI’s instance of WISE can also take an image as a search term. Here, for instance, we have used a photograph of a medieval posy ring (downloaded from the British Museum website, and previously unseen by WISE) as a search term, and WISE has found visually similar objects.

Screenshot of searching the PAS image set using an image of a posy ring
Searching the PAS image set with a new image (Image: AntiquAI project/VGG; constituent images: Portable Antiquities Scheme under CC-BY 4.0 licence).

So far, WISE’s understanding of natural language terms is largely generic, trained on non-specialist image-word pairings gleaned from the web. As a result, it cannot necessarily ‘speak archaeology’. It understands what a gold ring is, for example, but struggles with an anthropomorphic medieval strap end. The VGG is now working on training WISE to understand specialist archaeological terms. If successful, we shall be a significant step closer to achieving AntiquAI’s ultimate goal: the AI-assisted classification of metal-detector finds.

With thanks to the Finds Liaison Officers, volunteers and all staff and contributors to the PAS, without whom the AntiquAI project would be impossible.

Diaries from the Digital Humanities & Artificial Intelligence Conference

As we continue with our DH & AI guest blog series, we cast our minds back to the DH & AI conference in June.

With three strands – cultural heritage, ethics and synthetic media – the conference explored the rich intersections between Digital Humanities and Artificial Intelligence.

Here’s what our organising committee and delegates had to say about the day.

 

Mara Oliva

Read Mara’s biography on page 40 of our conference programme
Read Mara’s abstract on page 27 of our conference programme
Read Mara’s reflections on the conference on our Connecting Research blog

 

Dominic Lees

Read Dominic’s biography on page 5 of our conference programme

 

Dawn Kanter

Read Dawn’s biography on page 5 of our conference programme

 

José Pedro Sousa

Read José Pedro’s biography on page 33 of our conference programme
Read José Pedro’s abstract on page 17 of our conference programme

 

Drew Thomas

Read Drew’s biography on page 42 of our conference programme
Read Drew’s abstract on page 31 of our conference programme
Read Drew’s post in our DH & AI blog series

 

Jon Weinbren

Read Jon’s biography on page 36 of our conference programme
Read Jon’s abstract on page 21 of our conference programme

 

Abdulrahman

Read Abdulrahman’s biography on page 42 of our conference programme
Read Abdulrahman’s abstract on page 31 of our conference programme

 

Giles Bergel

Read Giles’s biography on page 39 of our conference programme
Read Giles’s abstract on page 26 of our conference programme
Read Giles’s post in our DH & AI blog series

 

Sophie Whittle

Read Sophie’s biography on page 41 of our conference programme
Read Sophie’s abstract on page 29 of our conference programme

Harnessing AI to Classify Early Modern Bible Illustrations

by Drew Thomas, University College Dublin

Biblia: das ist die gantze heilige schrifft Deudsch (Wittenberg: Hans Lufft, 1541), f. CCCI. Berlin State Library, Biblia sacra fol. 33b.

The Protestant Reformation was Europe’s first mass media event, marked by the rapid spread of printed materials. Protestants and Catholics used images to convey powerful messages, often sparking controversy. For instance, Martin Luther’s famous 1522 translation of the New Testament into German included woodcut illustrations of the Apocalypse featuring the papal tiara, which enraged Catholics. In response, Catholics placed the papal tiara on the head of God the Father in their depictions. This clash underscores the significance of images in religious and polemical discourse during the Reformation.

Fast forward to the digital age, where historians are now exploring ways to study these types of images at scale. The “Visualizing Faith: Print, Piety and Propaganda” project at University College Dublin uses AI to investigate how visual media was used in the Reformation.[1] This blog post looks into using large language models (LLMs) to generate descriptions of early modern Bible illustrations. We will examine a case study involving Luther’s first German translation of the complete Bible, which contains 117 woodcut illustrations, and explore how AI can help classify these images efficiently and accurately.

 

The Foundation: Ornamento Project

The foundation of this research lies in the Ornamento project, a collaboration between Professor Sandy Wilkinson and myself at University College Dublin.[2] In this project, we used a convolutional neural network (CNN) to identify pages of early modern books with illustrations or ornamentation. We then employed VISE, the image-matching software from the Oxford Visual Geometry Group, to group repeat instances.[3] This effort resulted in a corpus of 5.7 million items, providing a substantial dataset for further analysis.
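For readers curious about the first step, illustration-page detection is a standard transfer-learning task: take a pretrained CNN and replace its final layer with a two-class head (plain page vs page with illustration or ornament). The sketch below is a generic illustration of that approach, not the Ornamento project’s actual code:

# Generic sketch of CNN-based illustration detection: a pretrained
# ResNet with its final layer swapped for a binary head that flags
# pages carrying illustrations or ornaments.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # classes: [plain, illustrated]
# ... fine-tune on labelled page images, then run inference:
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

page = preprocess(Image.open("page_scan.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(page), dim=1)[0]
print(f"P(illustrated) = {float(probs[1]):.2f}")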

 

The Challenge: Generating Descriptions

As a preliminary case study, I focused on the 1534 Luther Bible, which features 117 unique woodcut illustrations. A digital copy of this book is available from the Berlin State Library.[4] To assign classifications to these illustrations, I needed accurate descriptions. I gathered ground truth data from Carl C. Christensen’s study, “Luther and the Woodcuts to the 1534 Bible,” published in the Lutheran Quarterly.[5] This study provided detailed captions of the illustrations, serving as a benchmark for evaluating the AI-generated descriptions.

 

Leveraging GPT-4o

Using the OpenAI API, I employed GPT-4o to generate 1-2 sentence descriptions of the illustrations. I conducted this process twice: once with just the illustrations and once with the full pages from the Bible. The results were telling. When using the full pages, the descriptions were significantly more accurate compared to using just the cropped illustrations.
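For readers who want to reproduce this kind of experiment, a call of roughly this shape is involved: the page image is base64-encoded and sent alongside a short instruction. The prompt wording and file name here are my own illustrative choices, not the project’s exact ones:

# Sketch of generating an illustration description with GPT-4o via
# the OpenAI API. Prompt text and file name are illustrative only.
import base64
from openai import OpenAI

client = OpenAI()

with open("luther_1534_f035.jpg", "rb") as f:  # full page, not cropped cut
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe the woodcut illustration on this Bible page "
                     "in 1-2 sentences, using the printed text for context."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)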

Here is an example output for the illustration depicting the story of Jonah: “The woodcut illustration depicts the biblical scene of Jonah being cast into the sea before being swallowed by the great fish, as described in the Book of Jonah. The image shows Jonah being thrown overboard from a ship amidst a stormy sea, with the large fish waiting to swallow him below.”

Biblia das ist die gantze heilige schrifft Deudsch (Wittenberg: Hans Lufft, 1534), f. XXXV. Berlin State Library, 4″ Bu 9402.

The LLM’s ability to read the old German page headings, chapter numbers, and some surrounding text — all in 16th-century gothic type — allowed it to make more informed decisions. For example, when generating descriptions from cropped images alone, 50 out of 117 descriptions were incorrect. In contrast, when using the full pages, only 4 descriptions were inaccurate. This stark difference highlights the importance of context in accurately interpreting and describing historical illustrations.

 

Overcoming Challenges

While the use of GPT-4o proved beneficial, several challenges remain. Multi-illustration pages pose a problem when using the full pages as inputs. Separate descriptions must be generated for each image. Printers also reused some illustrations to represent different scenes. In these cases, the context provided by the surrounding text was crucial for accurate identification. Additionally, running titles across the headers of two pages sometimes led to confusion, as the LLM might not accurately identify the book. Lastly, it was common at this time for illustrations to depict more than one scene, with one in the foreground and others in the background, complicating the classification process.

 

Looking Forward: Studying Images at Scale

This project demonstrates the potential of AI to study historical images at scale, offering a promising avenue for historians. By leveraging advanced language models and contextual information, we can enhance our understanding of early modern illustrations and their roles in shaping historical narratives.

The implications of this research extend beyond the Luther Bible. It opens up possibilities for analysing vast collections of historical images, identifying patterns, and uncovering new insights. As we continue to refine these methods, we move closer to a future where AI can assist historians in tackling large-scale projects that were previously unimaginable.

 

Biography

Drew Thomas is a Science Foundation Ireland – Irish Research Council Pathway Fellow in the School of History at University College Dublin. Specialising in early modern material culture, he investigates the history of communication during the Protestant Reformation. Currently, he leads the digital humanities project ‘Visualizing Faith: Print, Piety and Propaganda’, using artificial intelligence to investigate the use of visual media by Catholics and Protestants during the Reformation. He received his Bachelor’s degree from Saint Louis University (2010), Master’s degree from Harvard University (2012) and a PhD in History from the University of St Andrews (2018). He is the author of The Industry of Evangelism: Printing for the Reformation in Martin Luther’s Wittenberg (Brill, 2022).

 

[1] Funded by the Science Foundation Ireland – Irish Research Council Pathway Programme

[2] Alexander S. Wilkinson, ‘Ornamento Europe: Towards an Atlas of the Visual Geography of the Renaissance Book’, in Arthur der Weduwen and Malcolm Walsby (eds.) The Book World of Early Modern Europe (Leiden: Brill, 2022), pp. 547-562.

[3] Abhishek Dutta, Relja Arandjelović, and Andrew Zisserman. 2021. VGG Image Search Engine. Retrieved 15 July 2024 from https://www.robots.ox.ac.uk/~vgg/software/vise/

[4] Biblia das ist die gantze Heilige Schrifft Deudsch (Wittenberg: Hans Lufft, 1534). Staatsbibliothek zu Berlin, 4″ Bu 9402.

[5] Carl C. Christensen. “Luther and the Woodcuts to the 1534 Bible.” Lutheran Quarterly 19, no. 4 (Winter 2005): 392–413.

AI and Visual Page Design: a Study of Dante’s Commedia in Print

by Professor Guyda Armstrong, Dr Giles Bergel and Dr Rebecca Bowen

‘Envisioning Dante, c. 1472-c. 1630: Seeing and Reading the Early Printed Page’ (ENVDANTE) offers the first in-depth study of the material features of the early printed page for almost the entire corpus of prints (1472-1629) of Dante’s ‘Comedy’, using cutting-edge machine learning computational technologies and image-matching in addition to book-historical, literary and art-historical approaches. A collaboration between the universities of Manchester and Oxford, it brings together the collections of early printed books at the John Rylands Library in Manchester; the Visual Geometry Group, a leading visual AI research group at Oxford; and scholars in Italian studies, book history and art history in both institutions. It commenced in 2023 by digitising the Dante collection at the Rylands, by far the most comprehensive in the UK and scarcely rivalled elsewhere, including almost all extant editions from the beginning of printing in Europe. The editions range from sumptuous folios to plain-text editions for the mainstream educated readership. Many of the early copies are hand-decorated; some are annotated with corrections, comments or marks of ownership and of use.

While the body of scholarship on Dante’s text is immense, as is, to a lesser extent, that on the illustrations, scant attention has been paid to the printed editions as an assembly of discrete elements on the page – what we would now call ‘graphic design’. Whereas the text of the poem is consistent and uniform, printers adopted a variety of approaches to its rendition. Sometimes these were conscious choices, involving the commissioning of new artwork or commentary, or formal innovations such as the addition of indexes or title-pages. At other times, printers simply inherited stylistic conventions or physical materials such as woodblocks, or else made their best guess as to readers’ expectations and what the market would bear. A panoply of compromises, expediencies and hacks forms a vital part of all presswork, in which the text may be a given, but its setting out on the page is far from fixed.

This project aims to disentangle the conscious choices, inherited materials and forms of labour that together make up the book as printed and decorated by hand. It employs several forms of computer vision, all profoundly influenced by general developments within machine learning and related technologies over the past few decades, including several that are, revealingly, not generally considered to be ‘AI’. As important as the algorithms are manual annotation and the use of library metadata, which rest on decades of curatorial work, just as contemporary machine-learning models rely on vast pools of manual labour for training and evaluation. The project strives for transparency, and will release images, metadata, annotations and downstream trained models as part of a set of outputs that will additionally include journal articles, a digital exhibition and the books themselves, digitised and enriched with the project’s own findings.

The project began with a pilot exercise in 2018 that aimed to test the potential of generic segmentation models based on deep learning for page layout analysis. These experiments continued in earnest in 2023, following the award of funding and the arrival of two postdoctoral research assistants, Rebecca Bowen and Gloria Moorman, who have been working with Giles Bergel and colleagues at VGG (prominently David Pinto, Prasanna Sridhar, Guanqi Zhang and Andrew Zisserman) to create training data and to validate the results of machine segmentation. Results so far are encouraging: the model (Mask-RCNN pretrained on COCO) has been augmented to detect such prominent (and relatively consistent) features as initial capitals, illustrations, running titles and the text of the poem itself with a good degree of accuracy. Where the model struggles (such as with irregular or small features), this may draw our attention to those page-elements that are historically unstable, or to design improvisations that are more occasional than typical. As digital humanists, we view such findings as equal in value to high scores on standard precision and recall evaluations.
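In torchvision terms, the setup looks roughly like the sketch below: load Mask-RCNN with COCO weights, then swap its box and mask heads for the project’s page-element classes before fine-tuning. The class list is inferred from this post; the actual ENVDANTE configuration may differ.

# Rough sketch of adapting a COCO-pretrained Mask R-CNN to page-layout
# classes, following the standard torchvision fine-tuning recipe.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

CLASSES = ["background", "initial_capital", "illustration",
           "running_title", "poem_text"]  # inferred, not the project's list

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-prediction head for our page-element classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, len(CLASSES))

# Replace the mask-prediction head likewise.
in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, len(CLASSES))

# The model is now ready to be fine-tuned on annotated page images.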

A page of an early printed edition of Dante annotated with boxes labelled with the parts of the page to which they belong, such as ‘Illustration’, ‘Running Title’ and ‘Text’.

Alongside page layout analysis, the project is also performing image-matching on the books’ visual elements – their illustrations, ornaments or large initial capitals. This work package utilises VGG’s VISE software, used by several other book-historical projects, to match impressions derived from individual printing surfaces – woodblocks or metal plates. By exposing the industrial logic behind book production, this matching can reveal hidden relationships between printers renting their presses or sharing materials. We also plan to ‘collate’ copies of the same edition to uncover stop-press corrections and other copy-specific features, and have already profitably experimented with VGG’s Image Compare tool.

While the digital analysis is the project’s focus, using both so-called ‘classical’ computer vision and more modern methods employing deep learning, the research directs our attention back to the books themselves, in search of the human agencies behind their production, ownership and use. By looking at books with machine vision, we hope not to replace but to enhance the human eye – AI as augmented as much as artificial intelligence.

Envisioning Dante is funded until 2025 by the Arts and Humanities Research Council (AH/W005220/1)

Reflections on the Digital Humanities Community of Practice’s Inaugural Conference

Our DH Academic Champion and Community of Practice lead, Dr Mara Oliva, shares her reflections on the CoP’s inaugural conference on our Connecting Research blog. The conference, on the theme of Digital Humanities and Artificial Intelligence, took place last month.

Dr Mara Oliva, Associate Professor in US History and Digital Humanities Champion, welcoming delegates. Photo credit: Professor James Ferryman.

Using AI for the Documentation of Intangible Cultural Heritage

by Gabriella Giannachi, Steve Benford and Lydia Farina

A hand holding a phone emerges from colourful fossils
Luke Conroy and Anne Fehres & AI4Media / Better Images of AI / Models Built From Fossils / CC-BY 4.0

Archives have changed throughout history, often in response to the introduction of novel technologies. At the heart of the BRAID/AHRC-funded project ‘Creating a Dynamic Archive of Responsible Ecosystems in the Context of Creative AI’ (2023-4) has been the question of what archives might become in a future where AI will most probably form part of many artworks archived by different kinds of cultural organisations – organisations that will most likely also use AI for the documentation and conservation of such artworks. The project, led by the University of Nottingham in collaboration with the University of Exeter, the National Archives, and Blueskeye AI, investigated how archives could become dynamic enough to accommodate these kinds of artworks and the related documentation and archiving practices, looking also into what ecosystems are necessary for the production and conservation of this kind of archive.

Building on the curator and new media researcher Annet Dekker’s findings about how best to conserve artworks by using a ‘network of care’ (2018) and Dekker and Giannachi’s notion of ‘generative preservation’ as a way to enable further iterations of existing artworks as part of the conservation process (2023), we started by looking into who might be the possible stakeholders of such archives, mindful of the fact that the ecosystem of an artwork and the ecosystem of its documentation and archive may differ depending on whether the artwork is preserved by the artists, a library, a museum, or all of the above. This suggests that the key stakeholders of a work will be its creators, performers, funders, copyright-holders, and end-users, as well as the stakeholders from the archives and collections in which the works may reside.

Conscious of AI’s potential to hallucinate, especially when faced with incomplete and recent data, we arrived at the conclusion that dynamic archives should be editable and facilitate annotation; continue to grow over time; reconstitute themselves in accordance with keywords set by whoever may wish to consult them; make visible different time-based versions of themselves; be used curatorially and creatively as a live (generative) archive; and provide the context and live, iterative documentation of a work, including artworks generated by AI. Interestingly, when we asked ChatGPT what it thought a dynamic archive was, it described it as an evolving and interactive repository of different types of resources facilitating personalisation (or customisation), agility, contextualisation, collaboration and feedback – a definition not too far from the one we had arrived at. Critically, these archives’ ecosystems should make room for unpredictable factors that might emerge over time. In this sense, ecosystems must remain open and be capable of anticipating change.

We found that the potential of AI in documentation depends on what has already been documented and made available to it. Provided the AI has access to these documents, it could be built upon for contextualisation, audience documentation, the documentation of large quantities of data pertaining to different iterations of a work, and the documentation of change – key parameters that current documentation practices struggle with – as well as for the generation of ‘live’ dynamic archives. Moreover, AI could process documentation specifically in relation to different stakeholders, paying attention to the individual actors and groups interacting with each other and with the AI, both at a specific moment in time and iteratively. This means AI can be used to produce generative archives situated within both the field of conservation and the creative industries.

In conclusion, we found that AI can generate documents, artworks, further iterations of those works, and their archives, and keep them connected to one another, operating as a time machine. This shows that AI could be extremely useful for the documentation, generation, and preservation of intangible cultural heritage, but only insofar as it is cared for by healthy and deep ecosystems. We must therefore ensure that AI as a creator, producer, generator and documenter of a work showcases the priorities of responsible innovation (Jirotka et al. 2017). Hence, the design process of a dynamic archive should be inclusive of different stakeholders and should anticipate the risks and challenges associated with using huge volumes of often incomplete data. A healthy and deep AI ecosystem should be invested not only in a trustworthy but also in an ethical, accessible, sustainable, and EDI-oriented AI, so that it could itself, as a human-AI entanglement, promote EDI while remaining mindful of ethics, accessibility, and sustainability.

 

Acknowledgments

We gratefully acknowledge BRAID/AHRC for funding ‘Creating a Dynamic Archive of Responsible Ecosystems in the Context of Creative AI’.

 

Works cited

Dekker, A. (2018) Collecting and Conserving Net Art. Moving Beyond Conventional Methods. London: Routledge.

Dekker, A., and Giannachi, G. (eds) (2023) Documentation as Art, London and New York: Routledge.

Jirotka, M., Grimpe, B., Stahl, B., Eden, G. and Hartswood, M. (2017) ‘Responsible research and innovation in the digital age’. Communications of the ACM 60, 5 (May 2017), 62–68. https://doi.org/10.1145/3064940

Generative Text for/by/with Digital Humanists

by J.S. Love, TU Delft

A robot-like character (CuratorBot mascot) with books and a speech bubble
CuratorBot Mascot (generated with Midjourney)

I recently had the good fortune to join the vibrant Digital Humanities (DH) community at the University of Reading and share some recent work my colleagues, Heng Gu and Jeroen Vandommele, and I have been doing with chatbots in cultural heritage venues. Our conversational agent, dubbed CuratorBot, is one manifestation of the possibilities afforded by new large language models (LLMs). It inhabits a space alongside experiments with chatbots in other domains and shares their capacities for good, ill and shades in-between. Since the release of ChatGPT I have regularly seen conversational agents of various kinds in the news. Our colleagues at MIT present members of the public with avatars of their future selves, contemplative mirrors which encourage them to reflect on personal choices.[1] We also see more ambiguous but very human attempts to revive deceased friends and family.[2] It’s easy to get lost and draw quick conclusions as these experiments flare up and die out, so we’ve been trying to use our prototype as a dialogue piece, an anchor for envisioning how – or whether – this technology can be productively used in our more familiar world of libraries and museums. We’re still learning, but a picture is slowly emerging of what constitutes good use of this technology when it encounters our cultural heritage.

At present many of us harbour a wish that chatbot interfaces will help resolve information retrieval and let us more naturally parse the digital world (or perhaps explode it?[3]). There is some collective, unspoken promise that LLMs will eventually have the capacity to infer intention from questions we ask of them and then convey that intent on our behalf to negotiate the vast world of…something: contemporary news, collections of books, the internet. This doesn’t work yet, and it’s not entirely certain it ever will, at least not with the LLM-based approach that we currently use.[4] They are built to mimic human speech but have no intentionality behind them toward truth or otherwise.[5] The future fictions of chat-based interfaces in my own head – the Star Trek computer or Douglas Adams’ ‘Deep Thought’ – remain in the realm of the Unreal. But lest we go too far down a technological or philosophical rabbit hole: generative text tools built atop LLMs can produce pretty convincing imitations of what people have said or might say. Surely there are uses for this already, ones which are simultaneously beneficial and not harmful?[6]

For me some of the most promising uses of LLMs right now involve some element of creativity. I work in a design faculty, and my students tinker with prompts to make strange new connections, brainstorm, ideate.[7] Barry Enderwick was doing the same thing in the Friday ChatGPT sessions of his ‘Sandwiches of History’ series on YouTube.[8] Here chatting with a machine is a means, not an end: stimuli more than answers. My colleagues who run the Connected Creativity Design Lab[9] are in much better stead to evaluate the creative process behind this (or refute me outright), but even with my imperfect understanding, generative text systems look pretty feasible for activities where dialogue and sharing are important, where idea quantity (a particular merit of computed automation) is beneficial and the fictional or unreal are welcome.

A wooden spade
Wooden Spade (1686) (Rijksmuseum, CC0)

We DH practitioners – (art) historians, philologists, archaeologists and many others – are in the business of constantly changing perspective. We investigate old things in new ways. Generative text tools can open up ways of exploring or even draw out new ways of seeing. Some people are ahead of the curve in utilizing generative text for this type of use, as I discovered when hearing Kieran O’Halloran’s presentation on pedagogical uses for ChatGPT in short story interpretation.[10] A part of our scholarly process – and I’m not sure how closely this is related to creativity – is exploration or discovery,[11] getting to know a new area or line of thought. When we branch out into an unknown area, we expect to get lost and spend a fair bit of time confirming, denying and otherwise charting new information. Generative LLM tools tentatively appear rather useful for superficial exploration, where we already force ourselves to validate items we flag as possibly important against more reliable sources. Maybe we should treat ChatGPT as Wikipedia’s gabby younger sibling, one which still needs to learn how to curate and represent sources for its claims.

For now we can keep generative text as one arrow in our search quiver. It’s a sharp tool we can teach others to use with care for specific purposes. And if all goes well, we will have the choice to cast it away for a more suitable replacement when it comes.

 

[1] https://www.theguardian.com/technology/article/2024/jun/05/ai-researchers-build-future-self-chatbot-to-inspire-wise-life-choices

[2] Cf. https://www.theguardian.com/technology/2023/jul/18/ai-chatbots-grief-chatgpt and  https://www.theguardian.com/technology/2024/apr/04/chinese-mourners-turn-to-ai-to-remember-and-revive-loved-ones

[3] Cf. Matthew Kirschenbaum’s concern about LLM’s capacity to hinder through the creation of junk: https://www.theatlantic.com/technology/archive/2023/03/ai-chatgpt-writing-language-models/673318/

[4] Hints of LLM limitations and risks were already flagged up by the now famous ‘Stochastic Parrots’ paper (Bender, Emily M., et al. “On the dangers of stochastic parrots: Can language models be too big?🦜.” Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 2021.)

[5] Cf. Hicks, Michael Townsen, Humphries, James and Slater, Joe. “ChatGPT is bullshit.” Ethics and Information Technology 26.2 (2024): 38.

[6] Here I refer to two of the five ‘principles for AI & Society’ – beneficence and non-malificence – expressed by Thomas Padilla following Floridi & Cowls. Padilla, Thomas. ‘Responsible Operations: Data Science, Machine Learning, and AI in Libraries’. OCLC Research Position Paper, 26 August 2020.

[7] On creativity and ideation cf. Runco, Mark A.  2010. “Divergent thinking, creativity, and ideation.” In The Cambridge Handbook of Creativity, 413-446.

[8] https://www.youtube.com/@SandwichesofHistory

[9] https://delftdesignlabs.org/connected-creativity-lab/

[10] O’Halloran, Kieran. “Digital assemblages with AI for creative interpretation of short stories.” Digital Scholarship in the Humanities (2024)

[11] Cf. John Unsworth’s notion of ‘Scholarly Primitives’ https://people.brandeis.edu/~unsworth/Kings.5-00/primitives.html and his  update to these in 2020: https://www.youtube.com/watch?v=XyruWlLDvlc&pp=ygUddW5zd29ydGggc2Nob2xhcmx5IHByaW1pdGl2ZXM%3D.

Breaking Down Generative Artificial Intelligence for Ancient World Studies

by Edward A. S. Ross and Jackie Baines

As generative artificial intelligence (AI) becomes an integral part of society, it is crucial that we keep abreast of these developments and their implications. When it comes to ancient world studies, many people avoid engaging with these tools because they are far outside the usual remit of traditional research, teaching, and learning. In order to bridge this gap, our project, iGAIAS: Investigating Generative Artificial Intelligence in Ancient World Studies, breaks down the ethical considerations and current developments of generative AI for humanities audiences (Figure 1).

Graphics and QR code for the iGAIAS website
Figure 1: iGAIAS: Investigating Generative Artificial Intelligence in Ancient World Studies.

This project started as a short conversation in the hallway. In January 2023, Jackie and I were discussing the booming popularity of ChatGPT and wondered if it was something that could handle ancient languages. After a couple of tests, now published in the Journal of Classics Teaching, we found that conversational AI tools had some capability but still had a long way to go before they could be effective ancient language study tools. However, since generative AI was developing so rapidly, we aimed to follow its development and create accessible tutorials for our students. We received a Teaching and Learning Enhancement Project (TLEP) grant for the 2023-2024 academic year to support this research.

We started by engaging in productive conversation with teaching staff in the Department of Classics at the University of Reading, to develop an initial generative AI policy and discern how these tools could support teaching and learning in the department. Using this data, we led generative AI ethics tutorials in our ancient language classes (Ancient Greek, Latin, and Egyptian Hieroglyphics) at all levels over the Autumn 2023 term. In each session, we gathered survey data from our students on their opinions and usage of generative AI prior to their language studies in the 2023-2024 academic year. We also had the opportunity to present these themes to modern languages students and gathered comparative survey data. The results of these initial survey sessions have now been published in the Journal of Classics Teaching. The ethics portion of these presentations has been recorded as a video tutorial, available in a playlist on the Department of Classics’ YouTube channel.

With our undergraduate research assistants Jacinta Hunter, Fleur McRitchie Pratt, and Nisha Patel, we created a booklet outlining a wide variety of digital learning tools for ancient languages that included both traditional tools and generative AI (Figure 2). This included tested guiding phrases, prompts which can be copy-pasted as the first message in a generative AI chat box to guide a user’s experience, and some prompt engineering tips and tricks.
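In practice, a guiding phrase is simply the opening message of a chat session, whether pasted into a chat box or sent through an API. The example below is a hypothetical illustration of the pattern, invented for this post rather than taken from the booklet’s tested phrases:

# Hypothetical illustration of a "guiding phrase" as the first message
# in a chat session. The phrase is invented for illustration; the
# booklet's tested phrases are in the figshare publication cited above.
from openai import OpenAI

client = OpenAI()

GUIDING_PHRASE = (
    "Act as a patient Ancient Greek tutor. When I paste a sentence, "
    "parse each word (case, number, tense, mood) before translating, "
    "and flag any form you are unsure about instead of guessing."
)

chat = [
    {"role": "user", "content": GUIDING_PHRASE},
    {"role": "user", "content": "μῆνιν ἄειδε θεὰ"},
]

reply = client.chat.completions.create(model="gpt-4o", messages=chat)
print(reply.choices[0].message.content)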

Title page of the booklet 'Digital Tools for Learning Ancient Greek and Latin and Guiding Phrases for Using Generative AI in Ancient Language Study' with authors' names 'Jackie Baines, Edward A. S. Ross, Jacinta Hunter, Fleur McRitchie Pratt, Nisha Patel' and text 'ChatGPT: A Conversational Language Study Tool'.
Figure 2: Jackie Baines, Edward A. S. Ross, Jacinta Hunter, Fleur McRitchie Pratt, and Nisha Patel. Digital Tools for Learning Ancient Greek and Latin and Guiding Phrases for Using Generative AI in Ancient Language Study. V3. May 10, 2024. Archived by figshare. https://doi.org/10.6084/m9.figshare.25391782.v3.

At the end of the 2023-2024 academic year, we surveyed our same ancient language students again to see how they felt generative AI had impacted their studies. In addition, we queried if and how often they used generative AI in the past year. The initial analysis of the spring data was first presented at the Digital Humanities and Artificial Intelligence Conference at the University of Reading on June 17, 2024. We intend to publish a complete analysis in Digital Humanities Quarterly shortly.

While keeping track of the advancements of generative AI, we found the developments in generative image AI to be particularly problematic for ancient world studies. This was primarily because many of the figures and objects presented in these outputs appeared to be biased by modern media. We received Undergraduate Research Opportunities Programme (UROP) funding and a Research Collaboration and Impact Fund (RCIF) grant to investigate this further. With our undergraduate research assistants Shona Carter-Griffiths, Hannah Gage, and Jacinta Hunter, we are developing a temporary exhibition at the Ure Museum of Greek Archaeology on identifying biases about the ancient world in generative image AI. This exhibit, opening September 23, 2024, will present image comparisons between generative AI outputs, modern media representations, and ancient art.

As new developments in generative AI continue to arise, we aim to keep up with these tools and make them accessible for humanities scholars. By increasing generative AI literacy across ancient world studies, these tools will no longer be an unrecognizable monster but rather a surmountable challenge.

New Blog Series! Digital Humanities and Artificial Intelligence!

On Monday 17 June, we hosted our inaugural conference which focused on the theme Digital Humanities and Artificial Intelligence.

Dr Mara Oliva, Digital Humanities Academic Champion, welcoming our guests. Photo credit: Prof James Ferryman

The conference was sold out, the sun was shining and we enjoyed a wonderful day of learning and networking!

 

Dr Barbara McGillivray taking questions from an inspired audience. Photo credit: Prof James Ferryman

Dr Barbara McGillivray, King’s College London and Turing Institute Fellow, delivered the keynote address: “Coding Culture: A Journey at the Nexus of Artificial Intelligence and Digital Humanities”.

We then had a range of thought-provoking parallel sessions on synthetic media, cultural heritage, ethics and teaching.

The Synthetic Media Panel chaired by conference co-organiser Dr Dominic Lees. Photo credit: James Ferryman.

 

Dr Mara Oliva presenting her work on a VR reconstruction of the New York World’s Fair of 1939. Photo credit: Prof James Ferryman


Ethics Panel chaired by conference co-organiser, Dr Jumbly Grindrod. Photo credit: Prof James Ferryman

The conference proceedings will be published in a special edition of Digital Humanities Quarterly. In the meantime, you can enjoy a few snapshots from our conference by following our new blog series, starting on Friday, 28 June. First up: Jackie Baines and Edward Ross from the Department of Classics at the University of Reading with a blog on “Breaking Down Generative Artificial Intelligence for Ancient World Studies”.

Many thanks to the conference organising committee: Professor Roberta Gilchrist (Research Dean for Heritage and Creativity), Dr Mara Oliva (Associate Professor in History and Digital Humanities Champion), Dr Dawn Kanter (DH Officer), Dr Dominic Lees (Associate Professor of Filmmaking), Professor James Ferryman (Professor of Computational Vision), Dr Jumbly Grindrod (Lecturer in Philosophy), Dr Rachel Lewis (Research Development Manager) and Dr Bonhi Bhattacharya (Senior Research Development Manager).