"MIND THE GAP! Advancing Cuneiform Studies through Digital Collaboration"
Talk by G. R. Smidt, E. Lefever, K. De Graef, K. K. T. Chandrasekar, L. Foket (Ghent) | Digital Classicist Seminar
The cuneiform corpus is rich in material and in challenges for digital projects. With more than 500,000 cuneiform texts excavated to date, its sheer size makes it a cornerstone of the surviving written testimony from the ancient world. The corpus abounds in data that benefits not only researchers of cuneiform cultures but also fields such as linguistics, economics, and philosophy. The objects themselves pose interesting challenges as well: they are written in a tradition that can be traced back to the very first texts, and they are mainly impressed into clay as a 3D script. Working with this corpus in the 21st century CE requires approaches that in some cases are still underdeveloped in our field. Open access to the corpus is essential, as the objects have been scattered across collections all over the world, often with little regard for the coherence of text assemblages or their place of origin. Recent advances in machine learning have pushed forward the state of the art in ancient language processing. Automatic sign recognition can speed up the reading of the immense number of tablets and potentially mitigate issues stemming from difficult-to-read signs. Language models will help us grasp ancient languages better: we will be able to quantify observations and contextualise close readings.
When working to develop such solutions, we recognise that cooperation between cuneiformists and digital specialists is paramount. The CUNE-IIIF-ORM project is centred on cooperation. Our goals are to disseminate, expand, and augment the corpus of Old Babylonian (c. 2000-1600 BCE) Akkadian texts. Old Babylonian texts from the Royal Museums of Art and History will be digitised, annotated with metadata and textual data, and linked with relevant texts. We utilise the International Image Interoperability Framework (IIIF) to create manifests that can be exhibited online in formats suited to the user, and that can be freely accessed and redistributed. To increase the number of texts in the corpus of Old Babylonian Akkadian texts, we use high-quality annotations of 2D+ images to build an Optical Character Recognition (OCR) model. The goal is to create a pipeline that will assist in producing digitised text publications. With Natural Language Processing (NLP) we will develop the tools needed to semi-automatically annotate Old Babylonian Akkadian texts and later query them, which can give us a deeper and more nuanced knowledge of Akkadian.
During this talk we will introduce the project’s goals and explain how the three elements (IIIF, OCR and NLP) are intertwined. We will then delve into each of the three elements separately, focusing mainly on NLP, as the textual content is ultimately the message that has been carried across up to 5,000 years.
Time & Place
19.12.2023 | 16:00 c.t.
Berlin-Brandenburgische Akademie der Wissenschaften
Staatsbibliothek Berlin
Unter den Linden 8
10117 Berlin
Room 07W04