Predicting the Past | Contextualising, restoring, and attributing ancient texts

Ancient History relies on disciplines such as Epigraphy, the study of inscribed texts known as “inscriptions”, for evidence of the thought, language, society and history of past civilizations. However, over the centuries many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty. We present Ithaca, the first Deep Neural Network for the textual restoration and the geographical and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian's workflow: its architecture focuses on collaboration, decision support, and interpretability.

Example of restored inscription — Restoration of a damaged inscription, recording a decree from 485/4 BCE concerning the Acropolis of Athens (IG I³ 4B, CC BY-SA 3.0, WikiMedia).

While Ithaca alone achieves 62% accuracy when restoring damaged texts, historians who use Ithaca improve their own accuracy from 25% to 72%, confirming the impact of this synergistic research aid. Ithaca can attribute inscriptions to their original findspot with 71% accuracy and can date them to within 30 years of their ground-truth date ranges, redating key texts of Classical Athens and contributing to topical debates in Ancient History.

Example results table — Ithaca's experimental results. Evaluated methods for text restoration, geographical (region) and chronological attribution (date) on I.PHI's test set. For CER and Years, lower scores are better (↓).

This work shows how models like Ithaca can unlock the cooperative potential between AI and historians, transforming the way we study and write about one of the most significant periods in human history.

Ithaca was conceived and researched by Yannis Assael*, Thea Sommerschield*, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, and Nando de Freitas. This web experience was developed and built on Google Cloud by Justin Grayston, Benjamin Maynard, and Ricardo Cardenas.


Use Ithaca for your research

Enter your ancient Greek epigraphic text, including spaces, in the box below to restore missing characters, and attribute the inscription to its original place and time of writing.

Use a question mark (?) for each character you want the model to predict. Each query can predict up to 20 question marks (consecutive or not).
Use a single hash (#) to predict text sequences of unknown length:

You can also adjust the maximum expected length of the restoration using the bar sliders.
The total number of characters predicted across both unknown (#) and known (?) length gaps cannot exceed the maximum length you set.
Your text should not have any consecutive unknown length restoration characters (e.g. ##), or an unknown length restoration character adjacent to a fixed length character (e.g. ?#).

Use a dash (-) for any missing sections or characters in your text that do not need restoring. You can also use dashes to pad a short text.
When performing restoration, you can adjust the sampling temperature, which controls how creative or conservative the restoration outputs are. Lower is more like a formulaic funerary text, higher is more like a dedicatory verse inscription.
To restore longer sequences, input inscription images, or inspect more restoration hypotheses, please refer to the Colaboratory notebook.
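The effect of the sampling temperature mentioned above can be illustrated with a minimal sketch. It assumes the usual convention in which the model's scores (logits) are divided by the temperature before a softmax; the page does not spell out how Ithaca applies it, and the function names below are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature.

    Assumption: temperature divides the logits before the softmax,
    which is the common convention; it is not taken from Ithaca's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_with_temperature(logits, temperature, rng=random):
    """Draw one candidate index according to the tempered probabilities."""
    probabilities = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probabilities, k=1)[0]
```

Under this convention, a low temperature concentrates almost all probability on the highest-scoring character (conservative output), while a high temperature spreads probability across more candidates (more varied output).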

θεος τυχα γνεφας αναγυλλα σιβυλλα επερωτοντι τον θεον αι τα δικαια μαστευοντι ταυταν νικην περι θηματιο


To run a restoration you need to add at least one "?" or "#" character for the system to restore
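The input rules listed above can be checked before submitting a query. The following is a minimal sketch, not the page's own validation code: the function name and messages are hypothetical, adjacency of "#" and "?" is treated as invalid in either order (the page gives only "?#" as an example), and the slider-controlled maximum restoration length is not modelled.

```python
MAX_FIXED_GAPS = 20  # the page allows up to 20 "?" characters per query


def validate_query(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the query passes."""
    problems = []
    # At least one character must be marked for restoration.
    if "?" not in text and "#" not in text:
        problems.append("add at least one '?' or '#' for the model to restore")
    # Known-length gaps are capped, whether consecutive or not.
    if text.count("?") > MAX_FIXED_GAPS:
        problems.append(f"more than {MAX_FIXED_GAPS} '?' characters")
    # Unknown-length markers may not be doubled up.
    if "##" in text:
        problems.append("consecutive '#' characters")
    # Unknown-length markers may not touch a known-length marker.
    if "?#" in text or "#?" in text:
        problems.append("'#' adjacent to '?'")
    return problems
```

For example, `validate_query("θεο? τυχα")` returns an empty list, while a text containing neither "?" nor "#" returns the first message only.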

To include as much historical context as possible, we extend the list of retrieved parallels to incorporate the validation and test sets. These sets are not used for training and do not affect the model's predictions.

Attribution outputs

Geographical attribution hypotheses

Bar chart and map distribution for Ithaca's Top-10 geographical attribution hypotheses, ranked by probability among 84 regions of the ancient world. The circle size on the map is directly proportional to the prediction's probability.
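As a rough illustration of how such a ranking can be produced, the sketch below sorts a mapping of regions to probabilities and keeps the highest-scoring entries. The function name is hypothetical and the probability values in the usage note are made up for illustration; they are not Ithaca outputs.

```python
def top_k_regions(probabilities: dict[str, float], k: int = 10) -> list[tuple[str, float]]:
    """Return the k most probable regions, highest probability first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

With the invented scores `{"Attica": 0.5, "Delos": 0.2, "Ionia": 0.3}` and `k=2`, the function keeps Attica and Ionia, in that order.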

Geographical attribution saliency map

Saliency map highlighting in purple shading which unique input text features contributed the most to Ithaca's top geographical attribution hypothesis, computed in the section above.

Chronological attribution hypotheses

Ithaca's chronological attribution hypotheses, visualised in yellow as a categorical distribution over decades between 800 BCE and 800 CE. The average of the distribution is depicted with a green line. Predicting a distribution rather than a single year handles date intervals more effectively and makes the hypotheses easier to interpret.
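The average of a distribution over decades can be computed as a probability-weighted mean of the bin centres. The sketch below assumes 160 ten-year bins covering 800 BCE to 800 CE, with BCE years written as negative numbers and each bin represented by its midpoint; this bin layout and the function names are assumptions for illustration, not details confirmed by the page.

```python
START_YEAR = -800  # 800 BCE, written as a negative year (assumption)
END_YEAR = 800     # 800 CE
BIN_WIDTH = 10     # one bin per decade


def decade_centres():
    """Midpoint year of each decade bin, from -795.0 up to 795.0."""
    return [start + BIN_WIDTH / 2 for start in range(START_YEAR, END_YEAR, BIN_WIDTH)]


def distribution_mean(probabilities):
    """Probability-weighted average year of a distribution over decades."""
    centres = decade_centres()
    if len(probabilities) != len(centres):
        raise ValueError(f"expected {len(centres)} probabilities, got {len(probabilities)}")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Dividing by the total tolerates inputs that are not exactly normalised.
    return sum(p * c for p, c in zip(probabilities, centres)) / total
```

A distribution with all its mass in the first bin yields a mean of -795.0 (795 BCE under the sign convention above), whereas a two-peaked distribution yields a mean between the peaks, which is why the full distribution is more informative than the average alone.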

Chronological attribution saliency map

Saliency map highlighting in purple shading which unique input text features contributed the most to Ithaca's top chronological attribution hypothesis, computed in the section above.

Restoring and attributing ancient texts using deep neural networks