Handwritten Text Recognition of Medieval Documents
3 Zoom online sessions | NOV 8 and 22, DEC 6
3-day workshop in Vienna | DEC 18–20
We are witnessing a groundbreaking transformation in the study of manuscripts, driven by machine learning tools developed for the automatic transcription of documents. These tools are becoming a significant part of editorial projects large and small, and libraries are using them to make their collections more accessible and searchable.
For the third year in a row, the Institute for Medieval Research of the Austrian Academy of Sciences, in collaboration with MARBAS, Princeton University and the Faculty of Arts, Comenius University in Bratislava, will host the HTR Winter School, focusing on the automatic recognition of handwritten texts in medieval documents. The course consists of two main parts: three online sessions (Zoom, Friday afternoons) and a three-day in-person workshop in Vienna. Between meetings, we will use two online platforms, Notion.so and Discord.com, to coordinate our work. During the first phase, participants will receive a brief introduction to the theory of handwritten text recognition and thorough instruction in its practical application using Transkribus (transkribus.eu).
We will work in six groups, focusing on various languages and scripts:
Carolingian Latin | Late Medieval Latin | Byzantine Greek | Medieval Czech | Medieval German | Syriac
Each group will have its own supervisors and will transcribe passages from selected manuscripts. We will then use these passages for training new HTR models and improving existing models, while learning about various intricacies and challenges. Participants are also welcome to consult their own material in addition to the team work.
During the in-person workshop in Vienna, we will finalise the projects and visit libraries in Vienna to view the selected manuscripts. At the end of the course, you will receive a certificate, and the results will be published on Zenodo and HTR-United with the names of the contributors.
The course is primarily designed for Master’s and PhD students; however, we will consider other applicants as well.
You are expected to be familiar with the language of the group you want to join: Carolingian Latin, Late Medieval Latin, Byzantine Greek, Medieval Czech, Medieval German, or Syriac.
We expect you to have knowledge of medieval palaeography and manuscript studies. If in doubt, don’t hesitate to ask.
The course will be taught in English.
There is no participation fee, but you are expected to cover the cost of your travel to Vienna, including accommodation.
SCHEDULE
Session 1 | Friday NOV 8
- What is HTR? A general introduction
- Transkribus 1 (uploading documents, layout recognition, simple transcription)
- Introduction to manuscripts and working in groups
Session 2 | Friday NOV 22
- Transkribus 2 (introduction to models, CER, learning curve)
- Working in groups (first transcriptions)
Session 3 | Friday DEC 6
- Exporting documents / Sharing is caring: How to share and publish your data 1 (introduction)
- Tagging structures and text
- Working in groups (training of models)
Wednesday DEC 18
- Finishing work on the transcription and models
- Sharing is caring: How to share and publish your data 2
- Working in groups
Thursday DEC 19
- Alternatives to Transkribus
- What can you do with automatic transcriptions?
- Library visit - See the manuscripts
- Working in groups
Friday DEC 20
- Time to finish the work
- Presentation of results by working groups