This task area is part of the long-term project activities bundled at the ACDH-CH in the project cluster Dictionary of Historical Bavarian Dialects in Austria and South Tyrol (WBÖ). The task area “WBÖ – Corpus & Infrastructure” focuses on data generation (especially image and full-text digitization of historical analogue data), data preparation (primarily data enrichment, annotation and linking) and the open-access publication of all data in the Lexical Information System Austria (LIÖ).

The core element is the WBÖ-Document Database. It is based on the paper slips of the WBÖ main catalogue, compiled from the 1910s onwards. The majority of this collection (from the letter D section onwards) was first recorded digitally in TUSTEP in the 1990s and early 2000s and converted to XML/TEI format from 2016 onwards. The data primarily contains information on the lemma, meaning and phonetics, as well as the origin and collector of each item. Since 2016, the database (from the letter D section onwards) has been accessible via LIÖ and is thus available for research beyond lexicographical work. The data is being progressively processed and enriched with metadata. Georeferencing allows the data to be displayed on maps via the mapping tool in LIÖ.

As part of the retro-digitization of the already printed volumes (see task area “WBÖ – Lexicography”), the resulting XML database will in future be linked to the WBÖ-Document Database.

Since 2017, the entire WBÖ main catalogue has been digitized in image format. The scans are saved in the paper slip catalogue and linked to the corresponding entries in the WBÖ-Document Database. In the coming years, AI-supported digitization of the letter sections A to C will be implemented, so that the complete paper slip catalogue will be available in the document database. In addition, the task area is dedicated to the enrichment and integration of new databases that have been created as part of other WBÖ activities (e.g. the corpus "Austrian Dialect Recordings in the 20th Century") or are still being developed (e.g. The ABC(s) of dialects).

The ongoing optimization of the data and its curation for the internal repository ARCHE (A Resource Centre for the HumanitiEs) is supported by the research unit DH Research and Infrastructure (DHRI).

 

The core team of the task area includes the following members of the research unit Linguistics: Angela Bergermayer, Hans Christian Breuer, Marlene Haslinger-Fenzl, Markus Kunzmann (Co-PI), Susanne Schmalwieser, Sonja Schwaiger and Philipp Stöckle (Co-PI). The task area works closely with the research unit DHRI, in particular with the text technology experts Daniel Schopper (technical supervisor of the sub-project) and Omar Siam.