Language contact and linguistic areas in the light of digital lexicography

The workshop aims to bring together researchers from the ACDH-CH and other institutions who work on digital dictionaries or other lexical resources and who often face theoretical and technical issues in describing language contact phenomena, borrowings, linguistic variation, and structural similarities and convergences between varieties belonging to different language families (linguistic areas). The long tradition of (monumental) historical lexicography in particular has focused almost exclusively on a specific language or linguistic variety. Issues of language contact, such as borrowings, were treated poorly, or their explanation was restricted to a mere reference to the specialized dictionary of the source language; in many cases, difficult lemmata and structures thus remained unexplained, the “problem” merely being shifted to other linguistic and cultural realms and vice versa. For instance, the fruitful cultural and linguistic contact and exchange that has lasted for millennia among the peoples of the Eastern Mediterranean still lacks proper and accurate documentation in the respective dictionaries, owing to the compartmentalization of knowledge among the various related academic disciplines.

The workshop aspires to show and discuss how up-to-date e-lexicography, using baseline encoding and the alignment of digital lexical resources, can serve as a crucial tool, not only for reviewing and augmenting existing reference works but also for collecting and exploring linguistic data on similarities between languages and varieties beyond the level of “Wanderwörter”, such as structural convergences in morphology and syntax.

We invite presentations on methodologies, tools, showcases and case studies, as well as descriptions of ongoing or planned research projects from linguistics, lexicology, e-lexicography, and the digital humanities in general that focus on the creation and management of linguistic data collections and/or their linking with other resources in order to investigate phenomena pertaining to language contact, linguistic areas and variation in synchrony and diachrony.

In particular, we focus on topics regarding the compilation of lexical entries and their encoding (XML/TEI Lex-0) as well as the interoperability and sustainability of digital lexical data collections and lexicographic resources. Further related topics, such as multilingual technologies, corpus annotation for corpus-based lexicography, specialized vocabularies, entity linking and ontologies, are also welcome.

09:15 – 09:30

Introduction

Organizing team:

Ch. Katsikadeli, Th. Klampfl, K. Mörth

09:30 – 10:00

Sustainability in terminology and lexicography practice: A data perspective

Tanja Wissik

10:00 – 10:30

Revisiting the Lexicography of Loanwords in the Eastern Mediterranean: Greek borrowings in Aramaic

Michael Gassner, Christina Katsikadeli, Thomas Klampfl

10:30 – 11:00

Coffee break

11:00 – 11:30

Breathing new Life into an Old Inscription

Julian Posch

11:30 – 12:00

How Many Words to Record? The Ancient Egyptian Writing System as a Standardising Façade of Linguistic and Lexicographic Diversity

Roman Gundacker

12:00 – 13:30

Lunch break

13:30 – 14:00

Egyptian Root Lexicon — Online Database

Helmut Satzinger, Danijela Stefanović

14:00 – 14:30

(Historical) dialect dictionaries as a means for reconstructing the history of Slavic-German language contact in Vienna

Agnes Kim

14:30 – 15:00

Sprachdynamische Prozesse zwischen Sprachkontakt und innersprachlicher Variation (Dynamic language processes between language contact and language-internal variation)

Andreas Gellan, Markus Kunzmann, Philipp Stöckle

15:00 – 15:30

Coffee break

15:30 – 16:00

Modelling diatopic information in TEI Lex-0: the case of the SHAWI Dictionary

Michaela Rausch-Supola, Daniel Schopper, Karlheinz Mörth

16:00 – 16:30

Loanwords in the Tocharian Lexicon

Hannes A. Fellner, Bernhard Koller, Angelo Mascheroni

16:30

Closing words


Abstracts

Revisiting the Lexicography of Loanwords in the Eastern Mediterranean: Greek borrowings in Aramaic

Christina Katsikadeli, Thomas Klampfl, Michael Gassner

Greek loanwords, which total over two thousand items stemming from various dialects, make up the largest group of non-native words in the Hebrew/Aramaic lexicon as a whole (in Mishnaic Hebrew, Jewish Palestinian Aramaic and Jewish Babylonian Aramaic). The present study is based on recent findings from two FWF projects, "Dictionary of Loanwords in the Midrash Genesis-Rabbah (GenR)" and "Dictionary of Loanwords in the Yalkut Shimoni" (University of Salzburg/ÖAW), concerning the Greek (and Latin) loanwords in Rabbinic literature, the study of which remains a desideratum. The paper focuses on the compilation of digital entries according to the TEI Lex-0 baseline for hapax legomena and problematic cases of (alleged) Greek loanwords: we examine each attestation of the respective lexeme in its context and offer an up-to-date linguistic analysis covering etymology, morphophonology and fine-grained semantics. Furthermore, we will present comparisons with the respective Greek loaned vocabulary in Syriac Aramaic and Coptic sources in order to highlight methodological issues as well as the merits of the digital alignment of lexicographical entries within their Eastern Mediterranean context.


Breathing new Life into an Old Inscription

Julian Posch

The tomb of Khnumhotep II is located in the necropolis of Beni Hassan, about 20 km south of modern el-Minya, Egypt. Khnumhotep II belonged to the ruling family of the Oryx-nome and can be dated to the reigns of Amenemhet II and Senwosret II (≈1878–1837 BCE [date according to E. Hornung, R. Krauss & D.A. Warburton 2006]) of the Middle Kingdom. The tomb is famous for Khnumhotep’s (auto-)biography. In this text, he describes the achievements of his life, including the kings under whom he served. Although several studies have focused on this inscription, some parts of the text remain unclear, owing to its sometimes fragmentary preservation or to gaps in our understanding of its semantics.

The Australian mission, led by Naguib Kanawati, has made high-resolution photographs of this inscription available online, making it possible for the first time to study the inscription closely from a distance. Tools such as the DStretch plug-in (developed for dipinti) or Hierax (developed for papyri) further enhance the information provided by these photographs. Based on this material and through the application of these tools, a new reading of the passage in columns 206-208 can be proposed. Building on the pioneering work of Adolf Erman and Hermann Grapow, which resulted in the seminal Berliner Wörterbuch, the "Thesaurus Linguae Aegyptiae" (TLA) project (2013–2034) has continued their lexical work and transferred it to a digital environment. Besides the TLA project, several similar projects with specific focuses have been developed, partly in collaboration with the TLA, to broaden the lexicographic understanding of the ancient Egyptian language. As a result, these projects have become fundamental tools for Egyptological linguistic research. The case study from the tomb of Khnumhotep II presented in this paper will highlight the possibilities offered by these digital tools, focusing on the lexicographic study of the word wmt(.w/.t) "gateway/thickness" and its (possible) semantic shift. In addition, the semantics of zmȝ "lung" will be examined in relation to the tomb's architecture.

This paper will therefore provide an insight into the Egyptological landscape of digital tools for lexicographic research and their application using the case of columns 206-208 in the tomb of Khnumhotep II at Beni Hassan.


How Many Words to Record? The Ancient Egyptian Writing System as a Standardising Façade of Linguistic and Lexicographic Diversity

Roman Gundacker

Every aspect of linguistic and lexicographical research on ancient Egyptian depends on written records. By its nature, the ancient Egyptian writing system only denotes consonants, but no vowels. Accordingly, linguists and lexicographers have accustomed themselves to working with highly standardised consonantal skeletons with only a few paying attention to the sometimes rich diversity of hieroglyphic, hieratic, and demotic ways of writing. The ‘Nebenüberlieferung’ and Coptic offspring of Egyptian words are only rarely consulted in order to define the morphology of a word or several words sharing one and the same graphical representation. As a result, much work has been carried out along the façade of the ancient Egyptian writing system, but the doors to the complex reality of the ancient Egyptian language per se have not been pushed open. In order to catch a glimpse thereof, this presentation will exemplify the pitfalls of current Egyptological linguistics and lexicography and attempt to get one step closer towards a better assessment of the ancient Egyptian ‘Wortschatz’.


Egyptian Root Lexicon — Online Database

Helmut Satzinger, Danijela Stefanović

In any language, several lexemes may be obviously derived from one and the same root. This is usually visible in their form, but also in their meaning. Although these roots may be homonymous with a derived concrete lexeme, they are nevertheless abstractions. Consequently, these abstractions will often display the original meaning of the root and be a guide to the original semantic role of the derived concrete lexemes. Knowledge of the roots is essential in various respects. Nevertheless, Egyptology has neglected this field until recently. The project "Egyptian Root Lexicon" was funded by the Austrian Science Fund (FWF Der Wissenschaftsfonds) in 2016 and launched in 2017. The major collaborators were Kristina Hutter (University of Vienna, 2017: initial databank entries) and Danijela Stefanović (University of Belgrade / University of Vienna, 2020–2021: work on the databank entries and on the structure of the lexicon), and in part also Alfred Hutter (University of Vienna, 2018). As the main result of the Egyptian Root Lexicon project, in addition to the printed publication, an extensive database encompassing more than 11,000 individual entries was established (presently in FileMaker Pro format). On the basis of obvious phonetic and semantic resemblance among attested lexemes, the project team was able to identify more than 3,500 roots. As etymological research in the field of Afroasiatic is not sufficiently advanced, the lexical roots are not set up on an etymological basis. The project team created a root system in which the individual roots are marked with specific numerical identifiers, in alphabetic arrangement, and their subsequent lexemes are marked with an identity number, the "ID," as created by the Thesaurus Linguae Aegyptiae (TLA) of the Berlin Academy of Sciences. The next step will be the creation of a searchable, freely accessible online database. A demo version of the database is available at https://coping.at/customers/root-dictionary/lexem.html.
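
The root system described above (numbered roots, each grouping lexemes that carry a TLA identity number) could be modelled roughly as follows. This is a minimal sketch and not the project's actual FileMaker Pro schema: the class names, field names, ID values and placeholder strings are all hypothetical, and plain string sorting merely stands in for the Egyptological alphabetic order.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    tla_id: int           # placeholder for the identity number assigned by the TLA
    transliteration: str  # placeholder string, not a real Egyptian word
    gloss: str


@dataclass
class Root:
    root_id: int          # hypothetical numeric identifier of the root
    consonants: str       # placeholder consonantal skeleton of the root
    lexemes: list = field(default_factory=list)


def build_lexeme_index(roots):
    """Map each lexeme ID to the identifier of the root it is filed under."""
    index = {}
    for root in roots:
        for lexeme in root.lexemes:
            index[lexeme.tla_id] = root.root_id
    return index


def arrange_roots(roots):
    """Return the roots in alphabetic order (plain string order as a stand-in)."""
    return sorted(roots, key=lambda root: root.consonants)


# Hypothetical sample data, for illustration only.
roots = [
    Root(2, "bcd", [Lexeme(1003, "bcd", "placeholder gloss 3")]),
    Root(1, "abc", [
        Lexeme(1001, "abc", "placeholder gloss 1"),
        Lexeme(1002, "abc.t", "placeholder gloss 2"),
    ]),
]

lexeme_index = build_lexeme_index(roots)
ordered_ids = [root.root_id for root in arrange_roots(roots)]
```

Keeping the lexeme ID as the lookup key means that a record in another resource using the same identifiers could be traced back to its root without duplicating lexical data.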


Sprachdynamische Prozesse zwischen Sprachkontakt und innersprachlicher Variation

Andreas Gellan, Markus Kunzmann, Philipp Stöckle

The main catalogue of the Bavarian-Austrian dictionary (bayerisch-österreichisches Wörterbuch), initiated in 1911, was created in order to compile a semasiological dictionary covering the entire Bavarian-speaking area of the time. Alongside it, a complex system of cross-references was set up to capture the countless synonyms; it was planned as the basis for a word-geographical language atlas. Because the paper-slip catalogue is lemmatized by base word, with compounds subordinated to that base word, the ordering of the database also differs fundamentally from that of other modern dictionary projects. This database structure allows a productive analysis of the lexicon, but the study of phonetic, morphological and syntactic phenomena is severely limited, and social, cultural or folkloristic aspects are likewise difficult to filter out. The transcription of the paper-slip records, most of which are written in Kurrent script, was carried out in the 1990s and early 2000s and produced a data basis in TUSTEP format. This made it possible at the ACDH-CH to convert the data into an XML-TEI format and to make them accessible, beyond the dictionary's editorial team itself, to a broad user base via the Lexikalisches Informationssystem Österreich (LIÖ). Since the material was collected over several decades and compiled on the basis of indirect and direct data collection as well as excerpts from literature, the data prove to be extremely heterogeneous: the different kinds of information are found in widely varying places in the database, which is also reflected in the countless tags within the database structure. Consequently, the main lemmata form the most consistent structure, which the Wörterbuch der bairischen Mundarten in Österreich (WBÖ) also follows. With the help of various tools and methods developed or used at the ACDH-CH in recent years, other content in the database can also be searched, classified and extracted by means of predefined tags. Different tags for pronunciation, meaning and etymology, for example sentences and loanwords, and many more are thus combined in order to bring them to bear on different research questions. The talk takes a closer look at those Bavarian loanwords that were adopted in the neighbouring language regions. In addition, the focus is on phonetic, morphological and syntactic phenomena that have spread within Bavarian in the contact areas.


Modelling diatopic information in TEI Lex-0: the case of the SHAWI Dictionary

Karlheinz Mörth, Daniel Schopper, Michaela Rausch-Supola

The presentation will focus on discussing standards used in the domain of digital lexicography and touch on the baseline encoding for lexicographic data which goes by the name TEI Lex-0. As an example, we will refer to data taken from the SHAWI project, which investigates the language of small-cattle-breeding Bedouin tribes in various regions of south-eastern Turkey who speak an Arabic variety. As a result of the project, two complementary digital resources will be presented: a corpus based on transcribed audio recordings and a dictionary which will furnish sufficient material of the varieties under investigation to close a number of gaps in their linguistic description.

When creating lexical resources, lexicographers first have to decide on the micro- and macrostructure of their publication. In digital approaches, an additional layer has to be considered: the model applied to the data used to build these resources. All the projects of the VICAV cluster have striven to build their encoding, as far as possible, on the community standard furnished by the Guidelines of the Text Encoding Initiative. We will consider recent developments in this community and the landscape of digital lexicography at large, discuss approaches we have taken during the ongoing creation of the SHAWI dictionary and showcase some technical solutions we have used in this process. We will present particular structural features of the schema of the new dictionary, highlighting the case of geolocations in lexicographic resources.
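
To make the idea of geolocations in a dictionary entry more concrete, the following sketch reads a small TEI-style entry and groups variant forms by the place they are attributed to. It is an illustration under stated assumptions, not the SHAWI schema: the sample entry, its identifiers, the `#place_A` reference and the placement of `usg type="geographic"` inside `form` are hypothetical and have not been validated against TEI Lex-0, and the function name `variants_by_place` is invented for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample entry with placeholder content (not real SHAWI data).
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="entry_placeholder_001" xml:lang="und">
  <form type="lemma"><orth>placeholder-lemma</orth></form>
  <form type="variant">
    <orth>placeholder-variant</orth>
    <usg type="geographic">
      <placeName ref="#place_A">Place A</placeName>
    </usg>
  </form>
  <sense xml:id="entry_placeholder_001_s1"><def>placeholder definition</def></sense>
</entry>
"""


def variants_by_place(entry_xml):
    """Group the written forms of an entry by the place reference they carry."""
    ns = {"tei": TEI_NS}
    entry = ET.fromstring(entry_xml)
    grouped = {}
    for form in entry.findall("tei:form", ns):
        orth = form.findtext("tei:orth", default="", namespaces=ns)
        # Only forms with a geographic usage label contribute to the result.
        for place in form.findall("tei:usg[@type='geographic']/tei:placeName", ns):
            grouped.setdefault(place.get("ref"), []).append(orth)
    return grouped


places = variants_by_place(SAMPLE_ENTRY)
```

Pointing from the entry to a place record by reference, rather than repeating coordinates in every entry, is one common design choice; whether the SHAWI dictionary does this is not stated in the abstract.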


Date

6 June 2024


Place

Austrian Academy of Sciences
Bäckerstraße 13
Seminar rooms 1 & 2
1010 Vienna