The corpus was developed within the Special Research Project “German in Austria” (FWF F60) and comprises more than 1000 hours of recordings documenting variation in spoken German in Austria. To build the corpus, more than 850 respondents (from different age groups and occupational backgrounds) from all language areas in Austria were recorded in various survey settings (mainly interviews, conversations among friends, language production experiments, reading and translation tasks, and reading-aloud tasks, among others). A significant part of the data has been transcribed (either in standard orthography or according to GAT 2) and automatically enriched with PoS tags.
The corpus is built as a relational PostgreSQL database. The automatic annotation was done with spaCy. All audio files are in Ogg (.ogg) format.

SFB DiÖ

Data repository

GitHub