-
Background data for: Sprachliches Place-Making. Eine sprachwissenschaftliche ...
This dataset contains corpus statistical calculations that were used to investigate patterns of linguistic place-making in the German language. Patterns are defined here... -
Biblia Pauperum-Transcriptions. A Pilot
This presentation introduces the conceptual framework behind Biblia pauperum-Transcriptions, a browser-based viewer for manuscript transcriptions and digital facsimiles. This... -
German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)
Annotations of causal verbs, nouns and prepositions in context and lexicon file for causal verbs, nouns and prepositions. -
Genre-sensitive Neural Situation Entity classifier (DE, EN)
This is a classifier for situation entity types as described in Becker et al. (2017). These clause types depend on a combination of syntactic-semantic and contextual features. We... -
Pre-trained POS tagging models for German social media
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), Online-Flors (Yin et al. 2015).... -
tweeDe
A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework -
Affixoid Dataset (DE)
The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca... -
Sentiment Compound Data (DE)
This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds. -
A harmonised testsuite for social media POS tagging (DE)
A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (+ some additional CMC-specific tags). UD POS tags have been automatically... -
GER_SET: Situation Entity Type labelled corpus for German
Semantic clause types, also called Situation Entity (SE) types (Smith, 2003), are linguistic characterizations of aspectual properties shown to be useful for tasks like... -
GermEval-2018 Corpus (DE)
This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection. -
Multispektraler Datensatz zu der Handschrift Zentralbibliothek Zürich, RP 3 "...
The manuscript RP3 of the Zentralbibliothek in Zurich contains six love letters, the ‘Zürcher Liebesbriefe’ (‘Zurich Love Letters’) and one... -
XRF analysis on metal castings from Lübeck, Germany
XRF analysis was performed in August 2022 on 7 metal artefacts at different places in Lübeck, Germany: Epitaph of Bartholomäus Heisegger (St. Anne's... -
Replication Data for: Zur Determiniererlosigkeit bei prädikativ verwendeten z...
This data set contains the replication data for the article "Zur Determiniererlosigkeit bei prädikativ verwendeten zählbaren Nomen im Deutschen: Korpusdaten und ihre... -
Subset of KoLaS (Commented Learner Corpus Academic Writing), Plain Text Version
For this upload, all Word files (.doc and .docx) in the original KoLaS corpus were converted to plain text. For more information... -
Germeval 2017 Embeddings
Word embeddings accompanying our paper, and the shared task data converted to CoNLL format -
MT@BZ annotation guidelines v1.0
The MT@BZ annotation guidelines are guidelines for assessing the quality of Italian-German machine translation of legal texts. In particular, they cover the South Tyrolean German variety. They... -
MT@BZ translation corpus v1.0
The MT@BZ is a translation corpus that consists of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol) aligned with their machine translated versions. More... -
Parallel Corpus of documents from the Technical Regulations Information Syste...
Specialized Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of texts from the European Commission dating from 1997 to 2010. The texts are technical regulations in a variety of... -
Wittgenstein Archives at the University of Bergen (WAB): WiTTLex - The WiTTFi...
WiTTLex - The WiTTFind Lexicon of Wittgenstein’s Philosophical Nachlass, with Frequency Lists and Indication of the Words’ Sources in the Nachlass. WiTTLex is an electronic...