Dataset - B2FIND

Background data for: Sprachliches Place-Making. Eine sprachwissenschaftliche ...

This dataset contains corpus-statistical calculations that were used to investigate patterns of linguistic place-making in the German language. Patterns are defined here...

Replication data for: Hartmann (2016): Word-Formation Change

These data contain corpus concordances from a diachronic study of German nominalization patterns, reported on in Hartmann (forthc.), investigating three word-formation patterns...

Replication Data for: Zur Determiniererlosigkeit bei prädikativ verwendeten z...

This data set contains the replication data for the article "Zur Determiniererlosigkeit bei prädikativ verwendeten zählbaren Nomen im Deutschen: Korpusdaten und ihre...

The Hamburg MapTask Corpus (HAMATAC)

Audio recordings of map tasks with adult L2 users of German. The speakers' L1s and L2 proficiency levels vary. The maps used for the tasks are available....

Replication Data for: A corpus-based analysis of the Dat-Nom/Nom-Dat alternat...

The dataset includes an annotated sample of N = 13292 written German sentences with a Nominative and a Dative argument. The sentences comprise 76 different...

IQB-Bildungstrend Sprachen 2015 (IQB-BT 2015) IQB Trends in Student Achievem...

The IQB-Bildungstrend 2015, on language competencies at the end of grade 9 in the second comparison of the federal states (IQB-BT 2015), was commissioned by the Kultusministerkonferenz (KMK)...

IQB-Ländervergleich Sprachen 2008/2009 (IQB-LV 2008-9) IQB National Assessme...

The IQB Ländervergleich Sprachen 2008/2009 is a nationwide study commissioned by the Kultusministerkonferenz (KMK) of the Federal Republic of Germany. It assessed the...

Mehrsprachigkeitsentwicklung im Zeitverlauf (MEZ) Multilingual Development: ...

The study investigated individual and contextual conditions for the development of multilingualism among students attending secondary schools in Germany....

Preliminary investigation of materials from the Hamburger Rotes Stadtsbuch RS...

Dataset of the preliminary investigation of materials from the Hamburger Rotes Stadtsbuch RSH 111-1_92692. Devices used: Elio (Bruker/XGLab): 40 kV and 80...

German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)

Annotations of causal verbs, nouns and prepositions in context, plus a lexicon file for causal verbs, nouns and prepositions.

Genre-sensitive Neural Situation Entity classifier (DE, EN)

This is a classifier for situation entity types as described in Becker et al. (2017). These clause types depend on a combination of syntactic-semantic and contextual features. We...

Pre-trained POS tagging models for German social media

Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), Online-Flors (Yin et al. 2015)....

tweeDe

A German UD Twitter treebank with more than 12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework.

Affixoid Dataset (DE)

The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca...

Sentiment Compound Data (DE)

This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.

A harmonised testsuite for social media POS tagging (DE)

A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags have been automatically...

GER_SET: Situation Entity Type labelled corpus for German

Semantic clause types, also called Situation Entity (SE) types (Smith, 2003), are linguistic characterizations of aspectual properties shown to be useful for tasks like...

155 datasets found