Dataset - B2FIND

Information structure and historical English OV/VO variation

This dataset contains the data that is used in: Struik, Tara and Ans van Kemenade. Information structure and OV word order in Old and Middle English: a phase-based approach. To...

B2 Tangale

Tangale sample: sample, status: final, manually transcribed, glossed and translated into English, annotated with respect to morphology, parts of speech, syntax, grammatical function, semantic roles,...

B4 Muspilli

Complete text, status: work in progress, digitization, translation into English, manually annotated with parts of speech, syntactic category, grammatical function, clause...

B7 Wolof (web)

The corpus comprises a collection of texts from web discussion forums, randomly chosen for their near-standard orthography and language, and treating...

B4 Sächsische Weltchronik

The corpus contains a 13th-century chronicle in Middle Low German. Description of the textual witnesses etc. in:...

B2 Bura

Full set: all focus-related experiments, status: work in progress, large parts elicited, most of the data transcribed, partly annotated. CLARIN Metadata summary for B2 Bura...

B7 Wolof (Wikipedia)

The corpus comprises a collection of texts from the Wolof Wikipedia, randomly chosen for their near-standard orthography and language, and treating different topics....

B2 Guruntum

Guruntum sample: sample, status: final, manually transcribed, glossed and translated into English, annotated with respect to morphology, parts of speech, syntax, grammatical function, semantic roles,...

B1 Fon

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.)...

B2 Marghi

Full set: all focus-related experiments, status: work in progress, large parts elicited, most of the data transcribed, partly annotated. CLARIN Metadata summary for B2 Marghi...

B4 Historisches Predigtenkorpus zum Nachfeld

HIPKON is the first corpus based on only one text type (sermons) and on one dialect area, Upper German (Bavarian-Alemannic). The sermons cover the time from Middle High German...

A5 Hausa Umarnin Uwa

This corpus of Umarnin Uwa film transcripts contains 47 transcripts with a total of 10194 tokens. It provides information including automatic POS tagging, speaker and...

B4 Tatian Corpus of Deviating Examples 2.1

The present corpus, the Tatian Corpus of Deviating Examples T-CODEX 2.1, provides morpho-syntactic and information-structural annotation of parts of the Old High German...

B1 Aja

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.)...

B4 Heliand

Heliand 1, 4 and 5: complete text, status: final, digitization, translation into Modern German, manually annotated with parts of speech, syntactic categories, grammatical...

A5 Hausa News

This corpus of news articles from the online news service of Deutsche Welle contains 4 texts with a total of 2017 tokens. CLARIN Metadata summary for A5 Hausa News...

B4 Otfrid

The Referenzkorpus Altdeutsch (Reference Corpus of Old German) records and annotates the oldest written records of German, from the beginning of continuous written transmission around 750 until about 1050...

B4 Ludolf

The text of this corpus, Ludolf von Sudheims Reise ins Heilige Land (Ludolf of Sudheim's Journey to the Holy Land), is a travel diary describing the adventures of a group of...

B1 Foodo

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.)...

B1 Yom

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.)...

22 datasets found