CARDIO:DE [V1.1.2]
Version information: CARDIO:DE 1.1.2. Added community-contributed annotations from Becker et al., Extending CARDIO:DE: Additional annotation guidelines and evaluation of...
CO-NNECT
This repository contains our path generation framework Co-NNECT, in which we combine two models for establishing knowledge relations and paths between concepts from sentences,...
IKAT-EN
A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (English version). The data is further annotated with semantic...
CoCo-Ex
CoCo-Ex extracts meaningful concepts from natural language texts and maps them to conjunct concept nodes in ConceptNet, making maximal use of the relational information stored in...
LLMs4Implicit-Knowledge-Generation
Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statements between two sentences, by (i)...
IKAT-DE
A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (German version). The data is further annotated with semantic...
Impact of manipulating word boundaries on the information distributed in morp...
These plots are part of the study "Impact of manipulating word boundaries on the information distributed in morphology and syntax". Each plot represents the word-structure...
Learning from climate change news: Is the world on the same page?
Climate change challenges countries around the world, and news media are key to the public’s awareness and perception of it. But how are news media approaching climate change...
ChiSCor: Children's Story Corpus
ChiSCor is a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character...
Corpora of patient information sheets and consent forms for UK cancer trials ...
Obtaining informed consent is an ethical imperative when conducting research involving human participants. However, participants’ actual level of understanding is often...
Source code and data for the PhD Thesis "Linguistically-Inspired Neural Coher...
This dataset contains source code and data used in the PhD thesis "Linguistically-Inspired Neural Coherence Modeling". The dataset is split into five repositories: StruSim:...
Phonological Acquisition of Galician as a Second Language: A Qualitative Anal...
This data package contains audio and accompanying data from the master's thesis project "The Phonological Acquisition of Galician as a Second Language: A Qualitative Analysis...
Data for the PhD thesis "Modeling Lexical Fields for Translation: a Corpus-B...
This dataset contains high-resolution versions of all graphical visualizations of the data analysis presented in my doctoral dissertation. The graphs are organized according to chapters and...
Heidelberg Bibliography of Translations of Nonfictional Texts [data]
This project, funded by the German Research Foundation, compiles an online bibliography of German translations of nonfictional texts published between 1450 and 1850. It includes...
Turkology Annual Online – Full bibliographic records
The "Turkologischer Anzeiger/Turkology Annual" (TA), founded by Andreas Tietze (†) and György Hazai (†), is an indispensable systematic bibliography for Turkology and Ottoman...
CorpusExplorer
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks...
Salience of color terms in real texts in a wide cross-linguistic study
This dataset collects the labels that different languages of the world use for the basic colour terms defined by Berlin and Kay, based on PanLex. It also provides the...
