EXCEPTIUS Corpus
EXCEPTIUS Corpus v1.0, containing the following data: raw documents for 21 countries at national level; pre-processed data with spacy-udpipe v1.0; automatically annotated...
Impact of manipulating word boundaries on the information distributed in morp...
These plots are part of the study "Impact of manipulating word boundaries on the information distributed in morphology and syntax". Each plot represents the word-structure...
Dataset: tweets and analyses related to the paper 'The (Un)Predictability of ...
This dataset features all the tweet IDs and labels that were used to model the language of 24 hashtags and to test performance in predicting the hashtags in unseen tweets. This...
Data: Timely identification of event start dates from Twitter
This directory features data that is discussed in the paper: F. Kunneman, A. Hürriyetoglu, N. Oostdijk and A. Van den Bosch (2014), Timely identification of event start dates...
To NER or not to NER? A case study of low-resource deontic modalities in EU l...
Deontic modality (obligation, permission, prohibition) in legal documents can convey critical information, and identification of deontic modalities is often performed using...
Dataset: input and results related to the paper 'Anticipointment detection in...
This dataset features the training models, emotion classifications and emotion patterns before and after events, related to the paper: F. Kunneman, M. van Mulken and A. Van den...
Dataset: output related to the paper 'Event detection in Twitter: A machine-l...
This dataset features the output of intermediate steps and the final output of the research that is described in the paper: F. Kunneman and A. Van den Bosch (2014), Event...
Dataset: Events and periodicity analysis related to the paper 'Automatically ...
This dataset features information on all the events that were automatically extracted from Twitter and used as input to periodicity detection, as described in the paper: F....
Dataset: tweets and events linked to the paper 'Open-domain extraction of fut...
Input data and output of research conducted in the study described in the paper: F. Kunneman and A. Van den Bosch (2016), Open-domain extraction of future events from Twitter,...
CorpusExplorer
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks...
DBS Corpus
The DBS corpus contains 93 multi-document summaries for 293 German documents about 30 education-related topics. We sampled the topics from the Deutscher Bildungsserver (DBS)...
MDSWriter
MDSWriter is software for manually creating multi-document summarization corpora and a platform for developing complex annotation tasks spanning multiple steps. If you use or...
Data Linking Workshop 2023: Computer Vision and Natural Language Processing –...
The humanities meet computer science to create new synergies using computer vision and natural language processing. Aim & Scope: Historians are increasingly using...
3rd Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023)
AI can support research in the Humanities, making it easier and more efficient. It is thus essential that AI practitioners and Humanities scholars take a Humanities-centred...
TexPrax
Dataset collected and annotated in the TexPrax project.
Engelsk-svensk guldstandard för ordlänkning (GES)
Reference corpus for word linking, divided into training data and test data. The sentences come from the English and Swedish parts of Europarl. Data are created from the...