Dataset - B2FIND

Supplementary materials to "Fundamental principles of linguistic structure ar...

Supplementary materials [Research Data]. The related article is Murphy, E., Leivada, E., Dentella, V., Montero, R., Günther, F., & Marcus, G. (2025). Fundamental principles...

Replication Data for: Davvisámi earutkeahtes oamasteapmi

These data show the correlation analysis for our study, which is described as follows: On the basis of corpus data (9.5M words, 1997-2010) we claim that North Saami is developing a...

Replication Data for: A corpus approach to the history of Russian po delimita...

This paper gives an example of how enriched diachronic treebank data can shed new light on an old and conflicted topic, even when that topic is morphological and semantic in...

Diachronic French Reveal Secret frame

This dataset compiles selected sentences from the MCVF and ARTFL-FRANTEXT corpora containing lexical items that evoke the Reveal Secret frame (as described in the ASFALDA French...

Replication Data for: Predicting Russian aspect by frequency across genres

We ask whether the aspect of individual verbs can be predicted based on the statistical distribution of their inflectional forms and how this is influenced by genre. To address...

Replication data for: Prefix variation in Russian путать

This thesis explores the prefix variation in путать and consists of three case studies: Case study 1 “The choice of prefix under prefix variation”: Is it possible to predict...

The MSC Data Set

From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015)...

Source code and data for the PhD Thesis "Metrics of Graph-Based Meaning Repre...

This dataset contains source code and data used in the PhD thesis "Metrics of Graph-Based Meaning Representations with Applications from Parsing Evaluation to Explainable NLG...

Mapping Czech Verbal Valency to PropBank Argument Labels: LREC2024 - verifica...

Mapping table for the article Hajič et al., 2024: Mapping Czech Verbal Valency to PropBank Argument Labels, in LREC-COLING 2024, as preprocessed by the algorithm described in the...

Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)

The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes...

NomVallex 2.0

NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on...

VALLEX 3.0

VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX...

NomVallex 2.5

NomVallex is a manually annotated valency lexicon of Czech nouns and adjectives, adopting the theoretical framework of Functional Generative Description as its theoretical...

NomVallex I.

The NomVallex I. lexicon describes valency of Czech deverbal nouns belonging to three semantic classes, i.e. Communication (dotaz 'question'), Mental Action (plán 'plan') and...

Uniform Meaning Representation

The goal of the Uniform Meaning Representation (UMR) project is to design a meaning representation that can be used to annotate the semantic content of a text. UMR is primarily...

Corpus OVER

Many studies in cognitive linguistics have analysed the semantics of 'over', notably the semantics associated with 'over' as a preposition. Most of them generally conclude that...

Semantically annotated sample of Czech and English conversion pairs of verbs ...

Supplementary files for a comparative study of word-formation without the addition of derivational affixes (conversion) in English and Czech. The two .csv files contain 300...

EngVallex - English Valency Lexicon 2.0

EngVallex 2.0 is a slightly updated version of EngVallex. It is the English counterpart of the PDT-Vallex valency lexicon, using the same view of valency, valency frames and the...

PDT-Vallex: Czech Valency lexicon linked to treebanks

The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

45 datasets found