Dataset - B2FIND

Dataset of annotated collocation-distractor pairs COLLDIST

The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect answer/alternative to a collocation, which can be similar to...

Collocations Dictionary of Modern Slovene KSSS 2.2

The database of the Collocations Dictionary of Modern Slovene 2.2 contains 4,425,942 collocations in 78,046 entries. Collocations occur in 81 different syntactic relations....

Collocations Dictionary of Modern Slovene KSSS 2.0

The database of the Collocations Dictionary of Modern Slovene 2.0 contains 4,491,958 collocations in 81,443 entries. Collocations occur in 81 different syntactic relations....

Frequency lists of collocations from the Gigafida 2.1 corpus

Frequency lists of collocations were extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/ske/#dashboard?corpname=gfida21) using specialised...

Replication Data for: Threatening in Russian with or without -sja: grozit' vs...

In order to get a clearer picture of the constructions of grozit’ and grozit’sja, we have put together a database with examples of usages of both words from the Russian National...

Background data for: Sprachliches Place-Making. Eine sprachwissenschaftliche ...

This dataset contains corpus statistical calculations that were used to investigate patterns of linguistic place-making in the German language. Patterns are defined here...

eSSKJ collocations 1.0

The database of eSSKJ Collocations 1.0 contains entries for 1797 headwords (1186 nouns, 140 verbs, 421 adjectives, and 48 adverbs) and 167 multi-word expressions with 3098...

Slovene ontology of semantic types for nouns SLONEST-noun 1.0

SLONEST stands for Slovene Ontologies of Semantic Types. The first subset – SLONEST-noun 1.0 – represents an ontology developed for nouns. SLONEST-noun contains an XML file with...

Annotated collocation candidates for three common syntactic structures in Slo...

This resource contains 713,310 collocation candidates, which were automatically extracted from the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320) and annotated whether...

Collocation lexicon of Slovene academic discourse Aleks

Aleks is a lexical database with 463 entries typical for general Slovene academic discourse. The entries include typical context examples (collocations and examples of use)...

The Orange workflow for observing collocation trends ColTrend 1.0

ColTrend is a workflow (.OWS file) for Orange Data Mining (an open-source machine learning and data...

Automatically constructed multiword lexicon hrMWELex v0.5

The hrMWELex lexicon is an automatically constructed lexicon of Croatian multiword expression candidates (mostly collocations) from the parsed hrWaC 2.0 corpus by using the...

The Orange workflow for observing collocation clusters ColEmbed 1.0

ColEmbed is a workflow (.OWS file) for Orange Data Mining (an open-source machine learning and data...

Automatically constructed multiword lexicon slMWELex v0.5

The slMWELex lexicon is an automatically constructed lexicon of Slovene multiword expression candidates (mostly collocations) from the parsed KRES corpus by using the DepMWEx...

Collocations Dictionary of Modern Slovene KSSS 1.0

The database of the Collocations Dictionary of Modern Slovene 1.0 contains entries for 35,862 headwords (18,043 nouns, 5,148 verbs, 10,259 adjectives and 2,412 adverbs) and...

Automatically constructed multiword lexicon srMWELex v0.5

The srMWELex lexicon is an automatically constructed lexicon of Serbian multiword expression candidates (mostly collocations) from the parsed srWaC 1.0 corpus by using the...

Lexical database of Slovene PR terminology TERMIS

TERMIS is a terminology database with 2,000 entries from the field of public relations. The terms in Slovene are explained and translated into English, with typical context...

Slovene lexical database 1.0

Slovene Lexical Database was created between 2008 and 2012 and represents a comprehensive syntactic and semantic description of a selected set of Slovene words. The description...

MWELexicon 1.1

Lexicon of 56.5k multi-word lexical units linked to plWordNet, together with a description of their syntactic behaviour obtained in a constraint language (WCCL).

MWELexicon

Lexicon of 55k multi-word lexical units linked to plWordNet, together with a description of their syntactic behaviour obtained in a constraint language (WCCL).

20 datasets found