Dataset - B2FIND

Monitor corpus of Slovene Trendi 2026-03

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 60 publishers. Trendi 2026-03 covers the period from January...

Monitor corpus of Slovene Trendi 2026-04

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 61 publishers. Trendi 2026-04 covers the period from January...

Monitor corpus of Slovene Trendi 2026-02

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 60 publishers. Trendi 2026-02 covers the period from January...

Monitor corpus of Slovene Trendi 2026-01

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 60 publishers. Trendi 2026-01 covers the period from January...

Training corpus of spoken Slovenian ROG 1.1

Training corpus of spoken Slovenian ROG 1.1 is an improved version of the ROG 1.0 corpus (http://hdl.handle.net/11356/1992). The main differences between the original and the...

Training corpus of spoken Slovenian ROG 1.0

Training corpus of spoken Slovenian ROG 1.0 is the main Slovenian-language resource for training and evaluating technologies aimed at processing speech or speech transcripts, such...

Corpus-grounded evaluation dataset for grammatical question answering GramQA 1.0

The Corpus-grounded evaluation dataset for grammatical question answering (GramQA) consists of 13 grammatical questions inspired by WALS, the World Atlas of Language Structures...

Monitor corpus of Slovene Trendi 2025-12

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-12 covers the period from January...

Monitor corpus of Slovene Trendi 2025-11

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-11 covers the period from January...

Monitor corpus of Slovene Trendi 2025-10

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-10 covers the period from January...

Monitor corpus of Slovene Trendi 2025-09

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-09 covers the period from January...

Monitor corpus of Slovene Trendi 2025-08

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-08 covers the period from January...

Monitor corpus of Slovene Trendi 2025-07

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-07 covers the period from January...

Monitor corpus of Slovene Trendi 2025-06

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-06 covers the period from January...

Monitor corpus of Slovene Trendi 2025-05

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-05 covers the period from January...

Deep Universal Dependencies 2.4

Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional...

Terminal-based CoNLL-file viewer, v2

A simple way of browsing CoNLL format files in your terminal. Fast and text-based. To open a CoNLL file, simply run "./view_conll sample.conll". The output is piped through less,...

Artificial Treebank with Ellipsis

Artificially created treebank of elliptical constructions (gapping), in the annotation style of Universal Dependencies. Data taken from UD 2.1 release, and from large web...

Slavic Forest, Norwegian Wood (models)

Trained models for UDPipe used to produce our final submission to the VarDial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on...

CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to...

58 datasets found