- Slovene Web genre identification corpus GINCO 1.0
The Slovene Web genre identification corpus GINCO 1.0 contains web texts, manually annotated with genre, from two Slovene web corpora, the slWaC 2.0 corpus, crawled in 2014, and...
- Croatian web corpus CLASSLA-web.hr 1.0
The Croatian web corpus CLASSLA-web.hr 1.0 is based on the MaCoCu-hr 2.0 web corpus crawl (http://hdl.handle.net/11356/1806), which was additionally cleaned and enriched with...
- Serbian web corpus CLASSLA-web.sr 1.0
The Serbian web corpus CLASSLA-web.sr 1.0 is based on the MaCoCu-sr 1.0 web corpus crawl (http://hdl.handle.net/11356/1807), which was additionally cleaned and enriched with...
- English-Slovenian text genre dataset X-GENRE
The X-GENRE dataset comprises almost 3,000 web texts in English and Slovenian, manually annotated with genre labels. The dataset allows for automated genre identification and...
- Bosnian web corpus CLASSLA-web.bs 1.0
The Bosnian web corpus CLASSLA-web.bs 1.0 is based on the MaCoCu-bs 1.0 web corpus crawl (http://hdl.handle.net/11356/1808), which was additionally cleaned and enriched with...
- Genre-enriched web corpora MaCoCu-Genre
The genre-enriched MaCoCu-Genre corpus collection comprises web corpora that have been automatically annotated with genre labels. The corpora can be very useful for genre-based...
- Slovenian web corpus CLASSLA-web.sl 1.0
The Slovenian web corpus CLASSLA-web.sl 1.0 is based on the Slovenian MaCoCu-sl 2.0 web corpus crawl (http://hdl.handle.net/11356/1795), which was additionally cleaned and...
- Bulgarian web corpus CLASSLA-web.bg 1.0
The Bulgarian web corpus CLASSLA-web.bg 1.0 is based on the MaCoCu-bg 2.0 web corpus crawl (http://hdl.handle.net/11356/1800), which was additionally cleaned and enriched with...
- Montenegrin web corpus CLASSLA-web.cnr 1.0
The Montenegrin web corpus CLASSLA-web.cnr 1.0 is based on the MaCoCu-cnr 1.0 web corpus crawl (http://hdl.handle.net/11356/1809), which was additionally cleaned and enriched...
- Macedonian web corpus CLASSLA-web.mk 1.0
The Macedonian web corpus CLASSLA-web.mk 1.0 is based on the MaCoCu-mk 2.0 web corpus crawl (http://hdl.handle.net/11356/1801), which was additionally cleaned and enriched with...
- Multilingual text genre classification model X-GENRE
The X-GENRE classifier is a text classification model that can be used for automatic genre identification. The model classifies texts into one of 9 genre labels:...