- South Slavic web corpus collection CLASSLA-web 2.0
  The CLASSLA-web 2.0 collection is a large-scale, comparable set of web corpora covering all seven South Slavic languages: Slovenian, Croatian, Bosnian, Montenegrin, Serbian,...
- Macedonian web corpus CLASSLA-web.mk 1.0
  The Macedonian web corpus CLASSLA-web.mk 1.0 is based on the MaCoCu-mk 2.0 web corpus crawl (http://hdl.handle.net/11356/1801), which was additionally cleaned and enriched with...
- Serbian web corpus CLASSLA-web.sr 1.0
  The Serbian web corpus CLASSLA-web.sr 1.0 is based on the MaCoCu-sr 1.0 web corpus crawl (http://hdl.handle.net/11356/1807), which was additionally cleaned and enriched with...
- Montenegrin web corpus CLASSLA-web.cnr 1.0
  The Montenegrin web corpus CLASSLA-web.cnr 1.0 is based on the MaCoCu-cnr 1.0 web corpus crawl (http://hdl.handle.net/11356/1809), which was additionally cleaned and enriched...
- Croatian web corpus CLASSLA-web.hr 1.0
  The Croatian web corpus CLASSLA-web.hr 1.0 is based on the MaCoCu-hr 2.0 web corpus crawl (http://hdl.handle.net/11356/1806), which was additionally cleaned and enriched with...
- Bulgarian web corpus CLASSLA-web.bg 1.0
  The Bulgarian web corpus CLASSLA-web.bg 1.0 is based on the MaCoCu-bg 2.0 web corpus crawl (http://hdl.handle.net/11356/1800), which was additionally cleaned and enriched with...
- Bosnian web corpus CLASSLA-web.bs 1.0
  The Bosnian web corpus CLASSLA-web.bs 1.0 is based on the MaCoCu-bs 1.0 web corpus crawl (http://hdl.handle.net/11356/1808), which was additionally cleaned and enriched with...
- Slovenian web corpus CLASSLA-web.sl 1.0
  The Slovenian web corpus CLASSLA-web.sl 1.0 is based on the Slovenian MaCoCu-sl 2.0 web corpus crawl (http://hdl.handle.net/11356/1795), which was additionally cleaned and...
- English-Slovenian text genre dataset X-GENRE
  The X-GENRE dataset comprises almost 3,000 web texts in English and Slovenian, manually annotated with genre labels. The dataset allows for automated genre identification and...
- Slovene Web genre identification corpus GINCO 1.0
  The Slovene Web genre identification corpus GINCO 1.0 contains web texts, manually annotated with genre, from two Slovene web corpora, the slWaC 2.0 corpus, crawled in 2014, and...
