-
Amharic WIC Corpus
Substantially cleaned version of existing morphologically annotated WIC Corpus. -
Tigrinya Web Corpus
Tigrinya web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated. -
AlbNER Named Entity Recognition in Albanian
AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled complying with the CoNLL-2003... -
Somali Web Corpus
Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.