4 datasets found

Keywords: CoNLL-U

  • Training corpus SUK 1.0

    The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...
  • Training corpus ssj500k 2.3

    The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....
  • Training corpus jos1M 1.2

    The jos1M corpus contains 1 million words of sampled paragraphs from the Gigafida corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This...
  • Training corpus ssj500k 2.2

    The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....
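The listed corpora are tagged with the keyword CoNLL-U. That is a tab-separated format with ten columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), blank lines between sentences, and `#` comment lines. The minimal Python sketch below shows how the annotation levels named in the descriptions (tokens, sentence boundaries, lemmas, morphosyntactic tags) can be read from such a file. It assumes the downloaded files follow standard CoNLL-U. The sample sentence and its tags are illustrative only and are not taken from any of these corpora. `read_conllu` and `Token` are hypothetical helpers, not part of any registry API.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    # The first six CoNLL-U columns, in file order.
    id: str
    form: str    # surface token (tokenisation level)
    lemma: str   # lemma (lemmatisation level)
    upos: str    # universal part-of-speech tag
    xpos: str    # language-specific morphosyntactic tag
    feats: str   # morphological features


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from CoNLL-U text.

    Blank lines end a sentence (sentence segmentation level) and lines
    starting with '#' are comments. Multiword-token ranges such as '1-2'
    and empty nodes such as '1.1' are skipped, so only syntactic words remain.
    """
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(Token(*cols[:6]))
    if sentence:
        # Handle a final sentence with no trailing blank line.
        yield sentence


# Illustrative sample (hypothetical, not copied from SUK, ssj500k or jos1M).
SAMPLE = (
    "# sent_id = demo.1\n"
    "# text = Dober dan.\n"
    "1\tDober\tdober\tADJ\t_\t_\t_\t_\t_\t_\n"
    "2\tdan\tdan\tNOUN\t_\t_\t_\t_\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        print([(t.form, t.lemma, t.upos) for t in sent])
```

For real files, read the text with `open(path, encoding="utf-8").read()` and pass it to `read_conllu`. A dedicated CoNLL-U library is an alternative, but the plain parser keeps the column layout explicit.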

You can also access this registry using the API (see API Docs).