43 datasets found

Keywords: tokenisation

  • Trankit model for SST 2.15

    This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...
  • Training corpus ssj500k 2.3

    The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....
  • Training corpus SETimes.SR 1.0

    The SETimes.SR training corpus contains 86,726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic...
You can also access this registry through its API (see the API documentation).