7 datasets found

Keywords: word segmentation

  • Background Data for: What is a Chinese word? Lexical constructionalization in...

    Word is commonly assumed to be the basic linguistic unit, but its definition has actually been controversial in Chinese. The Chinese language is documented in Chinese...
  • ACL word segmentation correction

    The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other...
  • MorfoCzech

    A dictionary of morphologically segmented word forms in Czech. Rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann...
  • CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

    CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to...
  • Universal Segmentations 1.0 (UniSegments 1.0)

    Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation...
  • MorfoCzech 1.1

    A dictionary of morphologically segmented word forms in Czech. Rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann...
  • ACL word segmentation correction

    The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other...
You can also access this registry using the API (see API Docs).