2 datasets found

Keywords: corpus creation

  • W2C – Web to Corpus – tool

    A tool for building multilingual corpora from Wikipedia. It downloads the web pages, converts them to plain text, identifies their language, etc. A set of 120 corpora collected using this...
  • Smyrna

    Smyrna is a tool for building and searching your own Polish corpora from HTML files.
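
The W2C entry lists its pipeline as downloading pages, converting them to plain text, and identifying their language. The sketch below illustrates only the last two steps, offline, and assumes the HTML pages have already been downloaded. It is not W2C's actual code: every name in it (`html_to_text`, `identify_language`, `build_corpus`, `STOPWORDS`) is hypothetical, and the stopword-counting language identifier is a deliberately simple stand-in for the statistical (e.g. character n-gram) models real tools tend to use.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0  # depth of currently open <script>/<style> elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Convert one HTML page to plain text (whitespace-normalized chunks)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical, tiny stopword profiles; NOT how W2C identifies languages.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "pl": {"i", "w", "na", "nie", "jest", "się"},
    "cs": {"a", "v", "na", "je", "se", "že"},
}


def identify_language(text: str) -> str:
    """Pick the language whose stopwords occur most often in the text.

    Ties (including empty text) go to the first language in STOPWORDS.
    """
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def build_corpus(pages):
    """Group already-downloaded HTML pages into per-language plain-text corpora."""
    corpus = {}
    for html in pages:
        text = html_to_text(html)
        corpus.setdefault(identify_language(text), []).append(text)
    return corpus
```

For example, `build_corpus(["<p>The cat is in the garden and the dog</p>", "<p>To jest kot i pies, nie ma się czego bać.</p>"])` groups the first page under `"en"` and the second under `"pl"`.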

You can also access this registry using the API (see API Docs).