Dataset - B2FIND

Clarin-PL Studio Corpus (EMU;updated phonetics)

Polish speech corpus of read speech recorded in a studio. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful for...

Big data language model with part of speech tags stemmed in RAW format

Big data language model stemmed with BPE in ARPA format

Big Data language model - second version - RAW

Speech tools plugin for Annotation Pro

This resource describes the Annotation Pro plugin containing various tools for automatic processing of speech data. The initial tool provides only a speech aligner, but more are...

Big Data language model in FastText Skip-gram format.

Big Data language model with grammatical groups - ARPA

Big Data language model tagged with grammatical groups, in ARPA format.

Long term archive operating system source code

This submission contains the operating system of the long-term archive, built in the Polish-Japanese Academy of Information Technology for the Clarin-PL project. Basic elements...

Big Data language model - subword - SYLLABED - ARPA

Big data language model based on syllables in ARPA format.

Big Data language model - second version - ARPA

Big Data language model tagged with POS - RAW.

Big data language model tagged with POS - RAW

Clarin-PL Studio Corpus (EMU)

Polish speech corpus of read speech recorded in a studio. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful for...

Polish Speech Services

This archive contains the source code and configuration of the speech tools web service available at http://mowa.clarin-pl.eu/mowa. The services provided include: + speech to...

Big data language model stemmed with BPE in RAW format

Big Data language model - subword - BPE - ARPA

Big data language model based on subword units derived with byte pair encoding, in ARPA format

Big Data language model in Word2Vec CBOW format.

Big Data language model in FastText CBOW format

Big Data language model - STEMMED - RAW data

Big data language model stemmed in RAW format

38 datasets found