Big Data language model - subword - BPE - ARPA
Big data language model built on subword units obtained with byte pair encoding (BPE), in ARPA format -
Cyfry (Digits)
A small corpus of spoken digits in Polish. It contains 488 recordings of 25 speakers, each reading 20 digits (0-9), and amounts to around 76 minutes of audio. Split into train (~72%),... -
Big data language model with part of speech tags stemmed in ARPA format
Big data language model with part-of-speech tags, stemmed, in ARPA format -
Big Data language model in Word2Vec CBOW format.
Big Data language model in Word2Vec CBOW format. -
Big Data language model with grammatical groups - RAW
Big Data Language model tagged with grammatical groups in RAW format. -
Big Data language model - subword - SYLLABED - ARPA
Big data language model based on syllables in ARPA format. -
Big data language model stemmed in ARPA format
Big data language model stemmed in ARPA format. -
Big data language model with part of speech tags stemmed in RAW format
Big data language model with part-of-speech tags, stemmed, in RAW format -
Big data language model stemmed with BPE in ARPA format
Big data language model stemmed with BPE in ARPA format -
Korpus nagrań radiowych (Radio Recordings Corpus)
A collection of 192 radio recordings, with around 200 speakers, each no longer than 40 minutes. Audio is saved as RAW 16-bit samples at a 16 kHz sampling frequency. -
Big Data language model with grammatical groups - ARPA
Big Data language model tagged with grammatical groups, in ARPA format. -
Big Data language model tagged with POS - RAW.
Big data language model tagged with POS - RAW -
Big Data language model in FastText CBOW format
Big Data language model in FastText CBOW format -
Speech Recognition System for Polish: Parliamentary Speech
This resource contains dockerized models and scripts of an automatic speech recognition system for Polish trained on Polish Parliament speeches. The system is based on the Kaldi... -
Speech Recognition System for Polish: Polish Film Chronicles
This resource contains dockerized models and scripts of an automatic speech recognition system for Polish trained on recordings of the Polish Film Chronicles. The system is based... -
Big Data language model - STEMMED - RAW data
Big data language model stemmed in RAW format -
Transcriptions of the Polish Film Chronicles (Polska Kronika Filmowa) - years...
This is the orthographic transcription of the audio of the Polish Film Chronicles (Polska Kronika Filmowa - PKF) between the years 1945-1962. The transcription is mostly... -
Transkrypcja fonetyczna Kronik RP (Phonetic Transcription of Kroniki RP)
This is a phonetic transcription of the "Kroniki RP" data set using the G2P tool available at mowa.clarin-pl.eu. -
Big Data language model - subword - BPE - RAW
Big data language model built on subword units obtained with byte pair encoding (BPE), in RAW format -
Big Data language model in FastText Skip-gram format.
Big Data language model in FastText Skip-gram format.