NeMo Conformer CTC BPE E2E Automated Speech Recognition service RSDO-DS2-ASR-E2E-API 1.1

Dataset

This service provides automated speech recognition (ASR) using NeMo Conformer CTC BPE E2E models. For more details about building such models, see the official NVIDIA NeMo documentation (https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/intro.html) and the NVIDIA NeMo GitHub repository (https://github.com/NVIDIA/NeMo). A model for automated speech recognition of Slovene speech can be downloaded from http://hdl.handle.net/11356/1740.

The service accepts audio files in WAV format: 16 kHz, 16-bit PCM, mono. The maximum accepted audio duration is 300 s. Note that transcribing a single 300 s audio file on CPU uses all available cores, can consume up to 16 GB of RAM, and may take about 180 s (on a system with 24 vCPUs). See the service README.md for further details.
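
The input constraints above can be checked locally before a file is submitted. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_wav` and the constant names are hypothetical and not part of the service. The limits are the ones stated above. How the service itself reacts to non-conforming input is not covered here.

```python
import wave

# Limits taken from the service description above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_CHANNELS = 1            # mono
MAX_DURATION_S = 300.0


def check_wav(source):
    """Return a list of problems found; an empty list means the file fits the limits.

    `source` is a file path (str) or a binary file-like object.
    Hypothetical helper for illustration, not part of the service.
    """
    try:
        wav = wave.open(source, "rb")
    except (wave.Error, EOFError) as exc:
        # The wave module rejects files that are not readable PCM WAV.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    with wav:
        rate = wav.getframerate()
        if rate != EXPECTED_RATE_HZ:
            problems.append(
                f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz"
            )

        width = wav.getsampwidth()
        if width != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {8 * width} bits, expected 16 bits"
            )

        channels = wav.getnchannels()
        if channels != EXPECTED_CHANNELS:
            problems.append(
                f"channel count is {channels}, expected {EXPECTED_CHANNELS}"
            )

        if rate > 0:
            # Duration in seconds = number of frames / frames per second.
            duration = wav.getnframes() / rate
            if duration > MAX_DURATION_S:
                problems.append(
                    f"duration is {duration:.1f} s, "
                    f"maximum is {MAX_DURATION_S:.0f} s"
                )
    return problems
```

For example, `check_wav("recording.wav")` (with `recording.wav` standing in for any local file) returns an empty list when the file meets all four constraints, and otherwise one message per violated constraint.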

Identifier
PID	http://hdl.handle.net/11356/1740
Related Identifier	https://rsdo.slovenscina.eu/en/speech-technologies
Related Identifier	https://github.com/clarinsi/Slovene_ASR_e2e
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1740

Provenance
Creator	Lebar Bajec, Iztok; Bajec, Marko; Bajec, Žan
Publisher	Faculty of Computer and Information Science, University of Ljubljana
Publication Year	2022
Rights	Apache License 2.0; https://opensource.org/licenses/Apache-2.0; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Resource Type	toolService
Format	text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 1
Discipline	Linguistics