PunkProse [software]


Punctuation marks aid comprehension and readability in written language. When speech is transcribed, its punctuation is influenced by two phenomena: (1) syntax and (2) prosody. We present a software architecture that makes it possible to train punctuation restoration models from any combination of lexical, morphosyntactic, prosodic and acoustic features. The architecture is language-independent and operates on word-segmented data. A dataset compiled from English TED talks is available at http://hdl.handle.net/10230/33981
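
As a rough illustration of what "any combination of features" over word-segmented data can mean in practice, the sketch below stores one record per word and builds training inputs from a chosen subset of features. It is not taken from punkProse: the Word class, its field names, the select_features helper and the sample values are all hypothetical, and the actual data format is documented in the project repository.

```python
from dataclasses import dataclass, fields


@dataclass
class Word:
    """One word of a word-segmented transcript (hypothetical field names)."""
    token: str              # lexical feature
    pos: str                # morphosyntactic feature (part-of-speech tag)
    pause_after: float      # prosodic feature: silence after the word, in seconds
    mean_f0: float          # acoustic feature: mean pitch, assumed speaker-normalised
    punctuation_after: str  # training label; "" means no punctuation mark


# Every field except the label can be used as a model input.
FEATURE_NAMES = {f.name for f in fields(Word)} - {"punctuation_after"}


def select_features(words, feature_names):
    """Return (inputs, labels) using only the requested feature columns."""
    unknown = set(feature_names) - FEATURE_NAMES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    inputs = [[getattr(w, name) for name in feature_names] for w in words]
    labels = [w.punctuation_after for w in words]
    return inputs, labels


# Invented sample values, for illustration only.
SAMPLE = [
    Word("so", "ADV", 0.31, 1.2, ","),
    Word("this", "PRON", 0.02, 0.4, ""),
    Word("works", "VERB", 0.01, 0.1, ""),
    Word("well", "ADV", 0.80, -0.9, "."),
]

# Train on a lexical plus prosodic combination, ignoring the other features.
inputs, labels = select_features(SAMPLE, ["token", "pause_after"])
```

Keeping the label separate from the feature columns means the same records can be reused for different feature combinations without re-reading the data.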

This software is stored and maintained in the GitHub repository https://github.com/alpoktem/punkProse, where detailed usage instructions are provided.

DOI https://doi.org/10.34810/data484
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data484
Creator Öktem, Alp
Publisher CORA.Repositori de Dades de Recerca
Publication Year 2023
Funding Reference European Commission 645012
Rights Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data484
OpenAccess true
Resource Type Program source code; Dataset
Format text/x-python; text/plain; charset=US-ASCII; application/octet-stream; text/markdown; text/plain; application/x-sh; text/csv; audio/vnd.wave
Size 7228; 1079; 14249; 16427; 1452; 7370; 3099; 3095; 383; 1712; 107; 7381; 356672; 1701; 96; 6406; 144406; 491; 38; 2945; 120872; 2104; 158; 9959; 347018; 3302; 238; 16080; 380042; 4320
Version 1.0
Discipline Other