Punctuation marks support understandability and readability in written language. In spoken language, the punctuation of transcribed speech is influenced by two phenomena: (1) syntax and (2) prosody. We present a software architecture that makes it possible to train punctuation restoration models from any combination of lexical, morphosyntactic, prosodic and acoustic features. The architecture is language-independent and operates on word-segmented data. A dataset compiled from English TED talks is available at http://hdl.handle.net/10230/33981
The software is hosted and maintained in the following GitHub repository: https://github.com/alpoktem/punkProse
Detailed usage instructions are provided there.
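As a rough illustration of what "word-segmented data" carrying "any combination" of features could look like, the sketch below attaches one value per feature type to each word and then selects a subset of those features as model input. It is not the punkProse code or its data format: the `WordSample` class, its field names, the `select_features` and `labels` helpers, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

FeatureValue = Union[str, float]

# Hypothetical set of per-word features; the real feature inventory may differ.
AVAILABLE_FEATURES = ("token", "pos_tag", "pause_before", "mean_f0")


@dataclass
class WordSample:
    """One word of word-segmented speech data (all field names are assumptions)."""
    token: str               # lexical feature
    pos_tag: str             # morphosyntactic feature
    pause_before: float      # prosodic feature: silence before the word, in seconds
    mean_f0: float           # acoustic feature: mean pitch over the word
    punctuation_after: str   # training label; "" means no punctuation follows


def select_features(
    words: Sequence[WordSample], feature_names: Sequence[str]
) -> List[Dict[str, FeatureValue]]:
    """Return one dict per word holding only the requested features."""
    unknown = set(feature_names) - set(AVAILABLE_FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return [{name: getattr(w, name) for name in feature_names} for w in words]


def labels(words: Sequence[WordSample]) -> List[str]:
    """Return the punctuation label that follows each word."""
    return [w.punctuation_after for w in words]


# Made-up sample values, for illustration only.
talk = [
    WordSample("thank", "VB", 0.00, 0.3, ""),
    WordSample("you", "PRP", 0.05, -0.1, "."),
    WordSample("so", "RB", 0.60, 0.2, ""),
]

# Train-time choice: use lexical + prosodic features only.
lexical_prosodic = select_features(talk, ["token", "pause_before"])
targets = labels(talk)
```

Because every feature is aligned to a word, switching the feature combination only changes the list passed to `select_features`; the word sequence and its punctuation labels stay the same.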