Transcription textGrids for the audio edition of the British National Corpus 1993

This collection comprises the Praat TextGrids for time-aligned transcriptions of the Audio BNC sound files. Transcriptions are time-aligned at the word and phoneme levels. The collection reflects the state of our transcriptions at the end date of the project. The files, together with the .wav files to which they relate, are also available from the Audio BNC server, http://bnc.phon.ox.ac.uk/.

To use the data deposited in this zipfile:
1) Unzip the zipfile. This yields a large folder of Praat TextGrids.
2) View the Praat TextGrids using Praat software (freely available from www.praat.org) or any simple text editor. Praat can also display the TextGrid annotation files time-aligned to the Audio BNC audio .wav files. (These audio files are separately available from http://www.phon.ox.ac.uk/AudioBNC; we do not have the rights to upload them to the UK Data Service.)

Each TextGrid file name combines the alphanumeric filename of the corresponding .wav audio file, the 6-digit conversation number employed in the previously-published BNC transcripts, and the 3-character alphanumeric transcription/recording code. Thus, 021A-C0897X0004XX-AAZZP0_000406_KDP_1.TextGrid cross-refers to the .wav file http://bnc.phon.ox.ac.uk/data/021A-C0897X0172XX-ABZZP0.wav, and to conversation 000406 from recording KDP, division () 1. A summary index to all the transcriptions (arranged by three-character BNC code) is given at http://bnc.phon.ox.ac.uk/transcripts-html/, and further details and links about the complete corpus, file naming conventions and on-line locations are given at http://www.phon.ox.ac.uk/AudioBNC.
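The file naming scheme described above can be split into its parts programmatically. The following is a minimal Python sketch, not part of the deposited collection. It assumes that every file name follows the pattern of the single example given here (fields separated by underscores, with a trailing numeric division field). The names `TextGridName` and `parse_textgrid_name` are hypothetical, and the sketch does not attempt to construct the URL of the corresponding .wav file.

```python
import re
from typing import NamedTuple, Optional


class TextGridName(NamedTuple):
    """Fields of a TextGrid file name (hypothetical container for illustration)."""
    wav_stem: str       # alphanumeric name taken from the .wav file
    conversation: str   # 6-digit BNC conversation number
    code: str           # 3-character transcription/recording code
    division: str       # trailing numeric field, as in the example name


# Assumption: fields are joined by underscores, as in the one documented example.
_NAME_PATTERN = re.compile(
    r"^(?P<wav_stem>.+)"
    r"_(?P<conversation>\d{6})"
    r"_(?P<code>[A-Za-z0-9]{3})"
    r"_(?P<division>\d+)"
    r"\.TextGrid$"
)


def parse_textgrid_name(filename: str) -> Optional[TextGridName]:
    """Return the name fields, or None if the name does not fit the assumed pattern."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    return TextGridName(**match.groupdict())


if __name__ == "__main__":
    print(parse_textgrid_name("021A-C0897X0004XX-AAZZP0_000406_KDP_1.TextGrid"))
```

Fields are kept as strings so that leading zeros in the conversation number are preserved. Names that do not fit the assumed pattern return None rather than raising, which makes it easy to list unexpected files after unzipping.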
Publications documenting how this data was collected and prepared, and how we have used it in our research, are available at http://gtr.rcuk.ac.uk/project/CD8C7191-EF60-41B8-BC80-A015ACCEC8EB#tabPublications.

In this research project, Professor John Coleman and his co-workers at Oxford University Phonetics Laboratory and the University of Pennsylvania will study how words are joined together in natural, fluent, everyday speech. In particular, they will make detailed acoustic measurements of numerous recordings to see how English speakers change the last consonants of words to link them up to the next word, and when and in what circumstances people "drop" final 't's and 'd's. The recordings they will use are from thousands of naturally-occurring conversations collected in the 1990s for the British National Corpus. In order to search for and find specific portions of speech, they will use automatic speech recognition technologies. This makes the project one of the most ambitious applications of speech recognition technology ever attempted, so the methods they will develop should help future work on search tools for audio-visual data, such as sound libraries and movie databases. It will open up the audio recordings from the British National Corpus so that other researchers, and anyone interested in English speech, not just academics, can find whatever they may be looking for in that vast collection of recordings.

The original audio recordings were transcribed in ordinary English spelling by professional audio typists. The typed transcripts were time-aligned to the audio using forced alignment (an application of automatic speech recognition technology). The Praat TextGrids deposited in this collection are the resulting transcription data files.
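Because TextGrids are plain text, the aligned intervals can also be read without Praat. The following is a minimal Python sketch under stated assumptions: it expects Praat's long ("full") text format, takes the file contents as an already-decoded string, and handles interval tiers only. The function name `read_tiers` is hypothetical, and the tier names actually used in the deposited files are not assumed here; the function simply returns whatever names it finds.

```python
import re
from typing import Dict, List, Tuple

Interval = Tuple[float, float, str]  # (start time in s, end time in s, label)

# One interval entry in Praat's long text format. Quotes inside labels are doubled.
_INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[-+\d.eE]+)\s*'
    r'xmax = (?P<xmax>[-+\d.eE]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)
_TIER_START = re.compile(r'\n\s*item \[\d+\]:')
_TIER_NAME = re.compile(r'name = "([^"]*)"')


def read_tiers(textgrid_text: str) -> Dict[str, List[Interval]]:
    """Map each tier name to its list of (xmin, xmax, label) intervals."""
    tiers: Dict[str, List[Interval]] = {}
    # The first chunk is the file header; each later chunk is one tier.
    for chunk in _TIER_START.split(textgrid_text)[1:]:
        name_match = _TIER_NAME.search(chunk)
        if name_match is None:
            continue  # skip anything that does not look like a named tier
        intervals = tiers.setdefault(name_match.group(1), [])
        for m in _INTERVAL.finditer(chunk):
            label = m.group("text").replace('""', '"')
            intervals.append((float(m.group("xmin")), float(m.group("xmax")), label))
    return tiers
```

A caller would read a file into a string with the appropriate encoding and pass it to `read_tiers`. For anything beyond quick inspection, Praat itself or a dedicated TextGrid library is likely a safer choice, since this sketch ignores point tiers and the short text format.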

Identifier
DOI https://doi.org/10.5255/UKDA-SN-851496
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=dd537122993be872c9bd42c67dde9d3cdfbd384f5fa5b840947d301669c51f30
Provenance
Creator Coleman, J, University of Oxford
Publisher UK Data Service
Publication Year 2014
Funding Reference ESRC
Rights John Coleman, University of Oxford. Ladan Baghai-Ravary. Rosalind Temple. The Data Collection is available to any user without the requirement for registration for download/access.
OpenAccess true
Representation
Language English
Resource Type Numeric
Discipline Social Sciences
Spatial Coverage United Kingdom