English Liquid Consonants Ultrasound Tongue Imaging Video Database, 2016-2019


The English Liquids video corpus is drawn from the ultrasound speech corpus generated by the Changes in Shape, Space and Time project and from two other ultrasound corpora (Dynamic Dialects, and the PhD research of Dr Hannah King, Université Paris III Sorbonne Nouvelle, Institut du monde anglophone). All three corpora were recorded at the CASL ultrasound recording studio, Speech and Hearing Sciences Division, Queen Margaret University, Edinburgh. Participants were aged 18+ and were recruited via local advertisements. Recordings were made using a Sonix RP ultrasound machine (220 fps) with a stabilising headset and headset-mounted lip cameras (30 fps). Composite (tongue and lip video combined), annotated (with demographic data and stimuli) and concatenated videos were created from single-word utterances containing liquid consonants, each played first at normal speed and then at reduced speed. These videos show synchronised tongue and lip movement (in most cases with both profile and front-facing lip views), and demographic information about each speaker, as well as the stimulus, is included as subtitles. The purpose of the English Liquids corpus is to illustrate, for audiences such as speech therapists and their clients, and English language teachers and learners, the broad place and manner categories of liquid consonant found in English, as well as rarer variants that may result from ongoing sound change. The three UTI corpora mentioned above were surveyed by phonetician Dr Eleanor Lawson, and liquid consonant videos were chosen based on the clarity of the UTI video and audio, and with a view to evidencing the same variant in as many accents as possible, and in all syllable positions. Videos showing the same variant in the same syllabic context (e.g. syllable onset, coda or intervocalic position) were composited, annotated and concatenated into a single video to allow comparison of articulatory strategies across speakers.
Categories of /l/ evidenced are: clear/palatalised; dark/velarised; vocalised; /l/ with reduced tongue-tip gesture; and interdental /l/, using the words: lull; real; level; little; feel; girl; muddle. A total of 19 speakers were sampled from: Canada (Ontario); England (Manchester; Newcastle; Sheffield; Southampton; N. Yorkshire); Rep. Ireland (Dublin); N. Ireland (Co. Antrim); Scotland (Fife; S. Lanarkshire; Perthshire; W. Lothian); U.S.A. (Georgia; Maryland; Michigan; N. Carolina; Rhode Island; San Jose); and West Indies (Trinidad). Categories of /r/ evidenced are: labialised /r/; labiodental /r/; bunched /r/; retroflex /r/; tip-up /r/; tapped /r/; trilled /r/; and delayed/devoiced /r/, using the words: agreed; air; arrow; brewed; cure; err; far; fur; girl; greed; hear; hearing; hear it; more; near; nurse; poor; prize; rack; real; read; red; ring; risks; root; run; three; worm. A total of 36 speakers were sampled from: Canada (British Columbia; Ontario); England (Chester; Cumbria; Darlington; Isle of Man; Kent; Manchester; Newcastle; Oxford; Plymouth; Sheffield; Southampton; N. Yorkshire); Rep. Ireland (Co. Monaghan; Co. Tipperary; Dublin); N. Ireland (Co. Antrim); N.Z. (Christchurch; West Coast, South Island); Scotland (Aberdeenshire; Black Isle; Fife; Edinburgh; S. Lanarkshire; W. Lothian; Perthshire; Renfrewshire); U.S.A. (Georgia; Los Angeles; Maryland; Michigan; N. Carolina; Oregon; Rhode Island; San Jose); and West Indies (Trinidad).
The speech sounds 'L' and 'R' are often grouped together as a class (called 'liquid consonants') because they are similar in a number of ways. For example, although they function as consonants in speech, they have a vowel-like phonetic quality. They are also among the most complex speech sounds to produce: children may acquire them late, and adult learners may find them hard to master. They vary widely across different accents of the same language.
Finally, their production can involve the tongue forming multiple constrictions in the vocal tract, and it sometimes involves specific movements of the lips as well. Although speakers are not always aware of it, the 'L' sounds at the beginning and end of a word like 'level' do not sound exactly the same. Likewise, the 'R' sounds at the beginning and end of a word like 'roar' (for those so-called 'rhotic' speakers who pronounce an 'R' at the end of 'roar' at all!) do not sound exactly the same. Behind the difference in sound quality is complex variation in (i) the way the articulatory organs synchronise their movements, (ii) the strength with which the speech sound is produced, and (iii) the shape of the tongue when the speech sound is produced. When an 'L' or 'R' at the beginning of a word is pronounced, the speech organ movements involved tend to be more tightly synchronised than for an 'L' or 'R' at the end of a word. Also, 'L' and 'R' at the beginnings of words are produced with more effort than they are at the ends of words. Finally, the tongue shapes involved in producing 'L' and 'R' at the beginnings and ends of words can be radically different from one another. These remarkable differences are very hard to measure, but research over many decades has addressed them and raised a number of theoretical questions. Variation in these three parameters can cause very noticeable changes in the way 'L' and 'R' sound, explaining why, at the ends of words, they seem less like consonants and more like vowels, e.g. making 'foal' and 'foe' sound very similar. The consonant might even disappear altogether, as happened 200 years ago to word-final 'R' in the RP accent of English. Thus, subtle variation in speech production can result in big changes in the long term. However, not all accents of English show the same patterns, or change at the same rate.
While American and Irish English mostly have strong 'R' sounds at the ends of words, word-final 'R' is starting to sound very weak, and even to be lost, in some Scottish accents. This project will use a vocal-tract imaging technique, ultrasound tongue imaging (UTI), to study directly the way the tongue moves inside the mouth when producing 'L' and 'R', informing theories of speech articulation. The movement of the lips will also be recorded, as the lips play an important part in the production of English 'L' and 'R' too. We will record differences in the timing of movements of different parts of the tongue and the lips, how extreme the movements are, and how different the shape of the tongue is when it produces 'L' and 'R' in different positions within the word. We will also look at what happens to 'L' and 'R' across longer domains, as it has been shown that the greatest changes in the way these sounds are produced are found when 'L' and 'R' occur at the beginning and end of speech utterances longer than single words. We will study how changes in the movements of the vocal organs correlate with changes in the acoustic speech signal, and we will identify which kinds of variation in vocal organ movement are most likely to make 'L' and 'R' sound weak, vowel-like or missing. Our research will focus on three key varieties of a single language in which 'R' is pronounced at the beginnings and ends of words, i.e. Scottish, Irish and American English. We will thus be able to address regional and historical variation within an otherwise well-understood language, using novel methods to address theoretical questions relevant to all languages.

Ultrasound tongue imaging: a medical ultrasound probe was placed under the speaker's chin to capture real-time, sagittal movements of the tongue, from root to tip, during speech production, with synchronised audio recording and a headset-mounted video camera capturing synchronised lip movement (NTSC format). Tongues were imaged using a Sonix RP medical ultrasound machine at 220 fps and recorded using Articulate Assistant Advanced software (v2.16.12; Articulate Instruments), which captured and synchronised the ultrasound data, audio and lip video. A total of 36 speakers (aged 18+) were selected from three ultrasound corpora, all of which were recorded at the CASL ultrasound recording studio, Queen Margaret University, Edinburgh. Speakers came from the following countries and locations (speaker sex is indicated as M/F). Where speakers came from large towns or cities, the town/city is named; where they came from a rural area, their county/region is given instead, to avoid making participants identifiable to the public: Canada (British Columbia F; Ontario F); England (Chester F; Cumbria F; Darlington M; Isle of Man F; Kent F; Manchester F; Newcastle M; Oxford F; Plymouth M; Sheffield M; Southampton M; N. Yorkshire F); Rep. Ireland (Co. Monaghan F; Co. Tipperary F; Dublin F); N. Ireland (Co. Antrim F); N.Z. (Christchurch M; West Coast, South Island F); Scotland (Aberdeenshire F; Black Isle F; Fife M; Edinburgh F; S. Lanarkshire F; W. Lothian M; Perthshire M; Renfrewshire M); U.S.A. (Georgia F; Los Angeles F; Maryland M; Michigan F; N. Carolina F; Oregon F; Rhode Island F; San Jose M); and West Indies (Trinidad F). Material from the three corpora was selected by Eleanor Lawson based on image/audio quality, and organised into liquid consonant categories and syllable positions. Ultrasound, lip video and audio were combined, annotated with demographic data and the word stimulus, and concatenated using AviSynth and VirtualDub software. The resulting .avi videos were then converted to MP4 format.

Identifier
DOI https://doi.org/10.5255/UKDA-SN-854794
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=9eeeb9ae356f9fc973db50dfaf994f6a2948fd43c02b6965cf77ccbc39007643
Provenance
Creator Lawson, E, Queen Margaret University
Publisher UK Data Service
Publication Year 2021
Funding Reference Economic and Social Research Council
Rights Eleanor Lawson, Queen Margaret University; The Data Collection is available to any user without the requirement for registration for download/access.
OpenAccess true
Representation
Resource Type Video
Discipline Humanities; Linguistics
Spatial Coverage United Kingdom; Ireland; U.S.A.; Canada; Trinidad; New Zealand