Replication Data for: A serial founder effect model of phonemic diversity based on phonemic loss in low-density populations

Dataset

It has been observed that the number of phonemes in languages in use today tends to decrease with increasing distance from Africa. A previous formal model has recently reproduced the observed cline, but under two strong assumptions. Here we ask whether an alternative explanation for the worldwide phonemic cline is possible under alternative assumptions. The answer is affirmative. We show this by formalizing a proposal, following Atkinson, that this pattern may be due to a repeated bottleneck effect and phonemic loss. In our simulations, low-density populations lose phonemes during the Out-of-Africa dispersal of modern humans. Our results reproduce the observed global cline for the number of phonemes. In addition, we detect a cline of phonemic diversity and reproduce it using our simulation model. We suggest how future work could determine whether the previous model or the new one (or even a combination of them) is valid. Simulations also show that the clines can still be present even 300 kyr after the Out-of-Africa dispersal, contrary to some previous claims that were not supported by numerical simulations.
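The repeated-bottleneck idea described above can be illustrated with a toy sketch. This is not the authors' FORTRAN program and does not reproduce their model: the function name `simulate_founder_chain`, the per-step loss probability `loss_prob`, the number of steps, and the initial inventory size are all hypothetical choices made only to show how repeated loss along a dispersal route produces a declining phoneme count.

```python
import random


def simulate_founder_chain(initial_inventory, n_steps, loss_prob, seed=0):
    """Toy serial founder effect: at each colonization step the founding
    group keeps each phoneme independently with probability 1 - loss_prob.

    Returns the inventory size at the origin and after every step.
    All parameters are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    # Sort so that the result depends only on the seed, not on set ordering.
    inventory = sorted(initial_inventory)
    sizes = [len(inventory)]
    for _ in range(n_steps):
        # Each phoneme survives the bottleneck unless a loss event occurs.
        inventory = [p for p in inventory if rng.random() >= loss_prob]
        sizes.append(len(inventory))
    return sizes


# Hypothetical example: 40 phonemes at the origin, 10 colonization steps.
toy_sizes = simulate_founder_chain(range(40), n_steps=10, loss_prob=0.05, seed=1)
print(toy_sizes)
```

Because phonemes can only be lost in this sketch, the inventory size never increases along the chain, which is the qualitative pattern behind a cline. The published model additionally ties loss to population density, which this toy version leaves out.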

The zip file contains the following documents and files:
- S1 Text: Supplementary results in DOCX, with various simulation graphics that complement the results reported in the published article. The graphics were computed from the data collected in the "Language database".
- S1 Database: The "Language database" in XLSX, which contains the list of phonemes for 359 languages. For each language, the number of phonemes and the distance from the origin of the Out-of-Africa dispersal are provided. Across these 359 languages, 908 different phonemes were found. All languages in the dataset are coded as strings of "1" and "0", which yields a "full" matrix of 359 rows (languages) x 908 columns (phonemes). A "1" in a given position marks the presence of the corresponding phoneme, and a "0" marks its absence. Data from this database are used to generate the observed phonemic cline and the simulated phonemic cline, as explained in the published article.
- S1 Software: SFE (serial founder effect) with phonemic loss program in FORTRAN.
- S2 Software: Program in FORTRAN to compute the diversity tF of languages at given distance intervals.
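The presence/absence coding described for the Language database can be sketched as follows. This is a minimal illustration, not the authors' code: the language names, phoneme inventories, and distances below are made-up toy values (distances in arbitrary units), and the helper names `encode_presence`, `phoneme_counts`, and `ols_slope` are hypothetical. It shows how a 0/1 matrix yields the number of phonemes per language and how a simple least-squares slope against distance could summarize a cline.

```python
def encode_presence(inventories):
    """Build a 0/1 presence matrix (one row per language, one column per
    phoneme) from a mapping of language name -> set of phonemes."""
    phonemes = sorted({p for inv in inventories.values() for p in inv})
    column = {p: j for j, p in enumerate(phonemes)}
    matrix = {}
    for lang, inv in inventories.items():
        row = [0] * len(phonemes)
        for p in inv:
            row[column[p]] = 1  # "1" marks presence, "0" marks absence
        matrix[lang] = row
    return phonemes, matrix


def phoneme_counts(matrix):
    """Number of phonemes per language is the sum of its 0/1 row."""
    return {lang: sum(row) for lang, row in matrix.items()}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data, invented for illustration only (not taken from the dataset).
toy_inventories = {
    "lang_a": {"p", "t", "k", "a", "i", "u"},
    "lang_b": {"p", "t", "a", "i"},
    "lang_c": {"p", "a"},
}
toy_distance = {"lang_a": 1000.0, "lang_b": 5000.0, "lang_c": 9000.0}

phonemes, matrix = encode_presence(toy_inventories)
counts = phoneme_counts(matrix)
langs = sorted(counts)
slope = ols_slope([toy_distance[l] for l in langs], [counts[l] for l in langs])
print(phonemes, counts, slope)
```

A negative slope in such a fit corresponds to fewer phonemes at larger distances. Applied to the real database, the same row sums would be taken over the 359 x 908 matrix.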

Identifier
DOI	https://doi.org/10.34810/data671
Metadata Access	https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data671

Provenance
Creator	Pérez Losada, Joaquim ; Fort, Joaquim
Publisher	CORA.Repositori de Dades de Recerca
Publication Year	2023
Funding Reference	Agència de Gestió d'Ajuts Universitaris i de Recerca 2017-SGR-243 ; Ministerio de Economía, Industria y Competitividad (MINECO) FIS2016-80200-P ; Institució Catalana de Recerca i Estudis Avançats (ICREA) Academia Humanities award 2014
Rights	CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true

Representation
Resource Type	Compiled data; Dataset
Format	application/zip; text/plain
Size	3100979; 2272
Version	1.1
Discipline	Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Life Sciences; Social Sciences; Social and Behavioural Sciences; Soil Sciences