Replication Data for: The semantic structuring of minimizing constructions in present-day Netherlandic Dutch: a distribution-based cluster analysis

Dataset abstract: This dataset contains the data files that were used for the cluster analysis of the Dutch minimizing construction, as described in the publication cited below. In addition to a ReadMe file, it contains three files:

A txt file is provided with the corpus queries that were used to find tokens of the minimizing constructions in the Dutch Web 2014 (nlTenTen14) corpus, available via Sketch Engine (more information about the TenTen corpora: Jakubíček, M., A. Kilgarriff, V. Kovář, P. Rychlý & V. Suchomel (2013). The TenTen corpus family. In: 7th International Corpus Linguistics Conference CL. Lancaster, 125–127).

A csv file is provided that forms the input file for the cluster analysis. It contains a list of 5,863 minimizer-predicate combinations, more specifically a list of the predicates that are combined with the minimizers that have a token frequency of at least 10 in my dataset.

An R-script is provided with the code to perform the cluster analysis in R.
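
To make the idea of a distance computed from minimizer-predicate combinations concrete, here is a minimal sketch in Python (not the R script shipped with the dataset). It assumes the input can be read as a plain list of (minimizer, predicate) pairs; the pairs below are invented toy data, not rows from the csv file, and the function names `predicate_profiles` and `cosine_distance` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy (minimizer, predicate) tokens; NOT taken from the dataset's csv file.
TOKENS = [
    ("meter", "vertrouwen"), ("meter", "vertrouwen"), ("meter", "kloppen"),
    ("cent", "kosten"), ("cent", "kosten"), ("cent", "verdienen"),
    ("euro", "kosten"), ("euro", "verdienen"), ("euro", "verdienen"),
]


def predicate_profiles(tokens):
    """Count, for each minimizer, how often it combines with each predicate."""
    profiles = defaultdict(Counter)
    for minimizer, predicate in tokens:
        profiles[minimizer][predicate] += 1
    return dict(profiles)


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two predicate frequency profiles."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


profiles = predicate_profiles(TOKENS)
d_cent_euro = cosine_distance(profiles["cent"], profiles["euro"])
d_cent_meter = cosine_distance(profiles["cent"], profiles["meter"])
print(f"cent vs euro:  {d_cent_euro:.3f}")
print(f"cent vs meter: {d_cent_meter:.3f}")
```

In this toy example the two money terms share both of their predicates and end up close together (distance of roughly 0.2), while `cent` and `meter` share none and sit at the maximum distance of 1.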

Article abstract: This paper examines the semantic structuring of a paradigm of 89 minimizers, i.e., nouns that reinforce sentential negation in present-day Netherlandic Dutch, such as meter ‘meter’ in voor geen meter vertrouwen ‘not to trust for a meter’. Cosine distances are computed on the basis of the predicates the minimizers combine with in a sample of 100 tokens downloaded from the Dutch Web corpus 2014 (nlTenTen14) and clustered according to the Partitioning Around Medoids (PAM) algorithm into nine semantic clusters. The clusters largely correspond to semantic categories such as taboo terms or units of money. This suggests that, in general, minimizers belonging to the same semantic domain are combined with a similar (core) set of predicates. Based on the shared predicates per cluster, we detect signs of analogical attraction between minimizers or, conversely, competition. Crucially, low silhouette widths enable us to identify outliers in their respective clusters, for instance, minimizing nouns that exhibit signs of context expansion, as shown by their combination with semantically non-harmonious verbs. As such, this paper provides a synchronic snapshot of the semantic processes involved in (incipient) grammaticalization of minimizing nouns and, more generally, it illustrates how distributional semantics offers a heuristic to analyze the structure of a network of comparable micro-constructions.
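
The clustering step named in the abstract can be illustrated with a small, self-contained Python sketch (again, not the authors' R code). It runs a basic PAM procedure (greedy BUILD phase followed by best-improvement SWAP steps) on a precomputed distance matrix and then computes silhouette widths. The distance matrix, the item labels `m1` to `m5`, the choice of two clusters, and the function names are all assumptions made for illustration; the published analysis uses nine clusters.

```python
# Invented symmetric distance matrix for five hypothetical items m1..m5.
ITEMS = ["m1", "m2", "m3", "m4", "m5"]
DIST = [
    [0.0, 0.1, 0.9, 0.8, 0.5],
    [0.1, 0.0, 0.9, 0.9, 0.6],
    [0.9, 0.9, 0.0, 0.2, 0.5],
    [0.8, 0.9, 0.2, 0.0, 0.6],
    [0.5, 0.6, 0.5, 0.6, 0.0],
]


def total_cost(dist, medoids):
    """Sum of each item's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Partitioning Around Medoids: greedy BUILD, then SWAP until no gain."""
    n = len(dist)
    medoids = []
    for _ in range(k):  # BUILD: add the item that lowers the total cost most
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(dist, medoids + [i])))
    while True:  # SWAP: apply the best medoid/non-medoid exchange, if any helps
        best_cost, best_trial = total_cost(dist, medoids), None
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is None:
            break
        medoids = best_trial
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


def silhouette_widths(dist, labels):
    """Silhouette width per item; items in singleton clusters get 0."""
    n = len(dist)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)  # mean distance within cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths


medoids, labels = pam(DIST, k=2)
widths = silhouette_widths(DIST, labels)
for name, label, width in zip(ITEMS, labels, widths):
    print(f"{name}: cluster {label}, silhouette width {width:.2f}")
```

In this toy matrix `m5` lies roughly halfway between the two groups, so it receives the lowest silhouette width of the five items; that is the kind of signal the abstract describes using to flag outliers within a cluster.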

R, 4.4.2

RStudio, 2025.05.1

Identifier
DOI https://doi.org/10.18710/GIKMKM
Related Identifier IsCitedBy https://doi.org/10.5117/nedtaa2024.3.003.heed
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/GIKMKM
Provenance
Creator Van den Heede, Margot; Lauwers, Peter
Publisher DataverseNO
Contributor Van den Heede, Margot; Ghent University; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year 2025
Funding Reference Special Research Fund for Concerted Research Actions - Ghent University
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Van den Heede, Margot (Ghent University)
Representation
Resource Type corpus data; Dataset
Format text/plain; text/comma-separated-values; type/x-r-syntax
Size 6170; 13467; 95929; 1369
Version 1.0
Discipline Humanities