In previous work, we reported corpus evidence supporting earlier claims about age-related and regional variation in a small set of DGS signs mentioned in the literature. As the amount of annotation available for the DGS corpus grows, however, our aim should be not only to provide data on “the usual suspects”, but also to detect variation that has not yet been described.
While the DGS corpus was designed to be balanced with respect to age, regional distribution and thematic coverage, basic annotation progresses according to other criteria. This means that, at this point in time, we cannot make strong claims about the balance of the annotated part of the DGS corpus. As a consequence, it is not possible to define rigid statistical measures to identify types undergoing variation. It would be an option, of course, to work on a stable sub-corpus. However, this would substantially reduce the amount of data available for investigation, especially in the case of regional variation.
Instead, we follow a two-step procedure:
1. We use a heterogeneity index as a means to identify candidates for signs undergoing variation.
2. For each of these candidate signs, we determine the sign environment. Within these clusters, stronger measures can be applied to verify the variation hypotheses.
In order to compute a heterogeneity measure for the first step, we explored two options:
(a) Standard deviations for each subgroup (age group or region)
(b) Regression analysis: Linear regression over the age of the participants and, correspondingly, two-dimensional linear regression in the case of regional distribution.
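The two options can be illustrated with a minimal sketch. It assumes that, for a given sign, a relative frequency of use has already been computed per subgroup or per participant; the function names, the age-group labels and all numbers are hypothetical, and reading the first option as the spread of subgroup frequencies is one possible interpretation rather than the exact measure used in our analysis.

```python
from statistics import mean, pstdev


def subgroup_spread(rates):
    """Standard deviation of a sign's relative frequency across subgroups.

    `rates` maps a subgroup label (age group or region) to the relative
    frequency of the sign in that subgroup. A larger value means the sign
    is distributed less evenly across the subgroups.
    """
    return pstdev(rates.values())


def age_slope(ages, rates):
    """Least-squares slope of the relative frequency regressed on age.

    A slope far from zero suggests that use of the sign rises or falls
    with the age of the participants.
    """
    mean_age, mean_rate = mean(ages), mean(rates)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("all ages are identical; slope is undefined")
    sxy = sum((a - mean_age) * (r - mean_rate) for a, r in zip(ages, rates))
    return sxy / sxx


# Hypothetical illustration data, not taken from the DGS corpus.
by_age_group = {"18-30": 0.10, "31-45": 0.20, "46-60": 0.30, "61+": 0.40}
spread = subgroup_spread(by_age_group)

participant_ages = [20, 30, 40, 50, 60]
participant_rates = [0.1, 0.2, 0.3, 0.4, 0.5]
slope = age_slope(participant_ages, participant_rates)
```

In this toy example the sign becomes steadily more frequent with age, so both measures flag it; the two differ mainly in how they react to uneven or noisy distributions.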
Obviously, regression is more sensitive to some distributions than to others. In both cases, we argue that this sensitivity makes sense here: In the case of age, it is commonly understood that the larger the age difference, the more visible differences become, whether they originate in language change or in group-specific usage. In the case of regions, this approach translates to the “axiom of geolinguistics” (the larger the geographical distance, the larger the linguistic distance). While this axiom still has to be confirmed, especially in the context of sign languages, it is at odds neither with DGS variation reported in the literature (often tagging signs as “North”, “South”, “Bavaria” etc.) nor with German history (with respect to the East-West divide).
In our experience, the two approaches deliver the same candidates for the clear-cut cases, but regression is less prone to noise and thereby helps to identify the less pronounced cases.
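For the regional case, a two-dimensional linear regression can be sketched as fitting a plane over map coordinates. The sketch below assumes that each region is represented by a single point (x, y) and a relative frequency of the sign; the function names, the coordinate system and the data are hypothetical, and taking the steepness of the fitted plane as the heterogeneity value is an illustrative choice, not necessarily the index used in our analysis.

```python
from math import hypot
from statistics import mean


def fit_plane(xs, ys, rates):
    """Least-squares fit of rate = a + b*x + c*y over region coordinates.

    Returns (a, b, c). The pair (b, c) is the direction in which the
    relative frequency of the sign changes most strongly on the map.
    """
    mx, my, mr = mean(xs), mean(ys), mean(rates)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxr = sum((x - mx) * (r - mr) for x, r in zip(xs, rates))
    syr = sum((y - my) * (r - mr) for y, r in zip(ys, rates))
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("region points are collinear; the plane is not unique")
    # Solve the 2x2 normal equations with Cramer's rule.
    b = (sxr * syy - syr * sxy) / det
    c = (syr * sxx - sxr * sxy) / det
    a = mr - b * mx - c * my
    return a, b, c


def regional_gradient(xs, ys, rates):
    """Steepness of the fitted plane, used here as a heterogeneity value."""
    _, b, c = fit_plane(xs, ys, rates)
    return hypot(b, c)


# Hypothetical regions on a unit grid; x grows eastwards, y northwards.
xs = [0.0, 1.0, 0.0, 1.0]
ys = [0.0, 0.0, 1.0, 1.0]
rates = [0.1, 0.3, 0.4, 0.6]
a, b, c = fit_plane(xs, ys, rates)
steepness = regional_gradient(xs, ys, rates)
```

A plane only captures variation that increases with distance along some direction, which is exactly the sensitivity motivated by the axiom above; a sign confined to a central region surrounded by non-users would yield a flat fit.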
In the presentation, we discuss a number of variation cases (including combinations of age and regional variation) and explain the thresholds we found practical for our analysis.
From our findings, we arrive at a – necessarily very rough – map of geographical areas that seem to be at the heart of many regional variations observed. Eventually this will lead to a map of regional variations (“dialects”) of DGS based on empirical data.