Provenance of annotation: a survey on multiple annotations for applications in computational linguistics and the digital humanities

Poster presented at the "Workshop on Data Provenance and Annotation in Computational Linguistics 2018" in Prague (co-located with TLT16).

Abstract: 

It is standard practice that corpora provide unambiguous annotations even if the reported annotations result from a set or sequence of annotation decisions. These decisions and the evidence they are based on are often not documented in the final resource and are hence lost for the corpus user. Previous work on learning from multiple annotations in terms of human annotator biases (e.g. Beigman and Klebanov 2009, more recently Plank et al. 2014) shows the relevance of integrating multiple annotations in corpora for computational linguistics. An additional motivation for representing annotation decisions and keeping multiple annotations in a corpus arises in the context of the digital humanities. In literary and social science studies, for example, the textual information itself does not always provide enough evidence for the annotation; the annotators have to fill the gap with external knowledge and inferences, which are necessarily subjective to a certain degree. The motivation for adding such subjective annotations to a corpus is that the variation in the annotations serves as an additional source for the overall (humanities) interpretation of the data (e.g. Gius and Jacke 2017). In this poster we (i) give a survey of scenarios in which multiple annotations are created, (ii) present a taxonomy of different types of multiple annotations and discuss whether it needs to be extended by the concept of vagueness (especially in the context of digital humanities projects), and (iii) sketch solutions that have been suggested for making this kind of provenance evidence available in corpora.

In automatic annotation there are two scenarios that result in multiple annotations for one item:

If the tool’s output is a probability distribution over possible annotations, it can be reported at least partially as an n-best list;
In ensemble approaches, the (weighted) results of the individual taggers can be reported together with the winning tag (see the sketch after this list).
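
The following minimal Python sketch (an illustration, not part of the poster) shows how both kinds of provenance could be recorded; the tag labels, tagger names, and weights are hypothetical.

```python
from collections import Counter

def n_best(prob_dist, n=3):
    """Keep the n most probable tags together with their probabilities."""
    return sorted(prob_dist.items(), key=lambda kv: kv[1], reverse=True)[:n]

def ensemble_vote(tagger_outputs, weights):
    """Combine the taggers' outputs by weighted voting and return the
    winning tag together with all (weighted) votes for later inspection."""
    votes = Counter()
    for tagger, tag in tagger_outputs.items():
        votes[tag] += weights.get(tagger, 1.0)
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)

# Hypothetical example values (STTS-like tags, invented probabilities/weights)
print(n_best({"NN": 0.62, "NE": 0.31, "ADJA": 0.07}, n=2))
# [('NN', 0.62), ('NE', 0.31)]
print(ensemble_vote({"taggerA": "NN", "taggerB": "NE", "taggerC": "NN"},
                    {"taggerA": 0.5, "taggerB": 0.3, "taggerC": 0.2}))
# ('NN', {'NN': 0.7, 'NE': 0.3})
```

Storing the full n-best list or the full vote distribution instead of only the winning tag is what makes the automatic decision traceable for later corpus users.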

In manual annotation we distinguish three (non-exclusive) settings of multiple annotations:

Annotation disagreement by trained annotators, possibly as part of an annotation cycle (MAMA, Pustejovsky and Stubbs 2012) that involves evaluation, discussion, adjudication, and revision of the annotation guidelines, or as part of a humanities annotation;
Annotation disagreement by independent and mostly anonymous crowd workers;
Detected ambiguity or vagueness when annotators are asked to provide more than one interpretation (e.g. ambiguous readings, or the first-to-mind and alternative interpretations); a simple record structure for such multiple manual annotations is sketched after this list.
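
Purely for illustration (not a format proposed in the poster), such manual settings could be recorded with a simple data structure that keeps every annotator's decision, any alternative readings they were asked to provide, and an optional adjudicated label:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ManualAnnotation:
    annotator: str                     # trained annotator or (anonymous) crowd-worker ID
    label: str                         # the annotator's primary choice
    alternatives: List[str] = field(default_factory=list)  # further readings, if elicited
    comment: Optional[str] = None      # free-text evidence, e.g. a guideline reference

@dataclass
class AnnotatedItem:
    item_id: str
    annotations: List[ManualAnnotation]  # all decisions are kept, not just one
    adjudicated: Optional[str] = None    # label agreed on after discussion, if any

# Hypothetical example: two annotators disagree, one reports an alternative reading
item = AnnotatedItem(
    item_id="sent3_tok7",
    annotations=[
        ManualAnnotation("A1", "ironic"),
        ManualAnnotation("A2", "literal", alternatives=["ironic"]),
    ],
    adjudicated="ironic",
)
```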

For the analysis of disagreements and multiple annotations on a conceptual level (for automatic and manual annotation), EAGLES (1996) distinguishes between two types of “descriptively incomplete phenomena”: underspecification, where a “distinction between the different values of an attribute is not relevant”, and ambiguity, which is a “lack of information where there is uncertainty between two or more alternative descriptions”. Barteld et al. (2014) extend this binary distinction to a three-way model that captures the difference between actual ambiguity, where there is no single best analysis, and uncertainty, which could be overcome by additional information (more context, more training instances, better guidelines, etc.), in addition to underspecification in the sense that the language system does not distinguish a given feature; the latter could be resolved by defining a more coarse-grained tagset. We will discuss whether we need to distinguish an additional type, data vagueness relative to an envisaged annotation target, which arises when annotators need to draw on subjective information such as inferences involving world knowledge and personal experience in order to annotate the data.
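
Purely as an illustration (the class and its labels are ours, not a standard proposed in the cited work), the resulting four-way distinction could be encoded like this:

```python
from enum import Enum, auto

class IncompletenessType(Enum):
    """Four-way classification of descriptively incomplete phenomena,
    following EAGLES (1996), Barteld et al. (2014), and the extension
    by vagueness discussed in the poster (labels are illustrative)."""
    UNDERSPECIFICATION = auto()  # the language system does not make the distinction;
                                 # could be resolved by a more coarse-grained tagset
    UNCERTAINTY = auto()         # a single best analysis exists but is not (yet) known;
                                 # could be overcome by more context, data, or guidelines
    AMBIGUITY = auto()           # genuinely more than one valid analysis
    VAGUENESS = auto()           # the annotation target requires subjective inference
                                 # (world knowledge, personal experience)
```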

Based on the annotation scenarios and the conceptual taxonomy, we will summarize which features need to be taken into account for providing annotation provenance to subsequent corpus users. Approaches to making (evidence of) multiple annotations accessible in corpora in a sustainable way include tagset-internal solutions such as portmanteau tags or hierarchical tagsets (e.g. the PDTB Research Group 2008, Müller et al. 2010), as well as representation-based solutions such as specific XML inline formats (Ule 2004) and generic XML stand-off formats (Chiarcos et al. 2008, Ide and Romary 2004, Ide and Suderman 2007).
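
To illustrate the stand-off idea in its simplest form (a deliberately simplified sketch, not the format of any of the cited proposals), conflicting annotations can be kept in a separate layer that only points to token IDs in the primary data:

```python
import xml.etree.ElementTree as ET

# Primary data: tokens with IDs; annotation layers only refer to these IDs.
tokens = [("t1", "ein"), ("t2", "Schloss")]

# Two annotators disagree on the reading of "Schloss" ("castle" vs. "lock");
# both decisions are preserved instead of a single adjudicated value.
annotations = [
    {"target": "t2", "annotator": "A1", "sense": "castle"},
    {"target": "t2", "annotator": "A2", "sense": "lock"},
]

root = ET.Element("standoff")
token_layer = ET.SubElement(root, "tokens")
for tid, form in tokens:
    ET.SubElement(token_layer, "token", {"id": tid}).text = form

annotation_layer = ET.SubElement(root, "annotations", {"type": "word-sense"})
for ann in annotations:
    ET.SubElement(annotation_layer, "annotation", ann)

print(ET.tostring(root, encoding="unicode"))
```

The same information could equally be serialized in any of the generic formats cited above; the point is only that the base text and the competing annotation layers are stored separately and linked by identifiers.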

References

Barteld, Fabian, Sarah Ihden, Ingrid Schröder, and Heike Zinsmeister. 2014. Annotating descriptively incomplete language phenomena. In Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop, 99-104. Dublin, Ireland.

Beigman, Eyal, and Beata Beigman Klebanov. 2009. Learning with Annotation Noise. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, 280–87.

Chiarcos, Christian, Stefanie Dipper, Michael Götze, Ulf Leser, Anke Lüdeling, Julia Ritz, and Manfred Stede. 2008. A flexible framework for integrating annotations from different tools and tagsets. Traitement Automatique des Langues, 49(2):271–293.

EAGLES. 1996. Recommendations for the morphosyntactic annotation of corpora. EAGLES document EAGTCWG-MAC/R. Technical report.

Gius, Evelyn, and Janina Jacke. 2017. The Hermeneutic Profit of Annotation: On Preventing and Fostering Disagreement in Literary Analysis. International Journal of Humanities and Arts Computing 11 (2):233–54.

Ide, Nancy and Laurent Romary. 2004. International standard for a linguistic annotation framework. Journal of Natural Language Engineering, 10(3-4):211–225.

Ide, Nancy and Keith Suderman. 2007. GrAF: A graph-based format for linguistic annotation. In Proceedings of the Linguistic Annotation Workshop (LAW), pages 1–8, Prague, Czech Republic.

Müller, Antje, Olaf Hülscher, Claudia Roch, Katja Keßelmeier, Tobias Stadtfeld, Jan Strunk, and Tibor Kiss. 2010. An annotation schema for preposition senses in German. In Proceedings of the Fourth Linguistic Annotation Workshop (LAW IV), 177-181. Uppsala, Sweden.

The PDTB Research Group. 2008. The PDTB 2.0 Annotation Manual. Technical Report IRCS-08-01. Institute for Research in Cognitive Science, University of Pennsylvania.

Pustejovsky, James and Amber Stubbs. 2012. Natural language annotation for machine learning. O’Reilly, Beijing [a.o.].

Ule, Tylmann. 2004. Markup manual for the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z). Technical report, University of Tübingen.

This work has been supported by the Landesforschungsförderung Hamburg (LFF-FV 35).

Identifier
DOI https://doi.org/10.25592/uhhfdm.16178
Related Identifier IsPartOf https://doi.org/10.25592/uhhfdm.16177
Metadata Access https://www.fdr.uni-hamburg.de/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:fdr.uni-hamburg.de:16178
Provenance
Creator Zinsmeister, Heike
Publisher Universität Hamburg
Publication Year 2018
Rights Creative Commons Attribution 4.0 International; Open Access; https://creativecommons.org/licenses/by/4.0/legalcode; info:eu-repo/semantics/openAccess
OpenAccess true
Representation
Language English
Resource Type Poster; Text
Version 1.0
Discipline Humanities; Linguistics