Annotated corpus of Macedonian language-related news articles MetaLangNEWS-Mk

Dataset

A comprehensive corpus of news articles on the topic of language, published in major Macedonian daily newspapers and news portals over the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, and language policy and planning, as well as on the contemporary debates about the defining, naming, and standardisation of languages that are ongoing in post-Yugoslav societies. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of standard Macedonian. Transcription into the Latin script was performed according to the standard used for official documents (ICAO Doc 9303). The corpus is available in three versions: plain text, XML with full metadata, and tagged CoNLL-U format. Parallel corpora for Slovenia (http://hdl.handle.net/11356/1360), Croatia (http://hdl.handle.net/11356/1369), and Serbia (http://hdl.handle.net/11356/1371) are also available.
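As a rough illustration of how the tagged CoNLL-U version might be read, the following Python sketch parses CoNLL-U lines into sentences and counts lemmas. It relies only on the general CoNLL-U layout (ten tab-separated columns, `#` comment lines, blank lines between sentences). The function names and the sample sentence are invented for this example and are not taken from the corpus. The corpus's actual tagset and file names are not assumed here.

```python
from collections import Counter

# CoNLL-U defines ten tab-separated columns per token line.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines):
    """Yield sentences as lists of token dicts from an iterable of CoNLL-U lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comments (e.g. sent_id, text) are skipped here.
            continue
        else:
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


def lemma_counts(sentences):
    """Count lemmas, ignoring multiword-token ranges ("1-2") and empty nodes ("1.1")."""
    counts = Counter()
    for sent in sentences:
        for tok in sent:
            if tok["id"].isdigit():
                counts[tok["lemma"]] += 1
    return counts


# Invented sample sentence (NOT from the corpus); xpos and feats are left as "_".
SAMPLE = (
    "# sent_id = example.1\n"
    "1\tЈазикот\tјазик\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tе\tсум\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tважен\tважен\tADJ\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    sentences = list(read_conllu(SAMPLE.splitlines()))
    print(lemma_counts(sentences).most_common(3))
```

To process a downloaded file, the same generator can be fed an open file handle, for example `read_conllu(open(path, encoding="utf-8"))`, where `path` points to wherever the CoNLL-U file was extracted.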

Identifier
PID	http://hdl.handle.net/11356/1652
Related Identifier	https://ikss.zrc-sazu.si/en/programi-in-projekti/re-imagining-language-nation-and-collective-identity-in-the-21st-century#v
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1652
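
The Metadata Access URL above is an OAI-PMH GetRecord request with the `oai_dc` (Dublin Core) metadata prefix. A minimal Python sketch for extracting the Dublin Core fields from such a response might look as follows. The helper names are hypothetical, and the sample response is a hand-written fragment built from values in this record, not the server's actual output.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Standard Dublin Core element namespace used by the oai_dc format.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_fields(xml_data):
    """Collect Dublin Core elements from an OAI-PMH response into a dict of lists."""
    root = ET.fromstring(xml_data)
    fields = {}
    for el in root.iter():
        if el.tag.startswith("{" + DC_NS + "}"):
            name = el.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((el.text or "").strip())
    return fields


def fetch_record(url, timeout=30):
    """Download the raw response bytes (requires network access; not called below)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


# Hand-written sample (an assumption about the response shape, for illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:identifier>http://hdl.handle.net/11356/1652</dc:identifier>
      <dc:language>Macedonian</dc:language>
      <dc:type>corpus</dc:type>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    for name, values in dc_fields(SAMPLE_RESPONSE).items():
        print(name, values)
```

Passing the result of `fetch_record(...)` with the Metadata Access URL to `dc_fields` would apply the same extraction to the live record, assuming the endpoint is reachable.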

Provenance
Creator	Bogetić, Ksenija; Radošević, Petar; Batanović, Vuk
Publisher	ZRC SAZU; Regional Linguistic Data Initiative Centre ReLDI
Publication Year	2022
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Macedonian
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 3
Discipline	Linguistics