CRAC 2026 Empty Nodes Baseline Model


The crac2026_empty_nodes_baseline is an XLM-RoBERTa-large-based multilingual model for the CRAC 2026 Empty Nodes Baseline system https://github.com/ufal/crac2026_empty_nodes_baseline. It predicts empty nodes in input CoNLL-U files and was trained on CorefUD 1.4 data. It was used to generate the baseline empty node predictions in the CRAC 2026 Shared Task on Multilingual Coreference Resolution https://ufal.mff.cuni.cz/corefud/crac26.

The model is language agnostic, so in theory it can be used to predict empty nodes in any language covered by XLM-RoBERTa.

Compared to last year's CRAC 2025 Empty Nodes Baseline https://github.com/ufal/crac2025_empty_nodes_baseline, this year's baseline predicts all available information for the empty nodes: in addition to the word order and dependency relations that were already predicted, it now also fills in the forms, lemmas, UPOS, XPOS, and FEATS columns.
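To make the predicted columns concrete, the following minimal Python sketch lists the empty nodes found in a CoNLL-U text. It is not part of the baseline repository. It relies only on the CoNLL-U convention that empty nodes have a decimal ID such as 1.1 and carry their dependency relations in the DEPS column. The function name iter_empty_nodes and the sample sentence are hypothetical and invented for illustration; the sample is not actual model output.

```python
from typing import Dict, Iterator

# The ten standard CoNLL-U columns, in order.
CONLLU_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def iter_empty_nodes(conllu_text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per empty node (decimal ID such as '1.1').

    Hypothetical helper for illustration; not the baseline's own code.
    """
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip anything that is not a well-formed 10-column token line.
        if len(fields) != len(CONLLU_COLUMNS):
            continue
        # Empty nodes are the only lines whose ID contains a dot;
        # multiword token ranges use a hyphen instead (e.g. '1-2').
        if "." in fields[0]:
            yield dict(zip(CONLLU_COLUMNS, fields))


# Invented sample sentence with one empty node (assumption, not model output).
SAMPLE = "\n".join([
    "# text = Vino ayer.",
    "\t".join(["1", "Vino", "venir", "VERB", "_", "_", "0", "root", "0:root", "_"]),
    "\t".join(["1.1", "_", "él", "PRON", "_", "_", "_", "_", "1:nsubj", "_"]),
    "\t".join(["2", "ayer", "ayer", "ADV", "_", "_", "1", "advmod", "1:advmod", "_"]),
    "\t".join([".", ".", ".", "PUNCT", "_", "_", "1", "punct", "1:punct", "_"]).replace(".\t", "3\t", 1),
    "",
])

if __name__ == "__main__":
    for node in iter_empty_nodes(SAMPLE):
        print(node["ID"], node["LEMMA"], node["UPOS"], node["DEPS"])
```

Running the same function over a file before and after prediction is one simple way to see which empty nodes, and which of their columns, were added.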

Instructions for running prediction, training, and intrinsic evaluation are available in the CRAC 2026 Empty Nodes Baseline repository https://github.com/ufal/crac2026_empty_nodes_baseline.

Identifier
PID http://hdl.handle.net/11234/1-6081
Related Identifier https://github.com/ufal/crac2026_empty_nodes_baseline
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-6081
Provenance
Creator Straka, Milan
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2026
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Catalan (Valencian); Czech; Church Slavic (Old Slavonic, Church Slavonic, Old Bulgarian, Old Church Slavonic); Spanish (Castilian); Greek, Ancient (to 1453); Hungarian; Polish; Turkish
Resource Type toolService
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics