The crac2026_empty_nodes_baseline is a XLM-RoBERTa-large–based multilingual model for CRAC 2026 Empty Nodes Baseline system https://github.com/ufal/crac2026_empty_nodes_baseline for predicting empty nodes in the input CoNLL-U files, trained on CorefUD 1.4 data. It was was used to generate baseline empty nodes prediction in the CRAC 2026 Shared Task on Multilingual Coreference Resolution https://ufal.mff.cuni.cz/corefud/crac26.
The model is language agnostic, so in theory it can be used to predict coreference in any XLM-RoBERTa language.
Compared to the last year CRAC 2025 Empty Nodes Baseline https://github.com/ufal/crac2025_empty_nodes_baseline, this year's baseline predicts all available information for the empty nodes, i.e., including forms, lemmas, UPOS, XPOS, and FEATS columns, in addition to previously predicted word order and dependency relations of the empty nodes.
Instructions for running prediction, training, and intrinsic evaluation are all available in the repository CRAC 2026 Empty Nodes Baseline https://github.com/ufal/crac2026_empty_nodes_baseline.