Slovenian legal natural language inference dataset SLawNLI

SLawNLI is a human-annotated dataset for Natural Language Inference (NLI) in the Slovenian legal domain. It contains 2,214 examples constructed according to the standard NLI schema (premise, hypothesis, label). The dataset was annotated by four master's students of the Faculty of Law. All examples were hand-validated by a researcher from the Institute of Criminology and a practicing lawyer.

The dataset is derived from four Slovenian laws:

The dataset is provided in JSONL format.
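As a minimal sketch of how such a file might be read, the Python snippet below parses JSONL line by line and checks that each record carries the three parts of the NLI schema. The key names `premise`, `hypothesis`, and `label`, the label strings, and the two sample records are assumptions made for illustration; the actual field names and label encoding should be checked against the downloaded file.

```python
import io
import json
from collections import Counter

# Assumed key names, taken from the schema described above (premise, hypothesis, label).
# The real file may use different keys.
REQUIRED_FIELDS = {"premise", "hypothesis", "label"}


def read_nli_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL text stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing fields {sorted(missing)}")
        yield record


# Invented placeholder records, NOT taken from SLawNLI.
sample = io.StringIO(
    '{"premise": "P1", "hypothesis": "H1", "label": "entailment"}\n'
    "\n"
    '{"premise": "P2", "hypothesis": "H2", "label": "contradiction"}\n'
)

records = list(read_nli_jsonl(sample))
label_counts = Counter(record["label"] for record in records)
print(len(records), dict(label_counts))
```

To read a file on disk, the same function can be passed a handle opened with `open(path, encoding="utf-8")`, where `path` points to the JSONL file extracted from the downloaded archive.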

Identifier
PID http://hdl.handle.net/11356/2100
Related Identifier https://www.cjvt.si/llm4dh/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/2100
Provenance
Creator Malenšek, Miha; Krajnc, Saša; Križnar, Primož; Završnik, Aleš; Bajec, Marko; Žitnik, Slavko
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2026
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
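
The Metadata Access link above is an OAI-PMH GetRecord request for Dublin Core (`oai_dc`) metadata. The Python sketch below rebuilds that URL from its parts and extracts Dublin Core elements from a response. The helper names are hypothetical, and the XML string is a hand-written fragment for illustration, not the repository's actual response; no network request is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint and identifier as they appear in the Metadata Access link.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
RECORD_ID = "oai:www.clarin.si:11356/2100"

# Standard Dublin Core element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep the identifier readable, as in the original link
    )
    return f"{OAI_ENDPOINT}?{query}"


def dublin_core_fields(xml_text):
    """Collect the text of every Dublin Core element, grouped by element name."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


# Hand-written fragment for illustration only; a real response wraps this
# in an OAI-PMH envelope and carries more elements.
sample_xml = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Slovenian legal natural language inference dataset SLawNLI</dc:title>"
    "<dc:identifier>http://hdl.handle.net/11356/2100</dc:identifier>"
    "</oai_dc:dc>"
)

url = get_record_url(RECORD_ID)
fields = dublin_core_fields(sample_xml)
print(url)
print(fields)
```

Because real responses nest the `oai_dc:dc` element inside an OAI-PMH envelope, the parser walks the whole tree instead of assuming a fixed path.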