Slovenian legal natural language inference dataset SLawNLI

SLawNLI is a human-annotated dataset for Natural Language Inference (NLI) in the Slovenian legal domain. It contains 2,214 examples that follow the standard NLI schema (premise, hypothesis, label). The examples were annotated by four master's students from the Faculty of Law, and every example was then manually validated by a researcher from the Institute of Criminology and a practicing lawyer.

The dataset is derived from four Slovenian laws:

Kazenski zakonik (KZ-1) — Criminal Code (https://pisrs.si/pregledPredpisa?id=ZAKO5050)
Stvarnopravni zakonik (SPZ) — Law of Property Code (https://pisrs.si/pregledPredpisa?id=ZAKO3242)
Zakon o varstvu osebnih podatkov (ZVOP-2) — Personal Data Protection Act (https://pisrs.si/pregledPredpisa?id=ZAKO7959)
Obligacijski zakonik (OZ) — Obligations Code (https://pisrs.si/pregledPredpisa?id=ZAKO1263)

The dataset is provided in JSONL format.
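
Since JSONL stores one JSON object per line, the examples can be read with a few lines of standard-library Python. The following is a minimal sketch, not an official loader: the key names `premise`, `hypothesis`, and `label` are assumed from the schema described above and may differ in the released file, and the helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List

# Assumed key names, taken from the (premise, hypothesis, label) schema.
# The actual keys in the released file may differ.
REQUIRED_KEYS = ("premise", "hypothesis", "label")


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, checking the assumed keys."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        yield record


def label_counts(records: Iterable[dict]) -> Counter:
    """Count how often each label value occurs."""
    return Counter(record["label"] for record in records)


def load_file(path: str) -> List[dict]:
    """Read a UTF-8 encoded JSONL file from a caller-supplied path."""
    with open(path, encoding="utf-8") as handle:
        return list(parse_jsonl(handle))
```

The download is a ZIP archive, so the JSONL file has to be extracted first. Calling `label_counts(load_file(path))` on the extracted file then gives the label distribution, which is a quick sanity check against the 2,214 examples stated above.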

Identifier
PID	http://hdl.handle.net/11356/2100
Related Identifier	https://www.cjvt.si/llm4dh/
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/2100

Provenance
Creator	Malenšek, Miha; Krajnc, Saša; Križnar, Primož; Završnik, Aleš; Bajec, Marko; Žitnik, Slavko
Publisher	Faculty of Computer and Information Science, University of Ljubljana
Publication Year	2026
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline	Linguistics