CLARIN - Repositories

DeriNet 1.5

DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational...

Manually Ranked Translation Outputs

Manually ranked outputs of Czech-Slovak translations. Three annotators manually ranked outputs of five MT systems (Česílko, Česílko2, Google Translate and two Moses setups) on...

Medieval Charter Sections Corpus

This package provides an evaluation framework, training and test data for semi-automatic recognition of sections of historical diplomatic manuscripts. The data collection...

Open morphology of Finnish

Omorfi is a free and open source project containing various tools and data for handling Finnish texts in a linguistically motivated manner. The main components of this repository...

Linguistic digital repository based on DSpace 5.2

One of the goals of the LINDAT/CLARIN Centre for Language Research Infrastructure is to provide technical background to institutions or researchers who want to share their tools...

MSTperl parser

MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser)...

WMT21 Marian translation models (ca-ro,it,oc)

Marian multilingual translation model from Catalan into Romanian, Italian and Occitan. Primary CUNI submission for WMT21 Multilingual Low-Resource Translation for Indo-European...

Italian Content Words v2

This resource is the second version of an Italian morphological dictionary for content words, encoded in a JSON Lines format text file. It contains correspondences between...

sqad 2.1

Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization...

NomadLingo1.0 open

The corpus NomadLingo1.0 contains transcripts of extracts from naturally-occurring conversations which were audio-recorded between November 2023 and April 2024 at social events...

Parlement of Foules, a digital diplomatic edition

A digital edition of the Middle English poem “Parlement of Foules” by Geoffrey Chaucer, featuring a diplomatic transcription of the text found in MS Gg.4.27(1), Cambridge...

WordnetLoom 2

An application for editing and building wordnets

Core Metadata Schema for Learner Corpora (version 1)

The Core Metadata Schema for Learner Corpora is an extensive revision of Granger & Paquot's (2017) Core Metadata [Schema] for Learner Corpora Draft 1.0 in the field of...

Core Metadata [Schema] for Learner Corpora Draft 1.0

First proposal towards a "Core Metadata [Schema] for Learner Corpora", presented at the "CLARIN workshop on Interoperability of Second Language Resources and Tools", Gothenburg,...

Core Metadata Schema for Learner Corpora (version 2)

This document contains a list of metadata fields that can be used to describe learner corpus data. The core metadata scheme is structured around 8 metadata types: -...

KrdWrd CANOLA Corpus 1.0

The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate on unseen Web pages. It was harvested, annotated and...

Code preference in OLL of accommodation in Palma

The file consists of a database in .SAV format (SPSS) of language choice and preference as reflected in the websites of accommodation establishments in the city of Palma de...

Vita Vergilii / digital edition published by digilibLT digital library of lat...

Linguistic revision: Chiara Miglietta. XML encoding: Simona Musso. Project homepage: https://digiliblt.uniupo.it/ Documentation: https://digiliblt.uniupo.it/progetto.php

al-qāmūs l-muḥīṭ: a digital Arabic dictionary: letter tāʾ

The dossier for the letter tāʾ contains: TXT file: the part of the plain text corresponding to the section for the letter tāʾ; XML files without translation: conversion of the text into XML resulting...

1,494 datasets found