-
DeriNet 1.5
DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational... -
Manually Ranked Translation Outputs
Manually ranked outputs of Czech-Slovak translations. Three annotators manually ranked outputs of five MT systems (Česílko, Česílko2, Google Translate and two Moses setups) on... -
Medieval Charter Sections Corpus
This package provides an evaluation framework, training and test data for semi-automatic recognition of sections of historical diplomatic manuscripts. The data collection... -
Open morphology of Finnish
Omorfi is a free and open source project containing various tools and data for handling Finnish texts in a linguistically motivated manner. The main components of this repository... -
Linguistic digital repository based on DSpace 5.2
One of the goals of the LINDAT/CLARIN Centre for Language Research Infrastructure is to provide technical background to institutions or researchers who want to share their tools... -
MSTperl parser
MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser)... -
WMT21 Marian translation models (ca-ro,it,oc)
Marian multilingual translation model from Catalan into Romanian, Italian and Occitan. Primary CUNI submission for WMT21 Multilingual Low-Resource Translation for Indo-European... -
Italian Content Words v2
This resource is the second version of an Italian morphological dictionary for content words, encoded in a JSON Lines format text file. It contains correspondences between... -
sqad 2.1
Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization... -
NomadLingo1.0 open
The corpus NomadLingo1.0 contains transcripts of extracts from naturally-occurring conversations which were audio-recorded between November 2023 and April 2024 at social events... -
Parlement of Foules, a digital diplomatic edition
A digital edition of the Middle English poem “Parlement of Foules” by Geoffrey Chaucer, featuring a diplomatic transcription of the text found in MS Gg.4.27(1), Cambridge... -
google
google -
WordnetLoom 2
An application for editing and building wordnets -
Core Metadata Schema for Learner Corpora (version 1)
The Core Metadata Schema for Learner Corpora is an extensive revision of Granger & Paquot's (2017) Core Metadata [Schema] for Learner Corpora Draft 1.0 in the field of... -
Core Metadata [Schema] for Learner Corpora Draft 1.0
First proposal towards a "Core Metadata [Schema] for Learner Corpora", presented at the "CLARIN workshop on Interoperability of Second Language Resources and Tools", Gothenburg,... -
Core Metadata Schema for Learner Corpora (version 2)
This document contains a list of metadata fields that can be used to describe learner corpus data. The core metadata scheme is structured around 8 metadata types: -... -
KrdWrd CANOLA Corpus 1.0
The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate on unseen Web pages. It was harvested, annotated and... -
Code preference in OLL of accommodation in Palma
The file consists of a database in .SAV format (SPSS) of language choice and preference as reflected in the websites of accommodation establishments in the city of Palma de... -
Vita Vergilii / digital edition published by digilibLT digital library of lat...
Linguistic correction: Chiara Miglietta. XML encoding: Simona Musso. Project homepage: https://digiliblt.uniupo.it/ Documentation: https://digiliblt.uniupo.it/progetto.php -
al-qāmūs l-muḥīṭ: a digital Arabic dictionary: letter tāʾ
The dossier for the letter tāʾ contains: TXT file: the part of the plain text corresponding to the section of the letter tāʾ; XML files without translation: conversion of the text into XML resulting...
