Siganalogies - morphological analogies from Sigmorphon 2016 and 2019

The siganalogies dataset contains morphological analogies built upon Sigmorphon 2016 and Sigmorphon 2019, together with code for manipulating them in PyTorch.

An analogical proportion is a 4-ary relation written A:B::C:D, which reads "A is to B as C is to D". This dataset deals with morphological analogies, i.e., analogies over character strings in which the transformations between the objects correspond to morphological transformations of words (e.g., conjugation or declension). In our dataset, A, B, C, and D are words. An example in English is "dog : dogs :: cat : cats".
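As an illustration only, the sketch below shows one way analogies of the form A:B::C:D could be assembled from Sigmorphon-style entries. It assumes that two (lemma, inflected form) pairs sharing the same morphological feature tag form an analogy. The toy rows and the names `ENTRIES`, `build_analogies`, and `format_analogy` are hypothetical and are not part of the siganalogies code or data.

```python
from collections import defaultdict
from itertools import combinations

# Toy Sigmorphon-style entries: (lemma, inflected form, feature tag).
# Illustrative rows only; they are not taken from the dataset files.
ENTRIES = [
    ("dog", "dogs", "N;PL"),
    ("cat", "cats", "N;PL"),
    ("walk", "walked", "V;PST"),
    ("jump", "jumped", "V;PST"),
]


def build_analogies(entries):
    """Pair entries that share a feature tag into (A, B, C, D) quadruples.

    Assumption: the same feature tag means the same morphological
    transformation, so A:B and C:D are analogous.
    """
    by_features = defaultdict(list)
    for lemma, form, features in entries:
        by_features[features].append((lemma, form))

    analogies = []
    for pairs in by_features.values():
        # Each unordered pair of entries with the same tag yields one analogy.
        for (a, b), (c, d) in combinations(pairs, 2):
            analogies.append((a, b, c, d))
    return analogies


def format_analogy(quad):
    """Render a quadruple in the 'A : B :: C : D' notation."""
    a, b, c, d = quad
    return f"{a} : {b} :: {c} : {d}"


analogies = build_analogies(ENTRIES)
for quad in analogies:
    print(format_analogy(quad))
```

Grouping by feature tag first keeps the pairing step restricted to entries that undergo the same transformation, which avoids comparing every entry with every other entry.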

The dataset contains: (i) a copy of Sigmorphon 2019 and Sigmorphon 2016 extended with Japanese data, (ii) serialized objects, one for each language, containing the indices of the analogies and other relevant data, and (iii) the code necessary to manipulate the dataset and serialized data.

Python 3.8

PyTorch 1.10

Identifier
DOI https://doi.org/10.12763/MLCFIE
Metadata Access https://dorel.univ-lorraine.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.12763/MLCFIE
Provenance
Creator Marquer, Esteban; Couceiro, Miguel; Safa Alsaidi; Amandine Decker
Publisher Université de Lorraine
Contributor Marquer, Esteban; Couceiro, Miguel; Miguel Couceiro; Esteban Marquer; Amandine Decker; Safa Alsaidi; Putineath Lay; Pierre-Alexandre Murena
Publication Year 2022
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Marquer, Esteban (LORIA); Couceiro, Miguel (LORIA)
Representation
Resource Type Software; Dataset
Format text/x-python; text/plain; charset=UTF-8; application/octet-stream; text/x-c; text/plain; charset=US-ASCII; text/markdown; application/x-sh; application/pdf
Size 4085; 491901; 29488067; 29487630; 376216; 3222331; 3222025; 20708; 3780; 123589; 123354; 124720; 1188153; 1010156; 971881; 217403; 2215342; 2024467; 1765665; 127394; 1292664; 1101789; 1031022; 4757991; 4757756; 3261751; 511236; 3992151; 3991666; 3261516; 525240; 3518582; 3518079; 394739; 5010604; 5010259; 3524; 3441; 2089; 3429; 15036; 406010; 31364330; 31363765; 735662; 618759; 618535; 459599; 8411767; 8411328; 5446; 5879; 3385; 240528; 1413423; 1412863; 5243; 3597; 3645; 2836; 3606; 552532; 5040113; 5039755; 2315; 3515; 3555; 2293; 3499; 8695; 1828; 1907; 1460; 3679; 2509; 2471; 1426; 2409; 36327; 36600; 24776; 363250; 4998595; 4998016; 3644; 328161; 58060130; 58059692; 338226; 16362513; 16361987; 284611; 79841865; 79841284; 361130; 5302211; 5301917; 3562; 239178; 239010; 91444; 1347246; 1023384; 724899; 147690; 2209970; 1881055; 1183464; 96050; 1434607; 1105692; 767067; 38223748; 38223580; 11133979; 436956; 4431095; 4430650; 11133811; 363564; 8472210; 8471756; 3162; 3147; 2294; 3212; 613647; 613410; 114618; 1270158; 801569; 844746; 157420; 2364729; 1721170; 1260281; 121525; 1827105; 1183546; 975290; 67452145; 67451908; 28874048; 28873811; 191712; 191438; 96636; 926790; 735114; 755352; 158781; 1593334; 1393348; 1277457; 99778; 998262; 798276; 797138; 12208043; 12207769; 8266757; 332532; 17013160; 17012572; 8266483; 1042; 53630; 54098; 32413; 538564; 6865983; 6864802; 5396; 360448; 8887017; 8886684; 731299; 2327792; 2327172; 620120; 619944; 156065; 149716; 120681; 1074740; 275313; 272914; 242379; 2191185; 169074; 166953; 136418; 1347672; 584017; 583847; 26502160; 362105; 8785781; 8785243; 26501990; 1267; 1252; 847; 2589; 616; 882; 35754; 34822; 21344; 357299; 18158259; 18157738; 3616; 388261; 8194077; 8193770; 50985; 238646; 234821; 4316; 4333; 2494; 4472; 220153; 1155099; 1154537; 1495; 1441; 918; 2718; 1352; 1418; 898; 2771; 1992; 1931; 1023; 3995; 1863; 1950; 1008; 3640; 382597; 22833251; 22832611; 3639; 3672; 2751; 3670; 38432; 38411; 25713; 384484; 
3873548; 3873044; 3889; 370105; 8114923; 8114534; 1072; 35010; 34754; 23554; 3393; 3627; 3668; 2429; 3469; 270; 3554; 248886; 248726; 296380; 297466; 263689; 2351410; 556330; 557631; 523825; 4455582; 311869; 312612; 278806; 2497990; 3399; 251045; 2640; 250885; 1599896; 3486; 1599736; 1487; 1445; 1052; 3002; 1585; 1561; 1123; 3104; 1155; 1066; 762; 2298; 297946; 297781; 95969; 28254; 22988; 383299; 180270; 83679; 74448; 1428775; 108638; 51090; 41859; 854460; 47304; 47139; 4085007; 4084842; 3576; 3564; 2422; 3544; 827; 3492; 3453; 2500; 3540; 3258; 3352; 2365; 3366; 3607; 3529; 2101; 3388; 33713; 33539; 24007; 3612; 2012; 2039; 1672; 4080; 34587; 34257; 25451; 3444; 4323; 4351; 2969; 4326; 16714; 5894; 568090; 3282587; 3282192; 331597; 8416063; 8415516; 35958; 35929; 25142; 361099; 5553495; 5553100; 3591; 37535; 37405; 22769; 3684; 3882; 16871; 237; 17; 371437; 7749023; 7748461; 854; 53705; 326313; 326108; 104961; 1459295; 1031591; 811467; 152760; 2272037; 1812877; 1213991; 107245; 1596464; 1137304; 853049; 53824; 52094414; 32376; 52094209; 16149113; 531870; 8269940; 8269392; 5367; 16148908; 532170; 6836027; 6835395; 1642; 1546; 1078; 3197; 7072; 23488; 217186; 276689; 16238205; 16237587; 316728; 7351176; 7350626; 38230; 38539; 27530; 383621; 2146344; 2145984; 3785; 251288; 251114; 100735; 1495531; 1240799; 805946; 176508; 2681542; 2418150; 1424914; 104092; 1594401; 1331009; 845196; 38786993; 38786819; 11677777; 375656; 6043209; 6042858; 11677603; 4210; 4163; 3242; 361470; 1769674; 1769410; 4052; 2547; 2389; 1400; 2450; 3698; 3613; 4408; 148664; 148478; 95199; 95184; 73498; 732273; 164699; 165906; 144154; 1315385; 104383; 104618; 82866; 832759; 146661; 146475; 5135306; 419083; 2705126; 2704611; 5135120; 1272; 1311; 795; 2503; 545262; 2128859; 2128419; 6777; 6450; 35228; 76210; 75970; 2610; 2601; 1701; 2561; 352037; 6509819; 6509516; 2770; 2777; 2063; 2740; 5304; 5703; 3965; 5416; 327383; 2944116; 2943671
Version 1.0
Discipline Humanities; Linguistics