This is a large-scale multilingual benchmark for evaluating metalinguistic knowledge (i.e., explicit knowledge about the structure of languages) in large language models, using grammatical features from the World Atlas of Language Structures (WALS). The benchmark covers 192 linguistic features across 12 linguistic domains and 2,660 languages, and is available in two formats, both distributed as JSONL files:
- Format 1 (192-question version): One question per feature, under which all languages with a corresponding ground truth value for that feature are listed.
- Format 2 (76,475-question version): One question per feature-language pair with a corresponding ground truth value, fully expanded across all languages.
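
The relationship between the two formats can be illustrated with a minimal Python sketch. It reads a JSONL file line by line and expands one Format 1 style record (one feature, many languages) into Format 2 style records (one per feature-language pair). The key names `feature`, `question`, `languages`, `language`, and `answer`, and the sample record itself, are assumptions made for illustration; check the actual files for the real schema before relying on them.

```python
import json
from typing import Dict, Iterator, List


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def expand_feature_record(record: Dict) -> List[Dict]:
    """Expand one per-feature record into one record per language.

    Assumes hypothetical keys: "feature", "question", and "languages",
    where "languages" maps a language name to its ground truth value.
    """
    return [
        {
            "feature": record["feature"],
            "question": record["question"],
            "language": language,
            "answer": value,
        }
        for language, value in record["languages"].items()
    ]


# Illustrative record only; not copied from the benchmark files.
sample = {
    "feature": "Order of Subject, Object and Verb",
    "question": "What is the dominant order of subject, object and verb?",
    "languages": {"English": "SVO", "Japanese": "SOV"},
}

for row in expand_feature_record(sample):
    print(row["language"], row["answer"])
```

Under these assumptions, applying `expand_feature_record` to every record yielded by `read_jsonl` on the per-feature file would produce one row per feature-language pair, which is the shape that Format 2 describes.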

The original WALS data is licensed under CC BY 4.0 and has been adapted for use in this benchmark.

Source: Dryer, Matthew S. & Haspelmath, Martin (eds.). World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. https://wals.info