The features in this range are essential for understanding how different languages handle noun and verb structures:

- Nominal Categories: Ordinal (53A) and distributive (54A) numerals, and numeral classifiers (55A).
- Nominal Syntax (Chapters 58–64)
- Verbal Categories: Perfective/imperfective aspect (65A), past tense (66A), future tense (67A), and the perfect (68A).

Typical applications of these features include:

- Probing embeddings: Using the WALS database features as labels to see if a model's internal representations (embeddings) cluster according to known linguistic traits, such as whether a language uses definite articles.
- Testing typological knowledge: Testing whether models like RoBERTa or XLM-RoBERTa have "learned" the typological rules of specific languages during pre-training.
- Low-resource NLP: Leveraging the broad cross-linguistic data in WALS to improve how models handle the hundreds of languages that lack large amounts of training text.

For more information on the specific data points, you can explore the Official WALS Features List or the WALS-Bench dataset on Hugging Face.
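The probing idea above can be sketched with a linear classifier: given one embedding vector per language and a WALS feature value per language, train a probe and check whether held-out accuracy beats a trivial baseline. This is a minimal sketch, not the method of any particular paper; the embeddings and labels here are random placeholders, standing in for real per-language vectors (e.g. mean-pooled XLM-RoBERTa hidden states) and real WALS annotations (e.g. feature 37A, definite articles).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data: in practice you would derive one vector per
# language (e.g. by mean-pooling XLM-RoBERTa hidden states over text
# in that language) and pair it with a WALS feature value.
rng = np.random.default_rng(0)
n_langs, dim = 200, 64
lang_embeddings = rng.normal(size=(n_langs, dim))

# Placeholder labels, e.g. WALS 37A binarized to
# "has definite articles" vs. "none".
wals_labels = rng.integers(0, 2, size=n_langs)

# A linear probe: if cross-validated accuracy clearly beats the
# majority-class baseline, the embeddings encode the feature.
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, lang_embeddings, wals_labels, cv=5)
print(f"mean held-out accuracy: {scores.mean():.3f}")
```

With random placeholder data the probe should hover near chance; the interesting signal only appears when real embeddings and real WALS labels are substituted in.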