This is the official repository for the project "Generating Textual Resources to Foster the Development of Language Technologies for Mayan Languages".
Recent AI advances require large datasets, which are scarce for indigenous Mayan languages spoken in Guatemala, Belize, and southern Mexico. This project aims to:
- Digitize linguistic resources for a number of Mayan languages and release them as open NLP data artifacts.
- Create FLORES+ Mayas, a parallel corpus for benchmarking machine translation (MT) for Mayan languages, by translating the Spanish side of FLORES+ into K'iche', Kaqchikel, Ixil, Mam, Q'anjob'al, and Q'eqchi'.
The project runs throughout 2025 and is a collaboration between the Transducens Research Group at Universitat d'Alacant and the Proyecto Lingüístico Francisco Marroquín Foundation (FPLFM) from Guatemala.
We gratefully acknowledge funding for this project from the Google Academic Research Awards (GARA).
All project outputs will be released under open licenses in this repository.