  • DeustoTeka

Browsing by Author "Azime,I.A."

Showing 1 - 1 of 1
    IrokoBench: a new benchmark for African languages in the age of Large Language Models
    (Association for Computational Linguistics (ACL), 2025) Adelani,D.I.; Ojo,J.; Azime,I.A.; Zhuang,J.Y.; Alabi,J.O.; He,X.; Ochieng,M.; Hooker,S.; Bukula,A.; Lee,E.-S.A.; Chukwuneke,C.; Buzaaba,H.; Sibanda,B.; Kalipe,G.; Mukiibi,J.; Kabongo,S.; Yuehgoh,F.; Setaka,M.; Ndolela,L.; Odu,N.; Mabuya,R.; Muhammad,S.H.; Osei, Salomey; Samb,S.; Guge,T.K.; Sherman,T.V.; Stenetorp,P.
    Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 17 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and 6 proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a significant performance gap between open and proprietary models, with the highest-performing open model, Gemma 2 27B, reaching only 63% of the performance of the best-performing proprietary model, GPT-4o. In addition, machine-translating the test set into English before evaluation helped to close the gap for larger English-centric models, such as Gemma 2 27B and LLaMa 3.1 70B. These findings suggest that more effort is needed to develop and adapt LLMs for African languages.
Rights

Except where otherwise noted, the item's license is described as:
Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License
