Large Language Models for structured task decomposition in Reinforcement Learning problems with sparse rewards

Ruiz Gonzalez, Unai; Andrés Fernández, Alain; Ser Lorente, Javier del

dc.contributor.author	Ruiz Gonzalez, Unai
dc.contributor.author	Andrés Fernández, Alain
dc.contributor.author	Ser Lorente, Javier del
dc.date.accessioned	2026-04-17T11:31:54Z
dc.date.available	2026-04-17T11:31:54Z
dc.date.issued	2025-10-22
dc.date.updated	2026-04-17T11:31:54Z
dc.description.abstract	Reinforcement learning (RL) agents face significant challenges in sparse-reward environments, as insufficient exploration of the state space can result in inefficient training or incomplete policy learning. To address this challenge, this work proposes a teacher–student framework for RL that leverages the inherent knowledge of large language models (LLMs) to decompose complex tasks into manageable subgoals. The capabilities of LLMs to comprehend problem structure and objectives, based on textual descriptions, can be harnessed to generate subgoals, similar to the guidance a human supervisor would provide. For this purpose, we introduce the following three subgoal types: positional, representation-based, and language-based. Moreover, we propose an LLM surrogate model to reduce computational overhead and demonstrate that the supervisor can be decoupled once the policy has been learned, further lowering computational costs. Under this framework, we evaluate the performance of three open-source LLMs (namely, Llama, DeepSeek, and Qwen). Furthermore, we assess our teacher–student framework on the MiniGrid benchmark—a collection of procedurally generated environments that demand generalization to previously unseen tasks. Experimental results indicate that our teacher–student framework facilitates more efficient learning and encourages enhanced exploration in complex tasks, resulting in faster training convergence and outperforming recent teacher–student methods designed for sparse-reward environments.	en
dc.description.sponsorship	Alain Andres and Javier Del Ser acknowledge funding support from the Basque Government through its ELKARTEK funding program (KK-2024/00064, IKUN). The work of Javier Del Ser is also supported by the consolidated research group MATHMODE (IT1866-26) funded by the same institution. Alain Andres also acknowledges funding support from IKASLAGUN project (ref. 2024-CIE2-000006-01), funded by Diputación Foral de Gipuzcoa under the program Red Guipuzcoana de Ciencia, Tecnología e Innovación: GIPUZCOA NEXT	en
dc.identifier.citation	Ruiz-Gonzalez, U., Andres, A., & Del Ser, J. (2025). Large Language Models for structured task decomposition in Reinforcement Learning problems with sparse rewards. Machine Learning and Knowledge Extraction, 7(4). https://doi.org/10.3390/MAKE7040126
dc.identifier.doi	10.3390/MAKE7040126
dc.identifier.eissn	2504-4990
dc.identifier.uri	https://hdl.handle.net/20.500.14454/5677
dc.language.iso	eng
dc.publisher	Multidisciplinary Digital Publishing Institute (MDPI)
dc.rights	© 2025 by the authors. Licensee MDPI, Basel, Switzerland
dc.subject.other	Goal-oriented reinforcement learning
dc.subject.other	Sparse-reward environments
dc.subject.other	Teacher–student
dc.title	Large Language Models for structured task decomposition in Reinforcement Learning problems with sparse rewards	en
dc.type	journal article
dcterms.accessRights	open access
oaire.citation.issue	4
oaire.citation.title	Machine Learning and Knowledge Extraction
oaire.citation.volume	7
oaire.licenseCondition	https://creativecommons.org/licenses/by/4.0/
oaire.version	VoR

Files

Original bundle

Name: ruiz_large_2025.pdf
Size: 18.53 MB
Format: Adobe Portable Document Format

Collections

Articles