Integration of Language and Experience via the Instructed Bandit Task
Abstract
Humans learn by interacting directly with their environments and by communicating via language. In this project, we explore this interaction between language and experiential learning through a novel sequential decision-making task, the "instructed bandit task" (IBT). In the IBT, agents make choices and receive rewards sampled from unknown Gaussian distributions, after being given linguistic hints. The IBT assesses how linguistic input and experienced reward values combine to determine choice behavior. We additionally propose a novel Bayesian reinforcement learning model that combines Bayesian updating from experience with propositional constraints that capture the meaning of the linguistic hints. As a point of comparison, we evaluate both human participants and Centaur, a LLaMA-based model fine-tuned to mimic human behavior, on the IBT. Our results show that all agents converge with the Bayesian model, and the granular differences in choice sequences reveal the varied roles instruction plays in decision-making tasks.
Keywords: reinforcement learning, language, Bayesian cognition, large language models
Bibtex entry:
@inproceedings{su2025instructedbandits,
abstract = {Humans learn by interacting directly with their environments and by communicating via language. In this project, we explore this interaction between language and experiential learning through a novel sequential decision-making task, the ``instructed bandit task'' (IBT). In the IBT, agents make choices and receive rewards sampled from unknown Gaussian distributions, after being given linguistic hints. The IBT assesses how linguistic input and experienced reward values combine to determine choice behavior. We additionally propose a novel Bayesian reinforcement learning model that combines Bayesian updating from experience with propositional constraints that capture the meaning of the linguistic hints. As a point of comparison, we evaluate both human participants and Centaur, a LLaMA-based model fine-tuned to mimic human behavior, on the IBT. Our results show that all agents converge with the Bayesian model, and the granular differences in choice sequences reveal the varied roles instruction plays in decision-making tasks.},
address = {Austin, TX},
author = {Su, E. and Ho, M. and Gureckis, T.M.},
booktitle = {Proceedings of the 47th Annual Conference of the Cognitive Science Society},
keywords = {reinforcement learning, language, Bayesian cognition, large language models},
organization = {Cognitive Science Society},
publisher = {Cognitive Science Society},
title = {Integration of Language and Experience via the Instructed Bandit Task},
year = {2025}}