Learning Efficient Recursive Numeral Systems via Reinforcement Learning

arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07170

Jonathan D. Thomas, Andrea Silvi, Devdatt Dubhashi, Emil Carlsson, Moa Johansson

Citations: 0

Abstract

The emergence of mathematical concepts, such as number systems, is an understudied area in AI for mathematics and reasoning. Carlsson et al. (2021) previously showed that, using reinforcement learning (RL), agents can derive simple approximate and exact-restricted numeral systems. However, it remains a major challenge to show how more complex recursive numeral systems, similar to the one utilised in English, could arise via a simple learning mechanism such as RL. Here, we introduce an approach towards a mechanistic explanation of the emergence of recursive number systems: we consider an RL agent that directly optimizes a lexicon under a given meta-grammar. Utilising a slightly modified version of the seminal meta-grammar of Hurford (1975), we demonstrate that our RL agent can effectively modify the lexicon towards Pareto-optimal configurations which are comparable to those observed within human numeral systems.
