{"title":"Current cases of AI misalignment and their implications for future risks","authors":"Leonard Dung","doi":"10.1007/s11229-023-04367-0","DOIUrl":null,"url":null,"abstract":"Abstract How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem . Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: misalignment can be hard to detect, predict and remedy, it does not depend on a specific architecture or training paradigm, it tends to diminish a system’s usefulness and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI alignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, aligning them should be expected to be more difficult than aligning current AI.","PeriodicalId":49452,"journal":{"name":"Synthese","volume":"17 1","pages":"0"},"PeriodicalIF":1.3000,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Synthese","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11229-023-04367-0","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HISTORY & PHILOSOPHY OF SCIENCE","Score":null,"Total":0}
Citations: 0
Abstract
How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: misalignment can be hard to detect, predict, and remedy; it does not depend on a specific architecture or training paradigm; it tends to diminish a system's usefulness; and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI misalignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, but aligning them should also be expected to be more difficult than aligning current AI.
Journal description:
Synthese is a philosophy journal focusing on contemporary issues in epistemology, philosophy of science, and related fields. More specifically, we divide our areas of interest into four groups: (1) epistemology, methodology, and philosophy of science, all broadly understood. (2) The foundations of logic and mathematics, where ‘logic’, ‘mathematics’, and ‘foundations’ are all broadly understood. (3) Formal methods in philosophy, including methods connecting philosophy to other academic fields. (4) Issues in ethics and the history and sociology of logic, mathematics, and science that contribute to the contemporary studies Synthese focuses on, as described in (1)-(3) above.