Current cases of AI misalignment and their implications for future risks

IF 1.3 | Zone 1, Philosophy | Q1 HISTORY & PHILOSOPHY OF SCIENCE | Synthese | Pub Date: 2023-10-26 | DOI: 10.1007/s11229-023-04367-0
Leonard Dung
Citations: 0

Abstract

How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: it can be hard to detect, predict and remedy; it does not depend on a specific architecture or training paradigm; it tends to diminish a system’s usefulness; and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI misalignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, but aligning them should also be expected to be more difficult than aligning current AI.
Source journal
Synthese (Management Science: History and Philosophy of Science)
CiteScore: 3.30
Self-citation rate: 13.30%
Articles published: 471
Review time: 1 month
Journal description: Synthese is a philosophy journal focusing on contemporary issues in epistemology, philosophy of science, and related fields. More specifically, we divide our areas of interest into four groups: (1) Epistemology, methodology, and philosophy of science, all broadly understood. (2) The foundations of logic and mathematics, where ‘logic’, ‘mathematics’, and ‘foundations’ are all broadly understood. (3) Formal methods in philosophy, including methods connecting philosophy to other academic fields. (4) Issues in ethics and the history and sociology of logic, mathematics, and science that contribute to the contemporary studies Synthese focuses on, as described in (1)-(3) above.