AI Magazine: Latest Articles

Open-source AI at scale: Establishing an enterprise AI strategy through modular frameworks
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-22 | DOI: 10.1002/aaai.70032
Serdar Kadıoğlu

We present a comprehensive enterprise AI strategy developed within the AI Center of Excellence at Fidelity Investments, emphasizing the strategic integration of open-source AI frameworks into scalable, modular, and reproducible enterprise-grade solutions. Our approach is structured around five key pillars: learning from offline data, learning from online feedback, intelligent decision-making, automated assistants, and responsible AI practices. Through a suite of 12 open-source libraries, we demonstrate how modular and interoperable tools can collectively enhance scalability, fairness, and explainability in real-world AI deployments. We further illustrate the impact of this strategy through three enterprise case studies. Finally, we distill a set of best deployment practices to guide organizations in implementing modular, open-source AI strategies at scale.

Citations: 0
Multimodal AI Teacher: Integrating Edge Computing and Reasoning Models for Enhanced Student Error Analysis
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-21 | DOI: 10.1002/aaai.70030
Tianlong Xu, Yi-Fan Zhang, Zhendong Chu, Qingsong Wen

This paper extends our previously published work on the virtual AI teacher (VATE) system, presented at IAAI-25. VATE is designed to autonomously analyze and correct student errors in mathematical problem-solving using advanced large language models (LLMs). By incorporating student draft images as a primary input for reasoning, the system provides fine-grained error cause analysis and supports real-time, multi-round AI-student dialogues. In this extended version, we introduce a new snap-to-solve module for handling low-reasoning tasks using edge-deployed LLMs, enabling faster and partially offline interaction. We also include expanded benchmarking experiments, including human expert evaluations and ablation studies, to assess model performance and learning outcomes. Deployed on the Squirrel AI platform, VATE demonstrates high accuracy (78.3%) in error analysis and improves student learning efficiency, with strong user satisfaction. These results suggest that VATE is a scalable, cost-effective solution with the potential to transform educational practices.

Citations: 0
Automated vulnerability evaluation with large language models and vulnerability ontologies
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-15 | DOI: 10.1002/aaai.70031
Rikhiya Ghosh, Hans-Martin von Stockhausen, Martin Schmitt, George Marica Vasile, Sanjeev Kumar Karn, Oladimeji Farri

The National Vulnerability Database (NVD) publishes over a thousand new vulnerabilities monthly, with a projected 25 percent increase in 2024, highlighting the crucial need for rapid vulnerability identification to mitigate cybersecurity attacks and save costs and resources. In this work, we propose using large language models (LLMs) to learn vulnerability evaluation from historical assessments of medical device vulnerabilities in a single manufacturer's portfolio. We highlight the effectiveness and challenges of using LLMs for automatic vulnerability evaluation and introduce a method to enrich historical data with cybersecurity ontologies, enabling the system to understand new vulnerabilities without retraining the LLM. Our LLM system integrates with the in-house application—Cybersecurity Management System (CSMS)—to help Siemens Healthineers (SHS) product cybersecurity experts efficiently assess the vulnerabilities in our products. Also, we present a comprehensive set of experiments that helps showcase the properties of the LLM and dataset, the various guardrails we have implemented to safeguard the system in production, and the guidelines for efficient integration of LLMs into the cybersecurity tool.

Citations: 0
OnAIR: Applications of the NASA on-board artificial intelligence research platform
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-15 | DOI: 10.1002/aaai.70020
Evana Gizzi, Connor Firth, Caleb Adams, James Berck, P. Timothy Chase Jr, Christian Cassamajor-Paul, Rachael Chertok, Lily Clough, Jonathan Davis, Melissa De La Cruz, Matthew Dosberg, Alan Gibson, Jonathan Hammer, Ibrahim Haroon, Michael A. Johnson, Brian Kempa, James Marshall, Patrick Maynard, Brett McKinney, Leyton McKinney, Michael Monaghan, Robin Onsay, Hayley Owens, Sam Pedrotty, Daniel Rogers, Mahmooda Sultana, Jivko Sinapov, Bethany Theiling, Aaron Woodard, Caroline Zouloumian, Connor Williams

Infusing artificial intelligence algorithms into production aerospace systems can be challenging due to costs, timelines, and a risk-averse industry. We introduce the Onboard Artificial Intelligence Research (OnAIR) platform, an open-source software pipeline and cognitive architecture tool that enables full life cycle AI research for on-board intelligent systems. We begin with a description and user walk-through of the OnAIR tool. Next, we describe four use cases of OnAIR for both research and deployed onboard applications, detailing their use of OnAIR and the benefits it provided to the development and function of each respective scenario. Lastly, we describe two upcoming planned deployments which will leverage OnAIR for crucial mission outcomes. We conclude with remarks on future work and goals for the forward progression of OnAIR as a tool to enable a larger AI and aerospace research community.

Citations: 0
Developing generative recommender systems for government subsidy programs with a new RQ-VAE model: Wello and the Korean government case
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-08 | DOI: 10.1002/aaai.70029
Ji Won Kim, Jae Hong Park, Yuri Anna Kim, Sang Jun Lee

According to an industry survey, many people miss opportunities to apply for government subsidy programs because they do not know how to apply. People also need to search manually and check whether these programs are suitable for them. To address this issue, our study developed a new generative recommender system that uses both users' information and government subsidy documents. Within our recommender system framework, we modify the existing Residual Quantization Variational Auto-Encoder (RQ-VAE) model to capture deep and abstract information from subsidy documents. Using semantic IDs generated for approximately 185,610 user click-stream histories and 240,000 documents, we train our recommender system to predict the semantic IDs of the next subsidy policy documents in which a user might be interested. In 2024, we successfully deployed our generative recommender system in Wello, a Korean Gov-Tech startup. In collaboration with the Korean government, our generative recommender system helped enhance program effectiveness by saving $7.8 million in unused funds and achieved $27.4 million in advertising efficiency gains. Also, Wello observed a 68% relative improvement in Click-Through-Ratio (CTR), which rose from 41.4% in the third quarter of 2024 to 69.6% in the fourth quarter of 2024 (a gain of 28.2 percentage points). We thus anticipate that our generative recommender system will have a significant impact on both individuals and the government.
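
The semantic IDs mentioned in the abstract come from residual quantization: a document embedding is matched to its nearest codebook entry, the remainder (residual) is matched against the next codebook, and the sequence of chosen indices becomes the ID. The following is a minimal sketch of that quantization step only, assuming fixed, hand-written codebooks. It is not the authors' modified RQ-VAE (no encoder, decoder, or training), and the names `residual_quantize`, `nearest`, and `CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean distance)."""
    def dist2(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[Tuple[int, ...], List[float]]:
    """Quantize x level by level; each level encodes what earlier levels missed."""
    residual = list(x)
    ids = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        ids.append(idx)
        # Subtract the chosen code so the next level sees only the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(ids), residual


# Hypothetical two-level codebooks over 2-D embeddings (illustrative values only).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)],   # coarse level
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)],  # fine level
]

semantic_id, leftover = residual_quantize((1.2, 1.0), CODEBOOKS)
print(semantic_id)  # tuple of one index per level
```

In a real system the codebooks would be learned jointly with an encoder and decoder, and a sequence model would be trained to predict the next document's ID tuple from a user's click history.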

Citations: 0
Evaluation and incident prevention in an enterprise AI assistant
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-08 | DOI: 10.1002/aaai.70028
Akash V. Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, Sajjadur Rahman, Yunyao Li

Enterprise AI Assistants are increasingly deployed in domains where accuracy is paramount, making each erroneous output a potentially significant incident. This paper presents a comprehensive framework for monitoring, benchmarking, and continuously improving such complex, multi-component systems under active development by multiple teams. Our approach encompasses three key elements: (1) a hierarchical “severity” framework for incident detection that identifies and categorizes errors while attributing component-specific error rates, facilitating targeted improvements; (2) a scalable and principled methodology for benchmark construction, evaluation, and deployment, designed to accommodate multiple development teams, mitigate overfitting risks, and assess the downstream impact of system modifications; and (3) a continual improvement strategy leveraging multidimensional evaluation, enabling the identification and implementation of diverse enhancement opportunities. By adopting this holistic framework, organizations can systematically enhance the reliability and performance of their AI Assistants, ensuring their efficacy in critical enterprise environments. We conclude by discussing how this multifaceted approach opens avenues for various classes of enhancements, including human-AI collaborative evaluation, paving the way for more robust and trustworthy AI systems.
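
To make the first element concrete, one way to attribute component-specific error rates is to tag each logged incident with a severity level and the component judged responsible, then divide by the number of requests each component handled. The sketch below assumes this simple bookkeeping. The severity levels, component names, and the `error_rates` helper are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable


class Severity(IntEnum):
    """Hypothetical three-level severity scale (higher is worse)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Incident:
    component: str       # component judged responsible for the error
    severity: Severity


def error_rates(
    incidents: Iterable[Incident],
    requests_per_component: Dict[str, int],
    min_severity: Severity = Severity.LOW,
) -> Dict[str, float]:
    """Per-component rate of incidents at or above min_severity."""
    counts = Counter(i.component for i in incidents if i.severity >= min_severity)
    return {
        component: counts[component] / n
        for component, n in requests_per_component.items()
        if n > 0
    }


# Illustrative data only.
incidents = [
    Incident("retriever", Severity.HIGH),
    Incident("retriever", Severity.LOW),
    Incident("generator", Severity.MEDIUM),
]
requests = {"retriever": 100, "generator": 50}

all_rates = error_rates(incidents, requests)
high_rates = error_rates(incidents, requests, min_severity=Severity.HIGH)
```

Filtering by `min_severity` lets a team track the most serious incidents separately from the overall error rate of each component.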

Citations: 0
Introduction to the special issue on innovative applications of artificial intelligence (IAAI 2025)
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-08 | DOI: 10.1002/aaai.70027
Serdar Kadıoğlu, Sean McGregor, Jan Seyler

This year's innovative applications of AI special issue features AI systems deployed in real-world settings, from enterprise platforms to public services, demonstrating both technical rigor and measurable benefits for organizations and society. The eight selected articles span enterprise reliability, cybersecurity, aerospace, education, healthcare logistics, government services, and scalable AI strategy. Collectively, these works illustrate how AI is progressing from research prototypes to systems that organizations now rely on for critical decisions, offering lessons learned for both researchers and practitioners.

Citations: 0
Recent advances in finetuning multimodal large language models
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-03 | DOI: 10.1002/aaai.70025
Zhen Wang, Lin Li, Long Chen

Finetuning serves as the critical adaptation mechanism for multimodal large language models, bridging their pretrained knowledge with specialized downstream task requirements. This paper reviews recent finetuning advances across three key dimensions: (1) efficiency-oriented methods that reduce resource costs; (2) capability-specific techniques enhancing specialized multimodal skills; and (3) task-unifying approaches that bridge understanding and generation. We demonstrate how these directions transform multimodal large language models from versatile foundations into adaptive, human-aligned systems, providing researchers with a structured roadmap for developing next-generation multimodal AI.

Citations: 0
Toward robust, interactive, and human-aligned AI systems
IF 3.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-29 | DOI: 10.1002/aaai.70024
Daniel S. Brown

Ensuring that AI systems do what we, as humans, actually want them to do is one of the biggest open research challenges in AI alignment and safety. My research seeks to directly address this challenge by enabling AI systems to interact with humans to learn aligned and robust behaviors. The way robots and other AI systems behave is often the result of optimizing a reward function. However, manually designing good reward functions is highly challenging and error-prone, even for domain experts. Although reward functions are often difficult to manually specify, human feedback in the form of demonstrations or preferences is often much easier to obtain but can be difficult to interpret due to ambiguity and noise. Thus, it is critical that AI systems take into account epistemic uncertainty over the human's true intent. As part of the AAAI New Faculty Highlight Program, I will give an overview of my research progress along the following fundamental research areas: (1) efficiently quantifying uncertainty over human intent, (2) directly optimizing behavior to be robust to uncertainty over human intent, and (3) actively querying for additional human input to reduce uncertainty over human intent.
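
The idea of keeping epistemic uncertainty over human intent can be illustrated with Bayesian inference over a small set of candidate reward functions, updated from pairwise preferences. The sketch below assumes a Bradley-Terry preference likelihood and linear rewards over trajectory features. These are common modeling choices in preference-based reward learning, not necessarily the author's exact method, and the hypothesis set, feature vectors, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]


def trajectory_return(w: Features, phi: Features) -> float:
    """Linear reward: dot product of reward weights and trajectory features."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def preference_likelihood(
    w: Features, preferred: Features, other: Features, beta: float = 1.0
) -> float:
    """Bradley-Terry probability that `preferred` is chosen over `other` under weights w."""
    diff = trajectory_return(w, preferred) - trajectory_return(w, other)
    return 1.0 / (1.0 + math.exp(-beta * diff))


def posterior(
    hypotheses: Sequence[Features],
    preferences: Sequence[Tuple[Features, Features]],
    beta: float = 1.0,
) -> List[float]:
    """Posterior over candidate reward weights, starting from a uniform prior."""
    log_liks = [
        sum(math.log(preference_likelihood(w, a, b, beta)) for a, b in preferences)
        for w in hypotheses
    ]
    m = max(log_liks)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(s - m) for s in log_liks]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Illustrative setup: three candidate reward functions, one observed preference (A over B).
HYPOTHESES = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
TRAJ_A = (1.0, 0.0)
TRAJ_B = (0.0, 1.0)

belief = posterior(HYPOTHESES, [(TRAJ_A, TRAJ_B)])
```

After a single preference the posterior favors the first hypothesis but leaves substantial mass on the others. That remaining spread is what a robust policy would hedge against and what an active query would try to reduce.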

Citations: 0
Multisensory machine intelligence
IF 3.2 Zone 4, Computer Science Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-08-26 DOI: 10.1002/aaai.70026
Ruohan Gao

The future of artificial intelligence demands a paradigm shift toward multisensory perception: toward systems that can digest ongoing multisensory observations, that can discover structure in unlabeled raw sensory data, and that can intelligently fuse useful information from different sensory modalities for decision-making. While we humans naturally perceive the world by looking, listening, touching, smelling, and tasting, traditional forms of machine intelligence mostly focus on a single sensory modality, particularly vision. Therefore, my research, which I refer to as multisensory machine intelligence, seeks to bridge this gap by empowering machines to emulate and enhance human capabilities in seeing, hearing, and feeling, ultimately enabling them to comprehensively perceive, understand, and interact with the multisensory world.

Citations: 0