How good are large language models at product risk assessment?
Zachary A Collier, Richard J Gruss, Alan S Abrahams
Risk Analysis, published 2024-06-08. DOI: 10.1111/risa.14351
Citations: 0
Abstract
Product safety professionals must assess the risks to consumers associated with the foreseeable uses and misuses of products. In this study, we investigate the utility of generative artificial intelligence (AI), specifically large language models (LLMs) such as ChatGPT, across a number of tasks involved in the product risk assessment process. For a set of six consumer products, prompts were developed related to failure mode identification, the construction and population of a failure mode and effects analysis (FMEA) table, risk mitigation identification, and guidance to product designers, users, and regulators. These prompts were input into ChatGPT and the outputs were recorded. A survey was administered to product safety professionals to ascertain the quality of the outputs. We found that ChatGPT generally performed better at divergent thinking tasks such as brainstorming potential failure modes and risk mitigations. However, there were errors and inconsistencies in some of the results, and the guidance provided was perceived as overly generic, occasionally outlandish, and not reflective of the depth of knowledge held by a subject matter expert. When tested against a sample of other LLMs, similar patterns in strengths and weaknesses were demonstrated. Despite these challenges, a role for LLMs may still exist in product risk assessment to assist in ideation, while experts may shift their focus to critical review of AI-generated content.
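The abstract does not reproduce the FMEA tables used in the study. As a generic illustration of what constructing and populating such a table involves, the sketch below builds a few FMEA rows for a hypothetical consumer product (a space heater; the failure modes, ratings, and mitigations are invented for illustration, not taken from the paper) and ranks them by the conventional risk priority number, RPN = severity × occurrence × detection.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    """One row of a failure mode and effects analysis (FMEA) table.

    Severity, occurrence, and detection are conventionally rated 1-10;
    their product is the risk priority number (RPN) used to rank rows.
    """
    failure_mode: str
    effect: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (easily detected) .. 10 (undetectable)
    mitigation: str

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

# Hypothetical rows for a space heater; ratings are illustrative only.
rows = [
    FmeaRow("tip-over while running", "fire hazard", 9, 4, 3,
            "tip-over shutoff switch"),
    FmeaRow("power cord fraying", "electric shock", 8, 3, 5,
            "strain relief and cord-inspection guidance"),
    FmeaRow("thermostat failure", "overheating", 7, 2, 6,
            "redundant thermal cutoff"),
]

# Rank failure modes by RPN, highest risk first.
ranked = sorted(rows, key=lambda r: r.rpn, reverse=True)
for r in ranked:
    print(f"{r.failure_mode}: RPN={r.rpn}")
```

In the study's workflow, an LLM would be prompted to brainstorm the failure modes and mitigations that populate such rows, with a human expert reviewing the ratings and the resulting ranking.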
About the Journal
Published on behalf of the Society for Risk Analysis, Risk Analysis is ranked among the top 10 journals in the ISI Journal Citation Reports under the social sciences, mathematical methods category, and provides a focal point for new developments in the field of risk analysis. This international peer-reviewed journal is committed to publishing critical empirical research and commentaries dealing with risk issues. The topics covered include:
• Human health and safety risks
• Microbial risks
• Engineering
• Mathematical modeling
• Risk characterization
• Risk communication
• Risk management and decision-making
• Risk perception, acceptability, and ethics
• Laws and regulatory policy
• Ecological risks