Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks

Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger
{"title":"Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks","authors":"Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger","doi":"arxiv-2408.01433","DOIUrl":null,"url":null,"abstract":"Today's advanced driver assistance systems (ADAS), like adaptive cruise\ncontrol or rear collision warning, are finding broader adoption across vehicle\nclasses. Integrating such advanced, multimodal Large Language Models (LLMs) on\nboard a vehicle, which are capable of processing text, images, audio, and other\ndata types, may have the potential to greatly enhance passenger comfort. Yet,\nan LLM's hallucinations are still a major challenge to be addressed. In this\npaper, we systematically assessed potential hallucination detection strategies\nfor such LLMs in the context of object detection in vision-based data on the\nexample of pedestrian detection and localization. We evaluate three\nhallucination detection strategies applied to two state-of-the-art LLMs, the\nproprietary GPT-4V and the open LLaVA, on two datasets (Waymo/US and PREPER\nCITY/Sweden). Our results show that these LLMs can describe a traffic situation\nto an impressive level of detail but are still challenged for further analysis\nactivities such as object localization. We evaluate and extend hallucination\ndetection approaches when applying these LLMs to video sequences in the example\nof pedestrian detection. Our experiments show that, at the moment, the\nstate-of-the-art proprietary LLM performs much better than the open LLM.\nFurthermore, consistency enhancement techniques based on voting, such as the\nBest-of-Three (BO3) method, do not effectively reduce hallucinations in LLMs\nthat tend to exhibit high false negatives in detecting pedestrians. However,\nextending the hallucination detection by including information from the past\nhelps to improve results.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"47 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Emerging Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.01433","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Today's advanced driver assistance systems (ADAS), like adaptive cruise control or rear collision warning, are finding broader adoption across vehicle classes. Integrating advanced, multimodal Large Language Models (LLMs) on board a vehicle, capable of processing text, images, audio, and other data types, has the potential to greatly enhance passenger comfort. Yet, an LLM's hallucinations remain a major challenge to be addressed. In this paper, we systematically assess potential hallucination detection strategies for such LLMs in the context of object detection in vision-based data, using pedestrian detection and localization as an example. We evaluate three hallucination detection strategies applied to two state-of-the-art LLMs, the proprietary GPT-4V and the open LLaVA, on two datasets (Waymo/US and PREPER CITY/Sweden). Our results show that these LLMs can describe a traffic situation to an impressive level of detail but are still challenged by further analysis activities such as object localization. We evaluate and extend hallucination detection approaches when applying these LLMs to video sequences of pedestrian detection scenarios. Our experiments show that, at the moment, the state-of-the-art proprietary LLM performs much better than the open LLM. Furthermore, consistency enhancement techniques based on voting, such as the Best-of-Three (BO3) method, do not effectively reduce hallucinations in LLMs that tend to exhibit high false negatives in detecting pedestrians. However, extending the hallucination detection by including information from past frames helps to improve the results.
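The abstract names two consistency mechanisms: Best-of-Three (BO3) voting and an extension that pools information from past frames. The Python sketch below illustrates the general idea only; the function names, prompt, and thresholds are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of BO3 voting and a temporal extension, assuming a yes/no
# pedestrian-presence query to a vision LLM. All names and thresholds here
# are illustrative assumptions, not the method from the paper.
from collections import Counter
from typing import Callable, Sequence


def best_of_three(ask_llm: Callable[[str], bool], prompt: str) -> bool:
    """Best-of-Three (BO3) voting: query the model three times, keep the majority.

    If the model systematically misses pedestrians (high false negatives),
    all three votes tend to agree on the wrong answer, so voting alone
    cannot correct it.
    """
    votes = [ask_llm(prompt) for _ in range(3)]
    return Counter(votes).most_common(1)[0][0]


def temporal_vote(current: bool, past: Sequence[bool], min_hits: int = 2) -> bool:
    """Extend the check with past frames: report a pedestrian if the current
    answer is positive or enough recent answers were positive."""
    return current or sum(past) >= min_hits


if __name__ == "__main__":
    import random

    random.seed(0)
    # Stand-in for a vision LLM call (e.g. GPT-4V or LLaVA behind an API).
    fake_llm = lambda prompt: random.random() > 0.4  # noisy yes/no answer
    question = "Is there a pedestrian in this frame? Answer yes or no."

    bo3_answer = best_of_three(fake_llm, question)
    print("BO3 vote:", bo3_answer)
    print("With past frames:", temporal_vote(bo3_answer, past=[True, False, True]))
```

The sketch also makes the abstract's caveat concrete: when the underlying model is biased toward false negatives, the three BO3 votes simply agree on the wrong answer, whereas pooling answers across past frames can still recover a detection.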