OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering

Jiahao Nick Li, Zhuohao (Jerry) Zhang, Jiaju Ma
arXiv - CS - Human-Computer Interaction | doi: arxiv-2409.08250 | Published: 2024-09-12 (Journal Article)
Citations: 0

Abstract

People often capture memories through photos, screenshots, and videos. While existing AI-based tools enable querying this data using natural language, they mostly only support retrieving individual pieces of information, like certain objects in photos, and struggle to answer more complex queries that involve interpreting interconnected memories, such as event sequences. We conducted a one-month diary study to collect realistic user queries and generated a taxonomy of the contextual information needed to integrate with captured memories. We then introduce OmniQuery, a novel system that is able to answer complex personal memory-related questions that require extracting and inferring contextual information. OmniQuery augments single captured memories by integrating scattered contextual information from multiple interconnected memories, retrieves relevant memories, and uses a large language model (LLM) to generate comprehensive answers. In human evaluations, we show the effectiveness of OmniQuery with an accuracy of 71.5%, and it outperformed a conventional RAG system, winning or tying in 74.5% of cases.
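The abstract describes a three-stage pipeline: augment each captured memory with contextual information inferred from interconnected memories, retrieve the memories relevant to a query, and have an LLM compose the answer. The following is a minimal toy sketch of that flow, not the authors' implementation: context propagation is stood in for by shared-word linking, retrieval by token overlap, and the LLM stage is omitted. All names (`Memory`, `augment`, `retrieve`) are hypothetical.

```python
# Hypothetical sketch of the augment-then-retrieve flow the abstract
# describes; NOT the OmniQuery implementation.
from dataclasses import dataclass, field


@dataclass
class Memory:
    caption: str                                  # raw captured content
    context: list = field(default_factory=list)   # inferred contextual info


def augment(memories):
    """Toy context propagation: memories sharing a word inherit each
    other's captions as context (a stand-in for event-level inference)."""
    for m in memories:
        words = set(m.caption.lower().split())
        for other in memories:
            if other is not m and words & set(other.caption.lower().split()):
                m.context.append(other.caption)
    return memories


def retrieve(query, memories, k=2):
    """Rank memories by token overlap between the query and the
    augmented text (caption plus propagated context)."""
    q = set(query.lower().split())

    def score(m):
        text = " ".join([m.caption] + m.context).lower().split()
        return len(q & set(text))

    return sorted(memories, key=score, reverse=True)[:k]


mems = augment([
    Memory("concert at the park"),
    Memory("dinner after the concert"),
    Memory("screenshot of a recipe"),
])
top = retrieve("what did I do after the concert", mems, k=2)
print([m.caption for m in top])
```

In a full system, the retrieved memories and the query would then be passed to an LLM to generate the final answer; the sketch stops at retrieval, which is where context augmentation changes what surfaces.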