Jiahao Nick Li, Zhuohao (Jerry) Zhang, Jiaju Ma
{"title":"OmniQuery:从上下文角度增强捕捉到的多模态记忆,实现个人问题解答","authors":"Jiahao Nick LiJerry, ZhuohaoJerry, Zhang, Jiaju Ma","doi":"arxiv-2409.08250","DOIUrl":null,"url":null,"abstract":"People often capture memories through photos, screenshots, and videos. While\nexisting AI-based tools enable querying this data using natural language, they\nmostly only support retrieving individual pieces of information like certain\nobjects in photos and struggle with answering more complex queries that involve\ninterpreting interconnected memories like event sequences. We conducted a\none-month diary study to collect realistic user queries and generated a\ntaxonomy of necessary contextual information for integrating with captured\nmemories. We then introduce OmniQuery, a novel system that is able to answer\ncomplex personal memory-related questions that require extracting and inferring\ncontextual information. OmniQuery augments single captured memories through\nintegrating scattered contextual information from multiple interconnected\nmemories, retrieves relevant memories, and uses a large language model (LLM) to\ncomprehensive answers. In human evaluations, we show the effectiveness of\nOmniQuery with an accuracy of 71.5%, and it outperformed a conventional RAG\nsystem, winning or tying in 74.5% of the time.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"64 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering\",\"authors\":\"Jiahao Nick LiJerry, ZhuohaoJerry, Zhang, Jiaju Ma\",\"doi\":\"arxiv-2409.08250\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"People often capture memories through photos, screenshots, and videos. While\\nexisting AI-based tools enable querying this data using natural language, they\\nmostly only support retrieving individual pieces of information like certain\\nobjects in photos and struggle with answering more complex queries that involve\\ninterpreting interconnected memories like event sequences. We conducted a\\none-month diary study to collect realistic user queries and generated a\\ntaxonomy of necessary contextual information for integrating with captured\\nmemories. We then introduce OmniQuery, a novel system that is able to answer\\ncomplex personal memory-related questions that require extracting and inferring\\ncontextual information. OmniQuery augments single captured memories through\\nintegrating scattered contextual information from multiple interconnected\\nmemories, retrieves relevant memories, and uses a large language model (LLM) to\\ncomprehensive answers. 
In human evaluations, we show the effectiveness of\\nOmniQuery with an accuracy of 71.5%, and it outperformed a conventional RAG\\nsystem, winning or tying in 74.5% of the time.\",\"PeriodicalId\":501541,\"journal\":{\"name\":\"arXiv - CS - Human-Computer Interaction\",\"volume\":\"64 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Human-Computer Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.08250\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Human-Computer Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08250","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
People often capture memories through photos, screenshots, and videos. While existing AI-based tools enable querying this data using natural language, they mostly support retrieving individual pieces of information, such as certain objects in photos, and struggle with more complex queries that involve interpreting interconnected memories, such as event sequences. We conducted a one-month diary study to collect realistic user queries and generated a taxonomy of the contextual information needed to integrate with captured memories. We then introduce OmniQuery, a novel system that answers complex personal memory-related questions requiring the extraction and inference of contextual information. OmniQuery augments individual captured memories by integrating scattered contextual information from multiple interconnected memories, retrieves relevant memories, and uses a large language model (LLM) to generate comprehensive answers. In human evaluations, OmniQuery achieved an accuracy of 71.5% and outperformed a conventional RAG system, winning or tying 74.5% of the time.
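To make the pipeline the abstract describes more concrete, here is a minimal sketch in Python of the three stages it names: augmenting each captured memory with context drawn from interconnected memories, retrieving memories relevant to a query, and assembling the retrieved material for an LLM to answer from. All class names, the adjacency-based notion of "interconnected", and the keyword-overlap retriever are illustrative assumptions, not the authors' implementation (which the abstract does not detail).

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """A single captured item (photo/screenshot/video) with derived text."""
    text: str                                           # e.g. caption, OCR, or transcript
    context: list[str] = field(default_factory=list)    # augmented contextual information


def augment_with_context(memories: list[Memory], window: int = 2) -> None:
    """Attach context from neighbouring memories.

    'Interconnected' is approximated here by temporal adjacency; the paper's
    taxonomy of contextual information is richer than this.
    """
    for i, mem in enumerate(memories):
        lo, hi = max(0, i - window), min(len(memories), i + window + 1)
        mem.context = [memories[j].text for j in range(lo, hi) if j != i]


def retrieve(query: str, memories: list[Memory], k: int = 3) -> list[Memory]:
    """Rank memories by naive keyword overlap with the query (a stand-in for
    the embedding-based retrieval a real system would likely use)."""
    q_terms = set(query.lower().split())

    def score(mem: Memory) -> int:
        text = " ".join([mem.text, *mem.context]).lower()
        return sum(term in text for term in q_terms)

    return sorted(memories, key=score, reverse=True)[:k]


def answer(query: str, memories: list[Memory]) -> str:
    """Assemble a prompt from the retrieved, context-augmented memories.

    In a real system this prompt would be sent to an LLM; here we simply
    return it so the sketch stays self-contained and runnable."""
    retrieved = retrieve(query, memories)
    context_blocks = "\n".join(
        f"- {m.text} (context: {'; '.join(m.context)})" for m in retrieved
    )
    return f"Question: {query}\nRelevant memories:\n{context_blocks}"


if __name__ == "__main__":
    mems = [
        Memory("Photo of a conference badge"),
        Memory("Screenshot of a flight confirmation to Honolulu"),
        Memory("Video clip of a keynote talk"),
    ]
    augment_with_context(mems)
    print(answer("Which conference did I fly to Honolulu for?", mems))
```

The key design point the sketch mirrors is that retrieval operates over memories that have already been enriched with context from related memories, rather than over isolated captures, which is what lets a query about an event sequence match an item whose own text would not.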