AgAsk: an agent to help answer farmer’s questions from scientific documents

International Journal on Digital Libraries | IF 1.6 | JCR Q2 (Information Science & Library Science) | Pub Date: 2023-06-19 | DOI: 10.1007/s00799-023-00369-y

Bevan Koopman, Ahmed Mourad, Hang Li, Anton van der Vegt, Shengyao Zhuang, Simon Gibson, Yash Dang, David Lawrence, Guido Zuccon


Citations: 0

Abstract



Decisions in agriculture are increasingly data-driven. However, valuable agricultural knowledge is often locked away in free-text reports, manuals and journal articles. Specialised search systems are needed that can mine agricultural information to provide relevant answers to users’ questions. This paper presents AgAsk, an agent able to answer natural language agriculture questions by mining scientific documents. We carefully survey and analyse farmers’ information needs. On the basis of these needs, we release an information retrieval test collection comprising real questions, a large collection of scientific documents split into passages, and ground truth relevance assessments indicating which passages are relevant to each question. We implement and evaluate a number of information retrieval models to answer farmers’ questions, including two state-of-the-art neural ranking models. We show that neural rankers are highly effective at matching passages to questions in this context. Finally, we propose a deployment architecture for AgAsk that includes a client based on the Telegram messaging platform and a retrieval model deployed on commodity hardware. The test collection we provide is intended to stimulate more research in methods to match natural language to answers in scientific documents. While the retrieval models were evaluated in the agriculture domain, they are generalisable and of interest to others working on similar problems. The test collection is available at: https://github.com/ielab/agvaluate.
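To make the idea of a test collection concrete, the sketch below scores a ranked list of passages against relevance assessments. It is only an illustration: the question and passage ids, the relevance grades, and the function names are all hypothetical, and the metrics shown (reciprocal rank and precision at k) are common information retrieval measures, not necessarily the ones reported in the paper.

```python
from typing import Callable, Dict, List

# Hypothetical toy data, NOT taken from the AgAsk test collection.
# qrels: question id -> {passage id: relevance grade (0 = not relevant)}
qrels: Dict[str, Dict[str, int]] = {
    "q1": {"p1": 0, "p2": 1, "p3": 0},
    "q2": {"p4": 1, "p5": 1},
}

# run: question id -> passage ids as ranked by some retrieval model (best first)
run: Dict[str, List[str]] = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p4", "p6", "p5"],
}


def reciprocal_rank(ranking: List[str], judgments: Dict[str, int]) -> float:
    """Return 1 / rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, passage_id in enumerate(ranking, start=1):
        # Unjudged passages are treated as not relevant (an assumption).
        if judgments.get(passage_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranking: List[str], judgments: Dict[str, int], k: int) -> float:
    """Return the fraction of the top-k passages that are judged relevant."""
    hits = sum(1 for passage_id in ranking[:k] if judgments.get(passage_id, 0) > 0)
    return hits / k


def mean_over_questions(metric: Callable[..., float], **kwargs) -> float:
    """Average a per-question metric over every question that has judgments."""
    scores = [
        metric(run.get(question_id, []), judgments, **kwargs)
        for question_id, judgments in qrels.items()
    ]
    return sum(scores) / len(scores)


mrr = mean_over_questions(reciprocal_rank)
p_at_2 = mean_over_questions(precision_at_k, k=2)
print(f"MRR = {mrr:.2f}, P@2 = {p_at_2:.2f}")
```

With real data, the two dictionaries would be loaded from the files in the released collection rather than written inline; the file layout is not described in this abstract, so no loader is sketched here.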


Source journal

International Journal on Digital Libraries

CiteScore

4.30

Self-citation rate

6.70%

Articles published

Journal description: The International Journal on Digital Libraries (IJDL) examines the theory and practice of acquisition, definition, organization, management, preservation and dissemination of digital information via global networking. It covers all aspects of digital libraries (DLs), from large-scale heterogeneous data and information management and access, to linking and connectivity, to security, privacy and policies, to their application, use and evaluation.

The scope of IJDL includes, but is not limited to:

The FAIR principle and the digital libraries infrastructure
Findable: information access and retrieval; semantic search; data and information exploration; information navigation; smart indexing and searching; resource discovery
Accessible: visualization and digital collections; user interfaces; interfaces for handicapped users; HCI and UX in DLs; security and privacy in DLs; multimodal access
Interoperable: metadata (definition, management, curation, integration); syntactic and semantic interoperability; linked data
Reusable: reproducibility; Open Science; sustainability, profitability, repeatability of research results; confidentiality and privacy issues in DLs

Digital Library Architectures, including heterogeneous and dynamic data management; data and repositories

Acquisition of digital information: authoring environments for digital objects; digitization of traditional content

Digital Archiving and Preservation: digital preservation and curation; digital archiving; web archiving; archiving and preservation strategies

AI for Digital Libraries: machine learning for DLs; data mining in DLs; NLP for DLs

Applications of Digital Libraries: digital humanities; open data and their reuse; scholarly DLs (incl. bibliometrics, altmetrics); epigraphy and paleography; digital museums

Future trends in Digital Libraries: definition of DLs in a ubiquitous digital library world; datafication of digital collections

Interaction and user experience (UX) in DLs: information visualization; collection understanding; privacy and security; multimodal user interfaces; accessibility (or "access for users with disabilities"); UX studies