Advancing continual lifelong learning in neural information retrieval: Definition, dataset, framework, and empirical evaluation

Information Sciences (IF 8.1; CAS Zone 1, Computer Science; JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS). Published 2024-08-22. DOI: 10.1016/j.ins.2024.121368

Article page: https://www.sciencedirect.com/science/article/pii/S0020025524012829

Citations: 0

Abstract

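Continual learning refers to the ability of a machine learning model to learn and adapt to new information without degrading its performance on previously learned tasks. Although several studies have investigated continual learning methods for neural information retrieval (NIR), a clear task definition is still lacking, and it is unclear how typical learning strategies perform in this setting. To address this, the paper presents a systematic task definition of continual NIR, together with a multi-topic dataset that simulates continuous information retrieval. It then proposes a comprehensive continual NIR framework composed of typical retrieval models and continual learning strategies. Empirical evaluations show that the framework successfully prevents catastrophic forgetting in NIR and improves performance on previously learned tasks. The results also show that the continual learning performance of embedding-based retrieval models declines as the topic shift distance and the data volume of new tasks increase, whereas pretraining-based models show no such correlation. Adopting suitable learning strategies can mitigate the effects of topic shift and increasing data volume in continual NIR.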
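In the continual learning literature, claims like "prevents catastrophic forgetting" and "improves performance on previously learned tasks" are commonly quantified from a sequential evaluation matrix. The abstract does not say which metrics the paper reports, so the following is only a minimal sketch under that assumption. It computes average final performance, backward transfer (BWT), and average forgetting from a matrix `R`, where `R[i][j]` is a retrieval score on topic j after training sequentially through topic i. The function name `continual_metrics`, the choice of metric (MRR or nDCG), and the example scores are all hypothetical.

```python
from typing import Dict, List


def continual_metrics(R: List[List[float]]) -> Dict[str, float]:
    """Summarise a sequential evaluation matrix (hypothetical helper).

    R[i][j] is a retrieval score (e.g. MRR@10 or nDCG@10, an assumption) on the
    test set of topic j, measured after training sequentially on topics 0..i.
    Returns average final performance, backward transfer (BWT) and average
    forgetting.
    """
    T = len(R)
    if T == 0 or any(len(row) != T for row in R):
        raise ValueError("R must be a non-empty T x T matrix")

    final = R[T - 1]  # scores on every topic after training on the last one
    avg = sum(final) / T
    if T == 1:
        return {"avg": avg, "bwt": 0.0, "forgetting": 0.0}

    earlier = range(T - 1)  # topics that were followed by at least one new topic
    # BWT: mean change on earlier topics between "just learned" and "end of sequence".
    # Negative values indicate forgetting; positive values mean later training helped.
    bwt = sum(final[j] - R[j][j] for j in earlier) / (T - 1)
    # Forgetting: mean drop from the best score a topic reached (from the step it
    # was learned up to the second-to-last step) to its final score.
    forgetting = sum(
        max(R[i][j] for i in range(j, T - 1)) - final[j] for j in earlier
    ) / (T - 1)
    return {"avg": avg, "bwt": bwt, "forgetting": forgetting}


if __name__ == "__main__":
    # Invented scores for three topics, for illustration only (not from the paper).
    R = [
        [0.30, 0.05, 0.02],
        [0.25, 0.40, 0.03],
        [0.28, 0.35, 0.45],
    ]
    for name, value in continual_metrics(R).items():
        print(f"{name}: {value:.3f}")
```

With these made-up scores, the sketch reports a BWT of -0.035 and an average forgetting of 0.035, which indicates mild forgetting of the first two topics. A method matching the abstract's second claim would instead push BWT above zero.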




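Source journal

Information Sciences (category: Engineering & Technology / Computer Science: Information Systems)

CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months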

Journal description: Information Sciences: Informatics and Computer Science, Intelligent Systems, Applications is an esteemed international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions. Our journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varied backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.