Sebastian Bruch, Claudio Lucchese, Franco Maria Nardini
As information retrieval researchers, we not only develop algorithmic solutions to hard problems, but we also insist on a proper, multifaceted evaluation of ideas. The literature on the fundamental topic of retrieval and ranking, for instance, has a rich history of studying the effectiveness of indexes, retrieval algorithms, and complex machine learning rankers, while at the same time quantifying their computational costs, from creation and training to application and inference. This is evidenced, for example, by more than a decade of research on efficient training and inference of large decision forest models in Learning to Rank (LtR). As we move towards even more complex, deep learning models in a wide range of applications, questions on efficiency have once again resurfaced with renewed urgency. Indeed, efficiency is no longer limited to time and space; instead it has found new, challenging dimensions that stretch to resource-, sample- and energy-efficiency with ramifications for researchers, users, and the environment.
This monograph takes a step towards promoting the study of efficiency in the era of neural information retrieval by offering a comprehensive survey of the literature on efficiency and effectiveness in ranking, and to a limited extent, retrieval. This monograph was inspired by the parallels that exist between the challenges in neural network-based ranking solutions and their predecessors, decision forest-based LtR models, as well as the connections between the solutions the literature to date has to offer. We believe that by understanding the fundamentals underpinning these algorithmic and data structure solutions for containing the contentious relationship between efficiency and effectiveness, one can better identify future directions and more efficiently determine the merits of ideas. We also present what we believe to be important research directions at the forefront of efficiency and effectiveness in retrieval and ranking.
Sebastian Bruch, Claudio Lucchese, Franco Maria Nardini, “Efficient and Effective Tree-based and Neural Learning to Rank”, Foundations and Trends in Information Retrieval, published 2023-05-14. DOI: 10.1561/1500000071. Full text available at: https://doi.org/10.1561/1500000071
The introduction of Quantum Theory (QT) provides a unified mathematical framework for Information Retrieval (IR). Compared with the classical IR framework, the quantum-inspired IR framework uses user-centered modeling methods to capture non-classical cognitive phenomena in human relevance judgment during the IR process. With the growth of data and computing resources, neural IR methods have been applied to the text matching and understanding tasks of IR. Neural networks have a strong ability to learn effective representations and to generalize matching patterns from raw data. However, these methods have some unavoidable defects, such as the inability to model user cognitive phenomena, a large number of model parameters, and the “black box” nature of the network structure. These problems greatly limit the development of neural IR and related fields. Although the quantum-inspired retrieval framework can theoretically solve the above problems, it faces problems of its own, such as poor model efficiency and difficulty in integrating with neural networks, which lead to a huge gap between QT and neural network modeling.
This review gives a systematic introduction to quantum-inspired neural IR, including quantum-inspired neural language representation, matching and understanding. This helps not only with modeling non-classical phenomena in IR but also with breaking the theoretical bottleneck of neural networks and designing more transparent neural IR models. We introduce the language representation method based on QT and the quantum-inspired text matching and decision making models built on neural networks, which show theoretical advantages in document ranking, relevance matching, and multimodal IR, and which can be integrated with neural networks to jointly promote the development of IR. The latest progress in quantum language understanding is introduced, and further topics on QT and language modeling provide readers with more material for thought.
Peng Zhang, Hui Gao, Jing Zhang, Dawei Song, “Quantum-Inspired Neural Language Representation, Matching and Understanding”, Foundations and Trends in Information Retrieval, published 2023-04-18. DOI: 10.1561/1500000091. Full text available at: https://doi.org/10.1561/1500000091
E. Lex, Dominik Kowald, Paul Seitlinger, Thi Ngoc Trang Tran, A. Felfernig, M. Schedl
Personalized recommender systems have become indispensable in today’s online world. Most of today’s recommendation algorithms are data-driven and based on behavioral data. While such systems can produce useful recommendations, they are often uninterpretable, black-box models, which do not incorporate the underlying cognitive reasons for user behavior in the algorithms’ design. The aim of this survey is to present a thorough review of the state of the art of recommender systems that leverage psychological constructs and theories to model and predict user behavior and improve the recommendation process. We call such systems psychology-informed recommender systems. The survey identifies three categories of psychology-informed recommender systems: cognition-inspired, personality-aware, and affect-aware recommender systems. Moreover, for each category,
Elisabeth Lex, Dominik Kowald, Paul Seitlinger, Thi Ngoc Trang Tran, Alexander Felfernig and Markus Schedl (2021), “Psychology-informed Recommender Systems”, Foundations and Trends® in Information Retrieval: Vol. 15, No. 2, pp 134–242. DOI: 10.1561/1500000090. Full text available at: http://dx.doi.org/10.1561/1500000090
E. Lex, Dominik Kowald, Paul Seitlinger, Thi Ngoc Trang Tran, A. Felfernig, M. Schedl, “Psychology-informed Recommender Systems”, Foundations and Trends in Information Retrieval, pp 134–242, published 2021-12-06. DOI: 10.1561/1500000090. Full text available at: https://doi.org/10.1561/1500000090
Michael Bendersky, Xuanhui Wang, Marc Najork, Donald Metzler
Email has been an essential communication medium for many years. As a result, the information accumulated in our mailboxes has become valuable for all of our personal and professional activities. For years, researchers have been developing interfaces, models and algorithms to facilitate search, discovery and organization of email data. In this survey, we attempt to bring together these diverse research directions, and provide both a historical background and a comprehensive overview of the recent advances in the field. In particular, we lay out all the components needed in the design of a privacy-centric email search engine, including search interface, indexing, document and query understanding, retrieval, ranking and evaluation. We also go beyond search, presenting recent work on intelligent task assistance in email. Finally, we discuss some emerging trends and future directions in email search and discovery research.
Michael Bendersky, Xuanhui Wang, Marc Najork, Donald Metzler, “Search and Discovery in Personal Email Collections”, Foundations and Trends in Information Retrieval, published 2021-07-05. DOI: 10.1561/1500000069. Full text available at: https://doi.org/10.1561/1500000069
Michael D. Ekstrand, Anubrata Das, R. Burke, Fernando Diaz
Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that have been developed for studying other machine learning systems. While fair information access shares many commonalities with fair classification, the multistakeholder nature of information access applications, the rank-based problem setting, the centrality of personalization in many cases, and the role of user response complicate the problem of identifying precisely what types and operationalizations of fairness may be relevant, let alone measuring or promoting them. In this monograph, we present a taxonomy of the various dimensions of fair information access and survey the literature to date on this new and rapidly growing topic. We preface this with brief introductions to information access and algorithmic fairness, to facilitate use of this work by scholars with experience in one (or neither) of these fields who wish to learn about their intersection. We conclude with several open problems in fair information access, along with some suggestions for how to approach research in this space.
Michael D. Ekstrand, Anubrata Das, R. Burke, Fernando Diaz, “Fairness in Information Access Systems”, Foundations and Trends in Information Retrieval, pp 1–177, published 2021-05-12. DOI: 10.1561/1500000079. Full text available at: https://doi.org/10.1561/1500000079
Chang Liu, Ying-Hsang Liu, Jingjing Liu, R. Bierig
Chang Liu, Ying-Hsang Liu, Jingjing Liu, R. Bierig, “Search Interface Design and Evaluation”, Foundations and Trends in Information Retrieval, pp 243–416, published 2021-01-01. DOI: 10.1561/1500000073. Full text available at: https://doi.org/10.1561/1500000073
In this survey, we provide an overview of the literature on knowledge graphs (KGs) in the context of information retrieval (IR). Modern IR systems can benefit from information available in KGs in multiple ways, independent of whether the KGs are publicly available or proprietary. We provide an overview of the components required when building IR systems that leverage KGs and use a task-oriented organization of the material that we discuss. As an understanding of the intersection of IR and KGs is beneficial to many researchers and practitioners, we consider prior work from two complementary angles: leveraging KGs for information retrieval and enriching KGs using IR techniques. We start by discussing how KGs can be employed to support IR tasks, including document and entity retrieval. We then proceed by describing how IR, and language technology in general, can be utilized for the construction and completion of KGs. This includes tasks such as entity recognition, typing, and relation extraction. We discuss common issues that appear across the tasks that we consider and identify future directions for addressing them. We also provide pointers to datasets and other resources that should be useful for both newcomers and experienced researchers in the area.
Ridho Reinanda, Edgar Meij and Maarten de Rijke (2020), “Knowledge Graphs: An Information Retrieval Perspective”, Foundations and Trends® in Information Retrieval: Vol. 14, No. 4, pp 289–444. DOI: 10.1561/1500000063. Full text available at: http://dx.doi.org/10.1561/1500000063
Ridho Reinanda, E. Meij, M. de Rijke, “Knowledge Graphs: An Information Retrieval Perspective”, Foundations and Trends in Information Retrieval, pp 289–444, published 2020-10-14. DOI: 10.1561/1500000063. Full text available at: https://doi.org/10.1561/1500000063