Product reviews can provide rich information about the opinions users have of products. However, effectively inferring user preferences and item characteristics from reviews is nontrivial because it requires complex semantic understanding. Existing methods usually learn features for users and items from reviews in a single, static fashion and cannot fully capture user preferences and item features. In this article, we propose a neural review-based recommendation approach that aims to learn comprehensive representations of users/items under a three-tier attention framework. We design a review encoder to learn review features from words via word-level attention, an aspect encoder to learn aspect features via review-level attention, and a user/item encoder to learn the final representations of users/items via aspect-level attention. In the word- and review-level attention, we adopt a context-aware mechanism that indicates the importance of words and reviews dynamically instead of using static attention weights. In addition, the attention at the word and review levels follows multiple paradigms to learn multiple features effectively, which can reflect the diversity of user/item features. Furthermore, we propose a personalized aspect-level attention module in the user/item encoder to learn the final comprehensive features. Extensive experiments on rating prediction validate the effectiveness of our method.
{"title":"Toward Comprehensive User and Item Representations via Three-tier Attention Network","authors":"Hongtao Liu, Wenjun Wang, Qiyao Peng, Nannan Wu, Fangzhao Wu, Pengfei Jiao","doi":"10.1145/3446341","DOIUrl":"https://doi.org/10.1145/3446341","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"20 1","pages":"1 - 22"},"publicationDate":"2021-02-23","publicationTypes":"Journal Article"}
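The three tiers described in the abstract above (words to reviews, reviews to aspects, aspects to a final user/item vector) can be illustrated with a generic attention-pooling sketch. This is not the authors' model: the function names `attention_pool` and `encode_user` are hypothetical, and fixed query vectors stand in for the context-aware and personalized attention, whose exact form the abstract does not specify.

```python
import math

def attention_pool(vectors, query):
    """Return (pooled, weights): a softmax-weighted sum of `vectors`,
    where each weight comes from the dot product of a vector with `query`."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def encode_user(reviews, word_query, aspect_queries, user_query):
    """Hypothetical three-tier encoder; `reviews` is a list of reviews,
    each a list of word vectors of equal dimension."""
    # Tier 1: pool the word vectors of each review into one review vector.
    review_vecs = [attention_pool(words, word_query)[0] for words in reviews]
    # Tier 2: pool review vectors once per aspect query (assumed: one query per aspect).
    aspect_vecs = [attention_pool(review_vecs, q)[0] for q in aspect_queries]
    # Tier 3: pool aspect vectors into the final user representation.
    return attention_pool(aspect_vecs, user_query)[0]

reviews = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
user_vec = encode_user(reviews, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With an all-zero query the weights are uniform, so the pooling reduces to a plain average; a learned or context-dependent query is what makes the weighting selective.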
Zhiyong Cheng, Fan Liu, Shenghan Mei, Yangyang Guo, Lei Zhu, Liqiang Nie
Item-based collaborative filtering (ICF) enjoys the advantages of high recommendation accuracy and ease of online personalization and thus is favored by industrial recommender systems. ICF recommends items to a target user based on their similarities to the items the user has previously interacted with. Great progress has been made on ICF in recent years by applying advanced machine learning techniques (e.g., deep neural networks) to learn item similarity from data. Early methods simply treat all historical items equally, whereas recently proposed methods attempt to distinguish the importance of different historical items when recommending a target item. Despite this progress, we argue that those ICF models neglect the diverse intents of users in adopting items (e.g., watching a movie because of the director, leading actors, or the visual effects). As a result, they fail to estimate item similarity at a finer-grained level when predicting the user’s preference for an item, resulting in sub-optimal recommendation. In this work, we propose a general feature-level attention method for ICF models. The key to our method is to distinguish the importance of different factors when computing the item similarity for a prediction. To demonstrate the effectiveness of our method, we design a light attention neural network to integrate both item-level and feature-level attention for neural ICF models. It is model-agnostic and easy to implement. We apply it to two baseline ICF models and evaluate its effectiveness on six public datasets. Extensive experiments show that the feature-level attention enhanced models consistently outperform their counterparts, demonstrating the potential of differentiating user intents at the feature level for ICF recommendation models.
{"title":"Feature-Level Attentive ICF for Recommendation","authors":"Zhiyong Cheng, Fan Liu, Shenghan Mei, Yangyang Guo, Lei Zhu, Liqiang Nie","doi":"10.1145/3490477","DOIUrl":"https://doi.org/10.1145/3490477","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"20 1","pages":"1 - 24"},"publicationDate":"2021-02-22","publicationTypes":"Journal Article"}
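To make the difference between item-level and feature-level attention in the abstract above concrete, here is a minimal sketch. It is an assumption-laden illustration, not the article's network: the name `icf_score` is hypothetical, and both attention weightings are computed with a plain softmax over the raw interactions rather than with a learned attention network.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def icf_score(target, history, feature_attention=True):
    """Score a target item embedding against the embeddings of a user's
    historical items. All vectors are lists of floats of equal length."""
    sims = []
    for item in history:
        # Per-factor interaction between the target and one historical item.
        interactions = [t * h for t, h in zip(target, item)]
        if feature_attention:
            # Feature-level attention (assumed form): weight each factor
            # before aggregating, so some factors count more than others.
            # The weights sum to 1, so this is a weighted average of factors.
            weights = softmax(interactions)
            sims.append(sum(w * x for w, x in zip(weights, interactions)))
        else:
            # Plain ICF similarity: every factor contributes equally (dot product).
            sims.append(sum(interactions))
    # Item-level attention: weight historical items by their similarity.
    item_weights = softmax(sims)
    return sum(a * s for a, s in zip(item_weights, sims))

plain = icf_score([1.0, 2.0], [[3.0, 4.0]], feature_attention=False)
attentive = icf_score([1.0, 2.0], [[3.0, 4.0]])
```

With a single historical item the item-level weight is 1, so `plain` is just the dot product of the two embeddings, while `attentive` lies between the smallest and largest per-factor interaction.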
Negative sequential patterns (NSPs) capture more informative and actionable knowledge than classic positive sequential patterns (PSPs) due to the involvement of both occurring and nonoccurring behaviors and events, which can contribute to many relevant applications. However, NSP mining is nontrivial, as it involves fundamental challenges requiring distinct theoretical foundations and is not directly addressable by PSP mining. In the very limited research reported on NSP mining, a negative element constraint (NEC) is incorporated to only consider the NSPs composed of specific forms of elements (containing either positive or negative items), which results in many valuable NSPs being missed. Here, we loosen the NEC (called loose negative element constraint (LNEC)) to include partial negative elements containing both positive and negative items, which enables the discovery of more flexible patterns but incorporates significant new learning challenges, such as representing and mining complete NSPs. Accordingly, we formalize the LNEC-based NSP mining problem and propose a novel vertical NSP mining framework, VM-NSP, to efficiently mine the complete set of NSPs by a vertical representation (VR) of each sequence. An efficient bitmap-based vertical NSP mining algorithm, bM-NSP, introduces a bitmap hash table-based VR and a prefix-based negative sequential candidate generation strategy to optimize the discovery performance. VM-NSP and its implementation bM-NSP form the first VR-based approach for complete NSP mining with LNEC. Theoretical analyses and experiments confirm the performance superiority of bM-NSP on synthetic and real-life datasets w.r.t. diverse data factors, which substantially expands existing NSP mining methods toward flexible NSP discovery.
{"title":"VM-NSP","authors":"Wei Wang, Longbing Cao","doi":"10.1145/3440874","DOIUrl":"https://doi.org/10.1145/3440874","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"25 1","pages":"1 - 27"},"publicationDate":"2021-02-17","publicationTypes":"Journal Article"}
Sequential recommendation, such as next-basket recommender systems (NBRS), which model users’ sequential behaviors and the relevant context/session, has recently attracted much attention from the research community. Existing session-based NBRS involve session representation and inter-basket relations but ignore their hybrid couplings with the intra-basket items, often producing irrelevant or similar items in the next basket. In addition, they do not predict next-baskets (more than one next basket recommended). Interactive recommendation further involves user feedback on the recommended basket. The existing work on next-item recommendation involves positive feedback on selected items but ignores negative feedback on unselected ones. Here, we introduce a new setting—interactive sequential basket recommendation, which iteratively predicts next baskets by learning the intra-/inter-basket couplings between items and both positive and negative user feedback on recommended baskets. A hierarchical attentive encoder-decoder model (HAEM) continuously recommends next baskets one after another during sequential interactions with users after analyzing the item relations both within a basket and between adjacent sequential baskets (i.e., intra-/inter-basket couplings) and incorporating the user selection and unselection (i.e., positive/negative) feedback on the recommended baskets to refine NBRS. HAEM comprises a basket encoder and a sequence decoder to model intra-/inter-basket couplings and a prediction decoder to sequentially predict next-baskets by interactive feedback-based refinement. Empirical analysis shows that HAEM significantly outperforms the state-of-the-art baselines for NBRS and session-based recommenders for accurate and novel recommendation. We also show the effect of continuously refining sequential basket recommendation by including unselection feedback during interactive recommendation.
{"title":"Interactive Sequential Basket Recommendation by Learning Basket Couplings and Positive/Negative Feedback","authors":"Wei Wang, Longbing Cao","doi":"10.1145/3444368","DOIUrl":"https://doi.org/10.1145/3444368","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"7 1","pages":"1 - 26"},"publicationDate":"2021-02-01","publicationTypes":"Journal Article"}
With the dramatic expansion of international markets, consumers write reviews in different languages, which poses a new challenge for Recommender Systems (RSs) dealing with this increasing amount of multilingual information. Recent studies that leverage deep-learning techniques for review-aware RSs have demonstrated their effectiveness in modelling fine-grained user-item interactions through the aspects of reviews. However, most of these models can neither take full advantage of the contextual information from multilingual reviews nor discriminate the inherent ambiguity of words originating from users’ different writing tendencies. To this end, we propose a novel Multilingual Review-aware Deep Recommendation Model (MrRec) for rating prediction tasks. MrRec mainly consists of two parts: (1) a multilingual aspect-based sentiment analysis module (MABSA), which aims to jointly extract aligned aspects and their associated sentiments in different languages while requiring only overall review ratings; and (2) a multilingual recommendation module that learns the aspect importances of both the user and the item while considering the different contributions of multiple languages, and estimates aspect utility via a dual interactive attention mechanism integrated with aspect-specific sentiments from MABSA. Finally, overall ratings can be inferred by a prediction layer that takes the aspect utility value and aspect importance as inputs. Extensive experimental results on nine real-world datasets demonstrate the superior performance and interpretability of our model.
{"title":"Multilingual Review-aware Deep Recommender System via Aspect-based Sentiment Analysis","authors":"Peng Liu, Lemei Zhang, J. Gulla","doi":"10.1145/3432049","DOIUrl":"https://doi.org/10.1145/3432049","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"80 1","pages":"1 - 33"},"publicationDate":"2021-01-14","publicationTypes":"Journal Article"}
G. Adomavicius, J. Bockstedt, S. Curley, Jingjing Zhang
Prior research has shown a robust effect of personalized product recommendations on user preference judgments for items. Specifically, the display of system-predicted preference ratings as item recommendations has been shown in multiple studies to bias users’ preference ratings after item consumption in the direction of the predicted rating. Top-N lists represent another common approach for presenting item recommendations in recommender systems. Through three controlled laboratory experiments, we show that top-N lists do not induce a discernible bias in user preference judgments. This result is robust, holding for both lists of personalized item recommendations and lists of items that are top-rated based on averages of aggregate user ratings. Adding numerical ratings to the list items does generate a bias, consistent with earlier studies. Thus, in contexts where preference biases are of concern to an online retailer or platform, top-N lists, without numerical predicted ratings, would be a promising format for displaying item recommendations.
{"title":"Effects of Personalized and Aggregate Top-N Recommendation Lists on User Preference Ratings","authors":"G. Adomavicius, J. Bockstedt, S. Curley, Jingjing Zhang","doi":"10.1145/3430028","DOIUrl":"https://doi.org/10.1145/3430028","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"19 1","pages":"1 - 38"},"publicationDate":"2021-01-13","publicationTypes":"Journal Article"}
Mohammad Aliannejadi, Hamed Zamani, F. Crestani, W. Bruce Croft
Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users’ lives. This article addresses two research problems that are vital for developing effective personal mobile assistants: target apps selection and recommendation. The former is the key component of a unified mobile search system: a system that addresses the users’ information needs for all the apps installed on their devices with a unified mode of access. The latter, instead, predicts the next apps that the users would want to launch. Here we focus on context-aware models to leverage the rich contextual information available to mobile devices. We design an in situ study to collect thousands of mobile queries enriched with mobile sensor data (now publicly available for research purposes). With the aid of this dataset, we study the user behavior in the context of these tasks and propose a family of context-aware neural models that take into account the sequential, temporal, and personal behavior of users. We study several state-of-the-art models and show that the proposed models significantly outperform the baselines.
{"title":"Context-aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants","authors":"Mohammad Aliannejadi, Hamed Zamani, F. Crestani, W. Bruce Croft","doi":"10.1145/3447678","DOIUrl":"https://doi.org/10.1145/3447678","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"38 1","pages":"1 - 30"},"publicationDate":"2021-01-09","publicationTypes":"Journal Article"}
Ryen W. White, E. Nouri, James Woffinden-Luey, Mark J. Encarnación, S. Jauhar
Information systems, such as task management applications and digital assistants, can help people keep track of tasks of different types and different time durations, ranging from a few minutes to days or weeks. Helping people better manage their tasks and their time is a core capability of assistive technologies, situated within a broader context of supporting more effective information access and use. Throughout the course of a day, there are typically many short periods of downtime (e.g., five minutes or less) available to individuals. Microtasks are simple tasks that can be tackled in such short amounts of time. Identifying microtasks in task lists could help people utilize these periods of low activity to make progress on their task backlog. We define actionable tasks as self-contained tasks that need to be completed or acted on. However, not all to-do tasks are actionable. Many task lists are collections of miscellaneous items that can be completed at any time (e.g., books to read, movies to watch) or of notes (e.g., names, addresses), or their individual items are constituents of a list that is itself a task (e.g., a grocery list). In this article, we introduce the novel challenge of microtask detection, and we present machine-learned models for automatically determining which tasks are actionable and which of these actionable tasks are microtasks. Experiments show that our models can accurately identify actionable tasks and accurately detect actionable microtasks, and that we can combine these models to generate a solution that scales microtask detection to all tasks. We discuss our findings in detail, along with their limitations. These findings have implications for the design of systems to help people make the most of their time.
{"title":"Microtask Detection","authors":"Ryen W. White, E. Nouri, James Woffinden-Luey, Mark J. Encarnación, S. Jauhar","doi":"10.1145/3432290","DOIUrl":"https://doi.org/10.1145/3432290","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"11 1","pages":"1 - 29"},"publicationDate":"2021-01-08","publicationTypes":"Journal Article"}
We critically re-examine the Saerens-Latinne-Decaestecker (SLD) algorithm, a well-known method for estimating class prior probabilities (“priors”) and adjusting posterior probabilities (“posteriors”) in scenarios characterized by distribution shift, i.e., difference in the distribution of the priors between the training and the unlabelled documents. Given a machine learned classifier and a set of unlabelled documents for which the classifier has returned posterior probabilities and estimates of the prior probabilities, SLD updates them both in an iterative, mutually recursive way, with the goal of making both more accurate; this is of key importance in downstream tasks such as single-label multiclass classification and cost-sensitive text classification. Since its publication, SLD has become the standard algorithm for improving the quality of the posteriors in the presence of distribution shift, and SLD is still considered a top contender when we need to estimate the priors (a task that has become known as “quantification”). However, its real effectiveness in improving the quality of the posteriors has been questioned. We here present the results of systematic experiments conducted on a large, publicly available dataset, across multiple amounts of distribution shift and multiple learners. Our experiments show that SLD improves the quality of the posterior probabilities and of the estimates of the prior probabilities, but only when the number of classes in the classification scheme is very small and the classifier is calibrated. As the number of classes grows, or as we use non-calibrated classifiers, SLD converges more slowly (and often does not converge at all), performance degrades rapidly, and the impact of SLD on the quality of the prior estimates and of the posteriors becomes negative rather than positive.
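The iterative, mutually recursive update described above can be sketched roughly as follows. This is a minimal sketch of the EM-style procedure commonly associated with SLD, not the authors' experimental code: the function name `sld_adjust`, its parameters, and the stopping rule are assumptions made for illustration, and all training priors are assumed to be strictly positive.

```python
import numpy as np


def sld_adjust(posteriors, train_priors, max_iter=1000, tol=1e-6):
    """Re-estimate priors and rescale posteriors (illustrative sketch).

    posteriors:   (n_docs, n_classes) classifier outputs on the unlabelled set.
    train_priors: (n_classes,) class prevalences in the training data (all > 0).
    Returns (adjusted_posteriors, estimated_priors, converged).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    train_priors = np.asarray(train_priors, dtype=float)
    priors = train_priors.copy()
    adjusted = posteriors.copy()
    converged = False
    for _ in range(max_iter):
        # Reweight the original posteriors by the ratio of the current
        # prior estimate to the training prior, then renormalise each row.
        scaled = posteriors * (priors / train_priors)
        adjusted = scaled / scaled.sum(axis=1, keepdims=True)
        # The new prior estimate is the mean adjusted posterior per class.
        new_priors = adjusted.mean(axis=0)
        # Hypothetical stopping rule: largest change in any class prior.
        delta = np.abs(new_priors - priors).max()
        priors = new_priors
        if delta < tol:
            converged = True
            break
    return adjusted, priors, converged


if __name__ == "__main__":
    # Toy posteriors for four unlabelled documents and two classes.
    toy = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
    adj, est, ok = sld_adjust(toy, [0.5, 0.5])
    print(est, ok)
```

The `converged` flag is returned explicitly because, as the abstract notes, the procedure may converge slowly or not at all; a caller can fall back to the unadjusted posteriors when it is `False`.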
{"title":"A Critical Reassessment of the Saerens-Latinne-Decaestecker Algorithm for Posterior Probability Adjustment","authors":"Andrea Esuli, Alessio Molinari, F. Sebastiani","doi":"10.1145/3433164","DOIUrl":"https://doi.org/10.1145/3433164","url":null,"abstract":"We critically re-examine the Saerens-Latinne-Decaestecker (SLD) algorithm, a well-known method for estimating class prior probabilities (“priors”) and adjusting posterior probabilities (“posteriors”) in scenarios characterized by distribution shift, i.e., difference in the distribution of the priors between the training and the unlabelled documents. Given a machine learned classifier and a set of unlabelled documents for which the classifier has returned posterior probabilities and estimates of the prior probabilities, SLD updates them both in an iterative, mutually recursive way, with the goal of making both more accurate; this is of key importance in downstream tasks such as single-label multiclass classification and cost-sensitive text classification. Since its publication, SLD has become the standard algorithm for improving the quality of the posteriors in the presence of distribution shift, and SLD is still considered a top contender when we need to estimate the priors (a task that has become known as “quantification”). However, its real effectiveness in improving the quality of the posteriors has been questioned. We here present the results of systematic experiments conducted on a large, publicly available dataset, across multiple amounts of distribution shift and multiple learners. Our experiments show that SLD improves the quality of the posterior probabilities and of the estimates of the prior probabilities, but only when the number of classes in the classification scheme is very small and the classifier is calibrated. 
As the number of classes grows, or as we use non-calibrated classifiers, SLD converges more slowly (and often does not converge at all), performance degrades rapidly, and the impact of SLD on the quality of the prior estimates and of the posteriors becomes negative rather than positive.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"59 1","pages":"1 - 34"},"PeriodicalIF":0.0,"publicationDate":"2020-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86988785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We examine the “goodness” of ranked retrieval evaluation measures in terms of how well they align with users’ Search Engine Result Page (SERP) preferences for web search. The SERP preferences cover 1,127 topic-SERP-SERP triplets extracted from the NTCIR-9 INTENT task, reflecting the views of 15 different assessors. Each assessor made two SERP preference judgements for each triplet: one in terms of relevance and the other in terms of diversity. For each evaluation measure, we compute the Agreement Rate (AR) of each triplet: the proportion of assessors that agree with the measure’s SERP preference. We then compare the mean ARs of the measures as well as those of best/median/worst assessors using Tukey HSD tests. Our first experiment compares traditional ranked retrieval measures based on the SERP relevance preferences: we find that normalised Discounted Cumulative Gain (nDCG) and intentwise Rank-biased Utility (iRBU) perform best in that they are the only measures that are statistically indistinguishable from our best assessor; nDCG also statistically significantly outperforms our median assessor. Our second experiment utilises 119,646 document preferences that we collected for a subset of the above topic-SERP-SERP triplets (containing 894 triplets) to compare preference-based evaluation measures as well as traditional ones. Again, we evaluate them based on the SERP relevance preferences. The results suggest that measures such as wpref5 are the most promising among the preference-based measures considered, although they underperform the best traditional measures such as nDCG on average. Our third experiment compares diversified search measures based on the SERP diversity preferences as well as the SERP relevance preferences, and it shows that D♯-measures are clearly the most reliable: in particular, D♯-nDCG and D♯-RBP statistically significantly outperform the median assessor and all intent-aware measures; they also outperform the recently proposed RBU on average. 
Also, in terms of agreement with SERP diversity preferences, D♯-nDCG statistically significantly outperforms RBU. Hence, if IR researchers want to use evaluation measures that align well with users’ SERP preferences, then we recommend nDCG and iRBU for traditional search, and D♯-measures such as D♯-nDCG for diversified search. As for document preference-based measures that we have examined, we do not have a strong reason to recommend them over traditional measures like nDCG, since they align slightly less well with users’ SERP preferences despite their quadratic assessment cost.
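The Agreement Rate defined above (the proportion of assessors who agree with a measure's SERP preference for a triplet) can be illustrated with a short sketch. The data layout, the labels "A" and "B", and the function names are hypothetical; how ties or missing judgements are handled in the actual study is not stated here, so this sketch simply counts exact matches.

```python
from statistics import mean


def agreement_rate(measure_pref, assessor_prefs):
    """Fraction of assessors whose preferred SERP matches the measure's."""
    if not assessor_prefs:
        raise ValueError("need at least one assessor preference")
    matches = sum(pref == measure_pref for pref in assessor_prefs)
    return matches / len(assessor_prefs)


def mean_agreement_rate(triplets):
    """Average the per-triplet agreement rates for one evaluation measure."""
    return mean(agreement_rate(m, prefs) for m, prefs in triplets)


# Hypothetical toy data: (measure's preferred SERP, assessors' preferred SERPs).
toy_triplets = [
    ("A", ["A", "A", "B"]),
    ("B", ["A", "B", "B", "B"]),
]

if __name__ == "__main__":
    print(mean_agreement_rate(toy_triplets))
```

Mean agreement rates computed this way for several measures would then be compared with a significance test such as Tukey HSD, which is outside the scope of this sketch.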
{"title":"Retrieval Evaluation Measures that Agree with Users’ SERP Preferences","authors":"T. Sakai, Zhaohao Zeng","doi":"10.1145/3431813","DOIUrl":"https://doi.org/10.1145/3431813","url":null,"abstract":"We examine the “goodness” of ranked retrieval evaluation measures in terms of how well they align with users’ Search Engine Result Page (SERP) preferences for web search. The SERP preferences cover 1,127 topic-SERP-SERP triplets extracted from the NTCIR-9 INTENT task, reflecting the views of 15 different assessors. Each assessor made two SERP preference judgements for each triplet: one in terms of relevance and the other in terms of diversity. For each evaluation measure, we compute the Agreement Rate (AR) of each triplet: the proportion of assessors that agree with the measure’s SERP preference. We then compare the mean ARs of the measures as well as those of best/median/worst assessors using Tukey HSD tests. Our first experiment compares traditional ranked retrieval measures based on the SERP relevance preferences: we find that normalised Discounted Cumulative Gain (nDCG) and intentwise Rank-biased Utility (iRBU) perform best in that they are the only measures that are statistically indistinguishable from our best assessor; nDCG also statistically significantly outperforms our median assessor. Our second experiment utilises 119,646 document preferences that we collected for a subset of the above topic-SERP-SERP triplets (containing 894 triplets) to compare preference-based evaluation measures as well as traditional ones. Again, we evaluate them based on the SERP relevance preferences. The results suggest that measures such as wpref5 are the most promising among the preference-based measures considered, although they underperform the best traditional measures such as nDCG on average. 
Our third experiment compares diversified search measures based on the SERP diversity preferences as well as the SERP relevance preferences, and it shows that D♯-measures are clearly the most reliable: in particular, D♯-nDCG and D♯-RBP statistically significantly outperform the median assessor and all intent-aware measures; they also outperform the recently proposed RBU on average. Also, in terms of agreement with SERP diversity preferences, D♯-nDCG statistically significantly outperforms RBU. Hence, if IR researchers want to use evaluation measures that align well with users’ SERP preferences, then we recommend nDCG and iRBU for traditional search, and D♯-measures such as D♯-nDCG for diversified search. As for document preference-based measures that we have examined, we do not have a strong reason to recommend them over traditional measures like nDCG, since they align slightly less well with users’ SERP preferences despite their quadratic assessment cost.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"407 1","pages":"1 - 35"},"PeriodicalIF":0.0,"publicationDate":"2020-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84871815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}