Can machines trace human knowledge like humans? Knowledge tracing (KT) is a fundamental task in a wide range of applications in education, such as massive open online courses (MOOCs), intelligent tutoring systems, educational games, and learning management systems. It models dynamics in a student's knowledge states in relation to different learning concepts through their interactions with learning activities. Recently, several attempts have been made to use deep learning models for tackling the KT problem. Although these deep learning models have shown promising results, they have limitations: either lack the ability to go deeper to trace how specific concepts in a knowledge state are mastered by a student, or fail to capture long-term dependencies in an exercise sequence. In this paper, we address these limitations by proposing a novel deep learning model for knowledge tracing, namely Sequential Key-Value Memory Networks (SKVMN). This model unifies the strengths of recurrent modelling capacity and memory capacity of the existing deep learning KT models for modelling student learning. We have extensively evaluated our proposed model on five benchmark datasets. The experimental results show that (1) SKVMN outperforms the state-of-the-art KT models on all datasets, (2) SKVMN can better discover the correlation between latent concepts and questions, and (3) SKVMN can trace the knowledge state of students dynamics, and a leverage sequential dependencies in an exercise sequence for improved predication accuracy.
{"title":"Knowledge Tracing with Sequential Key-Value Memory Networks","authors":"Ghodai M. Abdelrahman, Qing Wang","doi":"10.1145/3331184.3331195","DOIUrl":"https://doi.org/10.1145/3331184.3331195","url":null,"abstract":"Can machines trace human knowledge like humans? Knowledge tracing (KT) is a fundamental task in a wide range of applications in education, such as massive open online courses (MOOCs), intelligent tutoring systems, educational games, and learning management systems. It models dynamics in a student's knowledge states in relation to different learning concepts through their interactions with learning activities. Recently, several attempts have been made to use deep learning models for tackling the KT problem. Although these deep learning models have shown promising results, they have limitations: either lack the ability to go deeper to trace how specific concepts in a knowledge state are mastered by a student, or fail to capture long-term dependencies in an exercise sequence. In this paper, we address these limitations by proposing a novel deep learning model for knowledge tracing, namely Sequential Key-Value Memory Networks (SKVMN). This model unifies the strengths of recurrent modelling capacity and memory capacity of the existing deep learning KT models for modelling student learning. We have extensively evaluated our proposed model on five benchmark datasets. The experimental results show that (1) SKVMN outperforms the state-of-the-art KT models on all datasets, (2) SKVMN can better discover the correlation between latent concepts and questions, and (3) SKVMN can trace the knowledge state of students dynamics, and a leverage sequential dependencies in an exercise sequence for improved predication accuracy.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75499182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 2A: Question Answering","authors":"C. Shah","doi":"10.1145/3349678","DOIUrl":"https://doi.org/10.1145/3349678","url":null,"abstract":"","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78401389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feng Li, Zhenrui Chen, Pengjie Wang, Yi Ren, Di Zhang, Xiaoyu Zhu
Estimating click-through rate (CTR) accurately has an essential impact on improving user experience and revenue in sponsored search. For CTR prediction model, it is necessary to make out user's real-time search intention. Most of the current work is to mine their intentions based on users' real-time behaviors. However, it is difficult to capture the intention when user behaviors are sparse, causing thebehavior sparsity problem. Moreover, it is difficult for user to jump out of their specific historical behaviors for possible interest exploration, namelyweak generalization problem. We propose a new approach Graph Intention Network (GIN) based on co-occurrence commodity graph to mine user intention. By adopting multi-layered graph diffusion, GIN enriches user behaviors to solve the behavior sparsity problem. By introducing co-occurrence relationship of commodities to explore the potential preferences, the weak generalization problem is also alleviated. To the best of our knowledge, the GIN method is the first to introduce graph learning for user intention mining in CTR prediction and propose end-to-end joint training of graph learning and CTR prediction tasks in sponsored search. At present, GIN has achieved excellent offline results on the real-world data of the e-commerce platform outperforming existing deep learning models, and has been running stable tests online and achieved significant CTR improvements.
{"title":"Graph Intention Network for Click-through Rate Prediction in Sponsored Search","authors":"Feng Li, Zhenrui Chen, Pengjie Wang, Yi Ren, Di Zhang, Xiaoyu Zhu","doi":"10.1145/3331184.3331283","DOIUrl":"https://doi.org/10.1145/3331184.3331283","url":null,"abstract":"Estimating click-through rate (CTR) accurately has an essential impact on improving user experience and revenue in sponsored search. For CTR prediction model, it is necessary to make out user's real-time search intention. Most of the current work is to mine their intentions based on users' real-time behaviors. However, it is difficult to capture the intention when user behaviors are sparse, causing thebehavior sparsity problem. Moreover, it is difficult for user to jump out of their specific historical behaviors for possible interest exploration, namelyweak generalization problem. We propose a new approach Graph Intention Network (GIN) based on co-occurrence commodity graph to mine user intention. By adopting multi-layered graph diffusion, GIN enriches user behaviors to solve the behavior sparsity problem. By introducing co-occurrence relationship of commodities to explore the potential preferences, the weak generalization problem is also alleviated. To the best of our knowledge, the GIN method is the first to introduce graph learning for user intention mining in CTR prediction and propose end-to-end joint training of graph learning and CTR prediction tasks in sponsored search. At present, GIN has achieved excellent offline results on the real-world data of the e-commerce platform outperforming existing deep learning models, and has been running stable tests online and achieved significant CTR improvements.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"146 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77660718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also intuitive explanations of the results for users or system designers, which can help to improve the system transparency, persuasiveness, trustworthiness, and effectiveness, etc. This is even more important in personalized search and recommendation scenarios, where users would like to know why a particular product, web page, news report, or friend suggestion exists in his or her own search and recommendation lists. The tutorial focuses on the research and application of explainable recommendation and search algorithms, as well as their application in real-world systems such as search engine, e-commerce and social networks. The tutorial aims at introducing and communicating explainable recommendation and search methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussions, idea communications, and research promotions.
{"title":"SIGIR 2019 Tutorial on Explainable Recommendation and Search","authors":"Yongfeng Zhang, Jiaxin Mao, Qingyao Ai","doi":"10.1145/3331184.3331390","DOIUrl":"https://doi.org/10.1145/3331184.3331390","url":null,"abstract":"Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also intuitive explanations of the results for users or system designers, which can help to improve the system transparency, persuasiveness, trustworthiness, and effectiveness, etc. This is even more important in personalized search and recommendation scenarios, where users would like to know why a particular product, web page, news report, or friend suggestion exists in his or her own search and recommendation lists. The tutorial focuses on the research and application of explainable recommendation and search algorithms, as well as their application in real-world systems such as search engine, e-commerce and social networks. The tutorial aims at introducing and communicating explainable recommendation and search methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussions, idea communications, and research promotions.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80220252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User preferences in the form of absolute feedback, s.a., ratings, are widely exploited in Recommender Systems (RSs). Recent research has explored the usage of preferences expressed with pairwise comparisons, which signal relative feedback. It has been shown that pairwise comparisons can be effectively combined with ratings, but, it is important to fine tune the technique that leverages both types of feedback. Previous approaches train a single model by converting ratings into pairwise comparisons, and then use only that type of data. However, we claim that these two types of preferences reveal different information about users interests and should be exploited differently. Hence, in this work, we develop a ranking technique that separately exploits absolute and relative preferences in a hybrid model. In particular, we propose a joint loss function which is computed on both absolute and relative preferences of users. Our proposed ranking model uses pairwise comparisons data to predict the user's preference order between pairs of items and uses ratings to push high rated (relevant) items to the top of the ranking. Experimental results on three different data sets demonstrate that the proposed technique outperforms competitive baseline algorithms on popular ranking-oriented evaluation metrics.
{"title":"Item Recommendation by Combining Relative and Absolute Feedback Data","authors":"Saikishore Kalloori, Tianyu Li, F. Ricci","doi":"10.1145/3331184.3331295","DOIUrl":"https://doi.org/10.1145/3331184.3331295","url":null,"abstract":"User preferences in the form of absolute feedback, s.a., ratings, are widely exploited in Recommender Systems (RSs). Recent research has explored the usage of preferences expressed with pairwise comparisons, which signal relative feedback. It has been shown that pairwise comparisons can be effectively combined with ratings, but, it is important to fine tune the technique that leverages both types of feedback. Previous approaches train a single model by converting ratings into pairwise comparisons, and then use only that type of data. However, we claim that these two types of preferences reveal different information about users interests and should be exploited differently. Hence, in this work, we develop a ranking technique that separately exploits absolute and relative preferences in a hybrid model. In particular, we propose a joint loss function which is computed on both absolute and relative preferences of users. Our proposed ranking model uses pairwise comparisons data to predict the user's preference order between pairs of items and uses ratings to push high rated (relevant) items to the top of the ranking. Experimental results on three different data sets demonstrate that the proposed technique outperforms competitive baseline algorithms on popular ranking-oriented evaluation metrics.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"521 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79020049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hrishikesh Ganu, Mithun Ghosh, Freddy Jose, S. Roshan
We describe a human-in-the loop system - AgentBuddy, that is helping Intuit improve the quality of search it offers to its internal Customer Care Agents (CCAs). AgentBuddy aims to reduce the cognitive effort on part of the CCAs while at the same time boosting the quality of our legacy federated search system. Under the hood, it leverages bandit algorithms to improve federated search and other ML models like LDA, Siamese networks to help CCAs zero in on high quality search results. An intuitive UI designed ground up working with the users (CCAs) is another key feature of the system. AgentBuddy has been deployed internally and initial results from User Acceptance Trials indicate a 4x lift in quality of highlights compared to the incumbent system.
{"title":"AgentBuddy: an IR System based on Bandit Algorithms to Reduce Cognitive Load for Customer Care Agents","authors":"Hrishikesh Ganu, Mithun Ghosh, Freddy Jose, S. Roshan","doi":"10.1145/3331184.3331408","DOIUrl":"https://doi.org/10.1145/3331184.3331408","url":null,"abstract":"We describe a human-in-the loop system - AgentBuddy, that is helping Intuit improve the quality of search it offers to its internal Customer Care Agents (CCAs). AgentBuddy aims to reduce the cognitive effort on part of the CCAs while at the same time boosting the quality of our legacy federated search system. Under the hood, it leverages bandit algorithms to improve federated search and other ML models like LDA, Siamese networks to help CCAs zero in on high quality search results. An intuitive UI designed ground up working with the users (CCAs) is another key feature of the system. AgentBuddy has been deployed internally and initial results from User Acceptance Trials indicate a 4x lift in quality of highlights compared to the incumbent system.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84955967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunqiu Shao, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma
Compared to general web search, image search engines present results in a significantly different way, which leads to changes in user behavior patterns, and thus creates challenges for the existing evaluation mechanisms. In this paper, we pay attention to the context factor in the image search scenario. On the basis of a mean-variance analysis, we investigate the effects of context and find that evaluation metrics align with user satisfaction better when the returned image results have high variance. Furthermore, assuming that the image results a user has examined might affect her following judgments, we propose the Context-Aware Gain (CAG), a novel evaluation metric that incorporates the contextual effects within the well-known gain-discount framework. Our experiment results show that, with a proper combination of discount functions, the proposed context-aware evaluation metric can significantly improve the performances of offline metrics for image search evaluation, considering user satisfaction as the golden standard.
{"title":"Towards Context-Aware Evaluation for Image Search","authors":"Yunqiu Shao, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma","doi":"10.1145/3331184.3331343","DOIUrl":"https://doi.org/10.1145/3331184.3331343","url":null,"abstract":"Compared to general web search, image search engines present results in a significantly different way, which leads to changes in user behavior patterns, and thus creates challenges for the existing evaluation mechanisms. In this paper, we pay attention to the context factor in the image search scenario. On the basis of a mean-variance analysis, we investigate the effects of context and find that evaluation metrics align with user satisfaction better when the returned image results have high variance. Furthermore, assuming that the image results a user has examined might affect her following judgments, we propose the Context-Aware Gain (CAG), a novel evaluation metric that incorporates the contextual effects within the well-known gain-discount framework. Our experiment results show that, with a proper combination of discount functions, the proposed context-aware evaluation metric can significantly improve the performances of offline metrics for image search evaluation, considering user satisfaction as the golden standard.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85005093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In test collection based evaluation of IR systems, score standardization has been proposed to compare systems across collections and minimize the effect of outlier runs on specific topics. The underlying idea is to account for the difficulty of topics, so that systems are scored relative to it. Webber et al. first proposed standardization through a non-linear transformation with the standard normal distribution, and recently Sakai proposed a simple linear transformation. In this paper, we show that both approaches are actually special cases of a simple standardization which assumes specific distributions for the per-topic scores. From this viewpoint, we argue that a transformation based on the empirical distribution is the most appropriate choice for this kind of standardization. Through a series of experiments on TREC data, we show the benefits of our proposal in terms of score stability and statistical test behavior.
{"title":"A New Perspective on Score Standardization","authors":"Julián Urbano, Harlley Lima, A. Hanjalic","doi":"10.1145/3331184.3331315","DOIUrl":"https://doi.org/10.1145/3331184.3331315","url":null,"abstract":"In test collection based evaluation of IR systems, score standardization has been proposed to compare systems across collections and minimize the effect of outlier runs on specific topics. The underlying idea is to account for the difficulty of topics, so that systems are scored relative to it. Webber et al. first proposed standardization through a non-linear transformation with the standard normal distribution, and recently Sakai proposed a simple linear transformation. In this paper, we show that both approaches are actually special cases of a simple standardization which assumes specific distributions for the per-topic scores. From this viewpoint, we argue that a transformation based on the empirical distribution is the most appropriate choice for this kind of standardization. Through a series of experiments on TREC data, we show the benefits of our proposal in terms of score stability and statistical test behavior.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76632062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When used to assess the accuracy of system rankings, Kendall's tau and other rank correlation measures conflate bias and variance as sources of error. We derive from tau a distance between rankings in Euclidean space, from which we can determine the magnitude of bias, variance, and error. Using bootstrap estimation, we show that shallow pooling has substantially higher bias and insubstantially lower variance than probability-proportional-to-size sampling, coupled with the recently released dynAP estimator.
{"title":"Quantifying Bias and Variance of System Rankings","authors":"G. Cormack, Maura R. Grossman","doi":"10.1145/3331184.3331356","DOIUrl":"https://doi.org/10.1145/3331184.3331356","url":null,"abstract":"When used to assess the accuracy of system rankings, Kendall's tau and other rank correlation measures conflate bias and variance as sources of error. We derive from tau a distance between rankings in Euclidean space, from which we can determine the magnitude of bias, variance, and error. Using bootstrap estimation, we show that shallow pooling has substantially higher bias and insubstantially lower variance than probability-proportional-to-size sampling, coupled with the recently released dynAP estimator.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"739 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81978024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work describes an estimator from which unbiased measurements of precision, rank-biased precision, and cumulative gain may be derived from a uniform or non-uniform sample of relevance assessments. Adversarial testing supports the theory that our estimator yields unbiased low-variance measurements from sparse samples, even when used to measure results that are qualitatively different from those returned by known information retrieval methods. Our results suggest that test collections using sampling to select documents for relevance assessment yield more accurate measurements than test collections using pooling, especially for the results of retrieval methods not contributing to the pool.
{"title":"Unbiased Low-Variance Estimators for Precision and Related Information Retrieval Effectiveness Measures","authors":"G. Cormack, Maura R. Grossman","doi":"10.1145/3331184.3331355","DOIUrl":"https://doi.org/10.1145/3331184.3331355","url":null,"abstract":"This work describes an estimator from which unbiased measurements of precision, rank-biased precision, and cumulative gain may be derived from a uniform or non-uniform sample of relevance assessments. Adversarial testing supports the theory that our estimator yields unbiased low-variance measurements from sparse samples, even when used to measure results that are qualitatively different from those returned by known information retrieval methods. Our results suggest that test collections using sampling to select documents for relevance assessment yield more accurate measurements than test collections using pooling, especially for the results of retrieval methods not contributing to the pool.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80782033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}