F. M. Nardini, Roberto Trani, Rossano Venturini
Modern search services often provide multiple options to rank the search results, e.g., sort “by relevance”, “by price” or “by discount” in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, the rank by attribute could place many marginally relevant results in the head of the results list leading to poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results maximizing the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1-ϵ)–optimal filtering, i.e., the relevance of its solution is at least (1-ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency. 
In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
{"title":"Fast Filtering of Search Results Sorted by Attribute","authors":"F. M. Nardini, Roberto Trani, Rossano Venturini","doi":"10.1145/3477982","DOIUrl":"https://doi.org/10.1145/3477982","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"26 1","pages":"1 - 24"},"publicationDate":"2021-11-24","publicationTypes":"Journal Article"}
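The relevance-aware filtering problem and the (1-ϵ) guarantee described in the abstract above can be illustrated with a small brute-force sketch. This is not OPT-Filtering or ϵ-Filtering; it enumerates every candidate subset and is only practical for tiny inputs. The fixed subset size `k`, the use of DCG as the relevance measure, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import combinations
from math import log2


def dcg(relevances):
    """Discounted cumulative gain of a list read from top to bottom (assumed metric)."""
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances))


def brute_force_filtering(results, k):
    """Exhaustively pick the k results whose attribute-sorted list has the highest DCG.

    `results` is a list of (attribute_value, relevance) pairs. Hypothetical baseline
    for illustration only: it takes time exponential in len(results).
    """
    ordered = sorted(results, key=lambda r: r[0])  # the attribute order is fixed
    best_subset, best_score = [], 0.0
    # combinations() preserves input order, so each subset stays attribute-sorted.
    for subset in combinations(ordered, k):
        score = dcg([rel for _, rel in subset])
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


def is_within_guarantee(score, optimum, eps):
    """Check the (1 - eps)-optimality condition: score >= (1 - eps) * optimum."""
    return score >= (1 - eps) * optimum


# (price, relevance) pairs; keep the two results that maximize DCG when sorted by price.
example = [(10, 0.1), (20, 0.9), (30, 0.2), (40, 0.8)]
chosen, optimum = brute_force_filtering(example, k=2)
```

An approximate algorithm with error ϵ would be allowed to return any subset whose score passes `is_within_guarantee(score, optimum, eps)`.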
Personalized product search aims to retrieve a ranked list of products given a user’s input query and purchase history. To address this task, we propose PSAM, a Personalized, Sequential, Attentive, and Metric-aware model that learns semantic representations of three categories of entities, i.e., users, queries, and products, from users’ sequential purchase histories and the corresponding sequential queries. Specifically, a query-based attentive LSTM (QA-LSTM) model and an attention mechanism are designed to infer users’ dynamic embeddings, which capture their short-term and long-term preferences. To obtain more fine-grained embeddings of the three categories of entities, a metric-aware objective is deployed in our model to force the inferred embeddings to satisfy the triangle inequality, which yields a more realistic distance measurement for product search. Experiments conducted on four benchmark datasets show that PSAM significantly outperforms state-of-the-art product search baselines in terms of effectiveness, with improvements of up to 50.9% in NDCG@20. Our visualization experiments further illustrate that the learned product embeddings are able to distinguish different types of products.
{"title":"Personalized, Sequential, Attentive, Metric-Aware Product Search","authors":"Yaoxin Pan, Shangsong Liang, Jiaxin Ren, Zaiqiao Meng, Qiang Zhang","doi":"10.1145/3473337","DOIUrl":"https://doi.org/10.1145/3473337","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"2 1","pages":"1 - 29"},"publicationDate":"2021-11-24","publicationTypes":"Journal Article"}
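The PSAM results above are reported under NDCG@20. As a reminder of what that metric measures, here is a minimal sketch of NDCG@k using the common linear-gain, log2-discount formulation. The function names are hypothetical, and the paper's exact gain definition is not stated in the abstract, so this may differ from the variant the authors used.

```python
from math import log2


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions of a ranked list."""
    return sum(gain / log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering of the same gains."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Gains are listed in ranked order; 1 marks a purchased product, 0 anything else.
perfect = ndcg_at_k([1, 0, 0], k=20)
second_place = ndcg_at_k([0, 1], k=20)
```

A score of 1.0 means the relevant products are ranked as high as possible; placing the only relevant product second instead of first lowers the score to 1 / log2(3).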
In this article, we study the task of user profiling in question answering communities (QACs). Previous user profiling algorithms suffer from a number of defects: they regard users and words as atomic units, leading to a mismatch between them; they are designed for other applications rather than for QACs; and some semantic profiling algorithms do not co-embed users and words, which makes it difficult to measure the affinity between them. To improve the profiling performance, we propose a neural Flow-based Constrained Co-embedding Model, abbreviated as FCCM. FCCM jointly co-embeds the vector representations of both users and words in QACs such that the affinities between them can be semantically measured. Specifically, FCCM extends the standard variational auto-encoder model to enforce a voting constraint on the inferred embeddings of users and words: given a question and the users who answer it in the community, the representations of users whose answers receive more votes are closer to the representations of the words associated with these answers than the representations of users whose answers receive fewer votes. In addition, FCCM integrates normalizing flow into the variational auto-encoder framework to avoid the assumption that the distributions of the embeddings are Gaussian, making the inferred embeddings fit the real distributions of the data better. Experimental results on a Chinese Zhihu question answering dataset demonstrate the effectiveness of our proposed FCCM model for the task of user profiling in QACs.
{"title":"Profiling Users for Question Answering Communities via Flow-Based Constrained Co-Embedding Model","authors":"Shangsong Liang, Yupeng Luo, Zaiqiao Meng","doi":"10.1145/3470565","DOIUrl":"https://doi.org/10.1145/3470565","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"447 1","pages":"1 - 38"},"publicationDate":"2021-11-24","publicationTypes":"Journal Article"}
Citation count prediction is an important task for estimating the future impact of research papers. Most existing works utilize information extracted from the paper itself. In this article, we focus on how to utilize another kind of useful data signal (i.e., peer review text) to improve both the performance and interpretability of the prediction models. Specifically, we propose a novel aspect-aware capsule network for citation count prediction based on review text. It contains two major capsule layers, namely the feature capsule layer and the aspect capsule layer, each with its own routing approach. Feature capsules encode the local semantics of review sentences as the input of the aspect capsule layer, whereas aspect capsules aim to capture high-level semantic features that serve as the final representations for prediction. Besides the predictive capacity, we also enhance the model interpretability with two strategies. First, we use the topic distribution of the review text to guide the learning of aspect capsules so that each aspect capsule can represent a specific aspect in the review. Then, we use the learned aspect capsules to generate readable text for explaining the predicted citation count. Extensive experiments on two real-world datasets have demonstrated the effectiveness of the proposed model in both performance and interpretability.
{"title":"Interpretable Aspect-Aware Capsule Network for Peer Review Based Citation Count Prediction","authors":"Siqing Li, Yaliang Li, Wayne Xin Zhao, Bolin Ding, Ji-rong Wen","doi":"10.1145/3466640","DOIUrl":"https://doi.org/10.1145/3466640","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"51 1","pages":"1 - 29"},"publicationDate":"2021-11-24","publicationTypes":"Journal Article"}
Dan Li, Tong Xu, Peilun Zhou, Weidong He, Y. Hao, Yi Zheng, Enhong Chen
Person search has long been treated as a crucial and challenging task that supports deeper insight into personalized summarization and personality discovery. Traditional methods, e.g., person re-identification and face recognition techniques, which profile video characters based on visual information, are often limited by relatively fixed poses or small variations of viewpoint and struggle with more realistic scenes with high motion complexity (e.g., movies). At the same time, long videos such as movies often have logical story lines and are composed of continuously developing plots. In this situation, different persons usually meet on a specific occasion, in which informative social cues are exhibited. We notice that these social cues could semantically profile their personality and benefit the person search task in two aspects. First, persons with certain relationships usually co-occur in short intervals; if one of them is easier to identify, the social relation cues extracted from their co-occurrences could further benefit the identification of the harder ones. Second, social relations could reveal the association between certain scenes and characters (e.g., a classmate relationship may only exist among students), which could narrow down the candidates to persons with a specific relationship. In this way, high-level social relation cues could improve the effectiveness of person search. Along this line, in this article, we propose a social context-aware framework, which fuses visual and social contexts to profile persons from more semantic perspectives and better deal with the person search task in complex scenarios. Specifically, we first segment videos into several independent scene units and abstract out social contexts within these scene units. Then, we construct inner-personal links through a graph formulation operation for each scene unit, in which both visual cues and relation cues are considered. 
Finally, we perform a relation-aware label propagation to identify characters’ occurrences, combining low-level semantic cues (i.e., visual cues) and high-level semantic cues (i.e., relation cues) to further enhance the accuracy. Experiments on real-world datasets validate that our solution outperforms several competitive baselines.
{"title":"Social Context-aware Person Search in Videos via Multi-modal Cues","authors":"Dan Li, Tong Xu, Peilun Zhou, Weidong He, Y. Hao, Yi Zheng, Enhong Chen","doi":"10.1145/3480967","DOIUrl":"https://doi.org/10.1145/3480967","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"91 1","pages":"1 - 25"},"publicationDate":"2021-11-22","publicationTypes":"Journal Article"}
Complex user behavior, especially in settings such as social media, can be organized as time-evolving networks. Through network embedding, we can extract general-purpose vector representations of these dynamic networks which allow us to analyze them without extensive feature engineering. Prior work has shown how to generate network embeddings while preserving the structural role proximity of nodes. These methods, however, cannot capture the temporal evolution of the structural identity of the nodes in dynamic networks. Other works, on the other hand, have focused on learning microscopic dynamic embeddings. Though these methods can learn node representations over dynamic networks, these representations capture the local context of nodes and do not learn the structural roles of nodes. In this article, we propose a novel method for learning structural node embeddings in discrete-time dynamic networks. Our method, called HR2vec, tracks historical topology information in dynamic networks to learn dynamic structural role embeddings. Through experiments on synthetic and real-world temporal datasets, we show that our method outperforms other well-known methods in tasks where structural equivalence and historical information both play important roles. HR2vec can be used to model dynamic user behavior in any networked setting where users can be represented as nodes. Additionally, we propose a novel method (called network fingerprinting) that uses HR2vec embeddings for modeling whole (or partial) time-evolving networks. We showcase our network fingerprinting method on synthetic and real-world networks. Specifically, we demonstrate how our method can be used for detecting foreign-backed information operations on Twitter.
{"title":"Dynamic Structural Role Node Embedding for User Modeling in Evolving Networks","authors":"Lili Wang, Chenghan Huang, Ying Lu, Weicheng Ma, Ruibo Liu, Soroush Vosoughi","doi":"10.1145/3472955","DOIUrl":"https://doi.org/10.1145/3472955","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"1 1","pages":"1 - 21"},"publicationDate":"2021-11-22","publicationTypes":"Journal Article"}
Personalized search tailors document ranking lists for each individual user based on her interests and query intent to better satisfy the user’s information need. Many personalized search models have been proposed. They first build a user interest profile from the user’s search history, and then re-rank the documents based on the personalized matching scores between the created profile and candidate documents. In this article, we attempt to solve the personalized search problem from an alternative perspective of clarifying the user’s intention of the current query. We know that there are many ambiguous words in natural language such as “Apple.” People with different knowledge backgrounds and interests have personalized understandings of these words. Therefore, we propose a personalized search model with personal word embeddings for each individual user that mainly contain the word meanings that the user already knows and can reflect the user interests. To learn great personal word embeddings, we design a pre-training model that captures both the textual information of the query log and the information about user interests contained in the click-through data represented as a graph structure. With personal word embeddings, we obtain the personalized word and context-aware representations of the query and documents. Furthermore, we also employ the current session as the short-term search context to dynamically disambiguate the current query. Finally, we use a matching model to calculate the matching score between the personalized query and document representations for ranking. Experimental results on two large-scale query logs show that our designed model significantly outperforms state-of-the-art personalization models.
{"title":"Clarifying Ambiguous Keywords with Personal Word Embeddings for Personalized Search","authors":"Jing Yao, Zhicheng Dou, Ji-rong Wen","doi":"10.1145/3470564","DOIUrl":"https://doi.org/10.1145/3470564","journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"8 1","pages":"1 - 29"},"publicationDate":"2021-11-22","publicationTypes":"Journal Article"}
Meng Chen, Lei Zhu, Ronghui Xu, Yang Liu, Xiaohui Yu, Yilong Yin
Venue categories used in location-based social networks often exhibit a hierarchical structure, together with the category sequences derived from users’ check-ins. The two data modalities provide a wealth of information for us to capture the semantic relationships between those categories. To understand the venue semantics, existing methods usually embed venue categories into low-dimensional spaces by modeling the linear context (i.e., the positional neighbors of the given category) in check-in sequences. However, the hierarchical structure of venue categories, which inherently encodes the relationships between categories, is largely untapped. In this article, we propose a venue Category Embedding Model named Hier-CEM, which generates a latent representation for each venue category by embedding the Hierarchical structure of categories and utilizing multiple types of context. Specifically, we investigate two kinds of hierarchical context based on any given venue category hierarchy and show how to model them together with the linear context collaboratively. We apply Hier-CEM to three tasks on two real check-in datasets collected from Foursquare. Experimental results show that Hier-CEM is better at capturing both semantic and sequential information inherent in venues than state-of-the-art embedding methods.
Embedding Hierarchical Structures for Venue Category Representation. Meng Chen, Lei Zhu, Ronghui Xu, Yang Liu, Xiaohui Yu, Yilong Yin. ACM Transactions on Information Systems (TOIS), pp. 1-29, published 2021-11-22. https://doi.org/10.1145/3478285
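The distinction the Hier-CEM abstract draws between linear context (positional neighbors in a check-in sequence) and hierarchical context (derived from the category tree) can be illustrated with a small sketch. The abstract does not say which two kinds of hierarchical context the model uses, so the ancestor and sibling contexts below are assumptions made purely for illustration, and the toy hierarchy, function names, and window size are hypothetical. This is not the authors' implementation.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Food": None,
    "Asian Restaurant": "Food",
    "Sushi Restaurant": "Asian Restaurant",
    "Ramen Restaurant": "Asian Restaurant",
    "Cafe": "Food",
}


def linear_context(seq: List[str], i: int, window: int = 1) -> List[str]:
    """Positional neighbors of seq[i] within `window` steps of a check-in sequence."""
    lo = max(0, i - window)
    return seq[lo:i] + seq[i + 1:i + 1 + window]


def ancestor_context(cat: str) -> List[str]:
    """Categories on the path from `cat` up to the root (assumed context type)."""
    path: List[str] = []
    parent = PARENT[cat]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path


def sibling_context(cat: str) -> List[str]:
    """Categories that share a parent with `cat` (assumed context type)."""
    parent = PARENT[cat]
    return sorted(c for c, p in PARENT.items() if p == parent and c != cat)


# A made-up check-in sequence of venue categories.
checkins = ["Cafe", "Sushi Restaurant", "Ramen Restaurant"]
print(linear_context(checkins, 1))
print(ancestor_context("Sushi Restaurant"))
print(sibling_context("Sushi Restaurant"))
```

In an embedding model, each (category, context category) pair produced this way would typically serve as a training example; how the different context types are weighted and combined is left open here.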
Ameer Albahem, Damiano Spina, Falk Scholer, L. Cavedon
In many search scenarios, such as exploratory, comparative, or survey-oriented search, users interact with dynamic search systems to satisfy multi-aspect information needs. These systems use different dynamic approaches that exploit user feedback at various levels of granularity. Although studies have provided insights about the role of many components of these systems, they used black-box and isolated experimental setups. Therefore, the effects of these components and their interactions are still not well understood. We address this by following a methodology based on Analysis of Variance (ANOVA). We built a Grid Of Points that consists of systems based on different ways to instantiate three components: initial rankers, dynamic rerankers, and user feedback granularity. Using evaluation scores based on the TREC Dynamic Domain collections, we built several ANOVA models to estimate the effects. We found that (i) although all components significantly affect search effectiveness, the initial ranker has the largest effect size, (ii) the effect sizes of these components vary with the length of the search session and the effectiveness metric used, and (iii) initial rankers and dynamic rerankers have more prominent effects than user feedback granularity. To improve effectiveness, we recommend improving the quality of initial rankers and dynamic rerankers. This does not require eliciting detailed user feedback, which might be expensive or invasive.
Component-based Analysis of Dynamic Search Performance. Ameer Albahem, Damiano Spina, Falk Scholer, L. Cavedon. ACM Transactions on Information Systems (TOIS), pp. 1-47, published 2021-11-22. https://doi.org/10.1145/3483237
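The grid-of-points idea in the dynamic search abstract can be sketched as follows: build one system per combination of component choices, then ask how much of the score variance each component explains. The abstract does not specify the ANOVA models or the effect-size measure, so this sketch assumes a balanced full-factorial grid and uses eta-squared (factor sum of squares over total sum of squares) as a stand-in. All component names and scores are invented and do not reproduce any reported result.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical component levels with made-up additive contributions.
RANKERS = {"ranker_a": 0.00, "ranker_b": 0.20}
RERANKERS = {"reranker_a": 0.00, "reranker_b": 0.10}
FEEDBACK = {"document": 0.00, "passage": 0.02}

# One grid point (system) per combination of component choices.
scores: Dict[Tuple[str, str, str], float] = {
    (r, d, f): 0.30 + RANKERS[r] + RERANKERS[d] + FEEDBACK[f]
    for r, d, f in product(RANKERS, RERANKERS, FEEDBACK)
}


def eta_squared(grid: Dict[Tuple[str, ...], float], factor: int) -> float:
    """Main-effect sum of squares of one factor divided by the total sum of squares."""
    grand = mean(grid.values())
    ss_total = sum((y - grand) ** 2 for y in grid.values())
    by_level: Dict[str, List[float]] = {}
    for combo, y in grid.items():
        by_level.setdefault(combo[factor], []).append(y)
    ss_factor = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in by_level.values())
    return ss_factor / ss_total


# Effect sizes for (initial ranker, dynamic reranker, feedback granularity).
effects = [eta_squared(scores, k) for k in range(3)]
print(effects)
```

Because the toy scores are purely additive, the three effect sizes sum to one; real evaluation scores would also contain interaction and residual terms, which a full ANOVA model estimates separately.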
The broad adoption of electronic health records (EHRs) has led to vast amounts of data being accumulated on a patient’s history, diagnosis, prescriptions, and lab tests. Advances in recommender technologies have the potential to utilize this information to help doctors personalize the prescribed medications. However, existing medication recommendation systems have yet to make use of all these information sources in a seamless manner, and they do not provide a justification for why a particular medication is recommended. In this work, we design a two-stage personalized medication recommender system called PREMIER that incorporates information from the EHR. We utilize the various weights in the system to compute the contributions from the information sources for the recommended medications. Our system models the drug interaction from an external drug database and the drug co-occurrence from the EHR as graphs. Experimental results on MIMIC-III and a proprietary outpatient dataset show that PREMIER outperforms state-of-the-art medication recommendation systems while achieving the best tradeoff between accuracy and drug-drug interaction. Case studies demonstrate that the justifications provided by PREMIER are appropriate and aligned with clinical practice.
Personalizing Medication Recommendation with a Graph-Based Approach. Suman Bhoi, M. Lee, W. Hsu, A. Fang, N. Tan. ACM Transactions on Information Systems (TOIS), pp. 1-23, published 2021-11-22. https://doi.org/10.1145/3488668
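The two graphs mentioned in the PREMIER abstract, drug co-occurrence from the EHR and drug interaction from an external database, can be sketched with plain Python containers. The abstract does not describe how the graphs are built or how drug-drug interaction is scored, so both functions below are assumptions: co-occurrence is counted per visit, and the interaction rate is taken as the fraction of recommended drug pairs that appear in the interaction graph. Drug identifiers are placeholders, not real medications.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def cooccurrence_graph(visits: Iterable[List[str]]) -> Counter:
    """Edge weight = number of visits in which two drugs were prescribed together."""
    edges: Counter = Counter()
    for drugs in visits:
        for a, b in combinations(sorted(set(drugs)), 2):
            edges[(a, b)] += 1
    return edges


def ddi_rate(recommended: Iterable[str], ddi_edges: Set[FrozenSet[str]]) -> float:
    """Fraction of recommended drug pairs that are known to interact (assumed metric)."""
    pairs = list(combinations(sorted(set(recommended)), 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in ddi_edges for p in pairs) / len(pairs)


# Made-up prescription history: one list of drug identifiers per visit.
visits = [["drug_a", "drug_b"], ["drug_a", "drug_b", "drug_c"], ["drug_c"]]
cooc = cooccurrence_graph(visits)

# Made-up interaction graph: drug_a and drug_c are assumed to interact.
ddi = {frozenset({"drug_a", "drug_c"})}
rate = ddi_rate(["drug_a", "drug_b", "drug_c"], ddi)
print(cooc, rate)
```

A recommender that trades accuracy against drug-drug interaction would compare candidate medication sets on a score like this alongside a prediction-quality metric.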