Efficient Non-Sampling Factorization Machines for Optimal Context-Aware Recommendation
C. Chen, Min Zhang, Weizhi Ma, Yiqun Liu, Shaoping Ma
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380303
To provide more accurate recommendations, it has become a trending topic to go beyond modeling user-item interactions and take context features into account. Factorization Machines (FM) with negative sampling are a popular solution for context-aware recommendation. However, this approach is not robust, as sampling may lose important information and usually leads to suboptimal performance in practice. Several recent efforts have enhanced FM with deep learning architectures for modeling high-order feature interactions, but they either focus on the rating prediction task only or adopt the negative sampling strategy to optimize ranking performance. Due to the dramatic fluctuation of sampling, it is reasonable to argue that these sampling-based FM methods remain suboptimal for context-aware recommendation. In this paper, we propose to learn FM without sampling for ranking tasks, which particularly benefits context-aware recommendation. Despite its effectiveness, such a non-sampling strategy poses a strong challenge to training efficiency. Accordingly, we further design a new framework named Efficient Non-Sampling Factorization Machines (ENSFM). ENSFM not only seamlessly bridges the relationship between FM and Matrix Factorization (MF), but also resolves the challenging efficiency issue through novel memorization strategies. Through extensive experiments on three real-world public datasets, we show that 1) the proposed ENSFM consistently and significantly outperforms state-of-the-art methods on context-aware Top-K recommendation, and 2) ENSFM achieves significant advantages in training efficiency, which makes it more applicable to real-world large-scale systems. Moreover, the empirical results indicate that a proper learning method is even more important than advanced neural network structures for the Top-K recommendation task. Our implementation has been released to facilitate further development of efficient non-sampling methods.
{"title":"Efficient Non-Sampling Factorization Machines for Optimal Context-Aware Recommendation","authors":"C. Chen, Min Zhang, Weizhi Ma, Yiqun Liu, Shaoping Ma","doi":"10.1145/3366423.3380303","DOIUrl":"https://doi.org/10.1145/3366423.3380303","url":null,"abstract":"To provide more accurate recommendation, it is a trending topic to go beyond modeling user-item interactions and take context features into account. Factorization Machines (FM) with negative sampling is a popular solution for context-aware recommendation. However, it is not robust as sampling may lost important information and usually leads to non-optimal performances in practical. Several recent efforts have enhanced FM with deep learning architectures for modelling high-order feature interactions. While they either focus on rating prediction task only, or typically adopt the negative sampling strategy for optimizing the ranking performance. Due to the dramatic fluctuation of sampling, it is reasonable to argue that these sampling-based FM methods are still suboptimal for context-aware recommendation. In this paper, we propose to learn FM without sampling for ranking tasks that helps context-aware recommendation particularly. Despite effectiveness, such a non-sampling strategy presents strong challenge in learning efficiency of the model. Accordingly, we further design a new ideal framework named Efficient Non-Sampling Factorization Machines (ENSFM). ENSFM not only seamlessly connects the relationship between FM and Matrix Factorization (MF), but also resolves the challenging efficiency issue via novel memorization strategies. Through extensive experiments on three real-world public datasets, we show that 1) the proposed ENSFM consistently and significantly outperforms the state-of-the-art methods on context-aware Top-K recommendation, and 2) ENSFM achieves significant advantages in training efficiency, which makes it more applicable to real-world large-scale systems. Moreover, the empirical results indicate that a proper learning method is even more important than advanced neural network structures for Top-K recommendation task. Our implementation has been released 1 to facilitate further developments on efficient non-sampling methods.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"0 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75973456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intention Modeling from Ordered and Unordered Facets for Sequential Recommendation
Xueliang Guo, Chongyang Shi, Chuanming Liu
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380190
Recently, sequential recommendation has attracted substantial attention from researchers due to its status as an essential service for e-commerce. Accurately understanding user intention is an important factor in improving the performance of recommendation systems. However, user intention is highly time-dependent and flexible, so it is very challenging to learn the latent dynamic intention of users for sequential recommendation. To this end, we propose a novel intention modeling approach from ordered and unordered facets (IMfOU) for sequential recommendation. Specifically, the proposed global and local item embedding (GLIE) comprehensively captures the sequential context information in the sequences and highlights the important features that users care about. We further design ordered preference drift learning (OPDL) and unordered purchase motivation learning (UPML) to capture the user's preference drift process and purchase motivation, respectively. By combining users' dynamic preferences and current motivations, our approach considers not only sequential dependencies between items but also flexible dependencies, and models user purchase intention more accurately from both ordered and unordered facets. Evaluation results on three real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation methods, improving AUC by an average of 2.26%.
{"title":"Intention Modeling from Ordered and Unordered Facets for Sequential Recommendation","authors":"Xueliang Guo, Chongyang Shi, Chuanming Liu","doi":"10.1145/3366423.3380190","DOIUrl":"https://doi.org/10.1145/3366423.3380190","url":null,"abstract":"Recently, sequential recommendation has attracted substantial attention from researchers due to its status as an essential service for e-commerce. Accurately understanding user intention is an important factor to improve the performance of recommendation system. However, user intention is highly time-dependent and flexible, so it is very challenging to learn the latent dynamic intention of users for sequential recommendation. To this end, in this paper, we propose a novel intention modeling from ordered and unordered facets (IMfOU) for sequential recommendation. Specifically, the global and local item embedding (GLIE) we proposed can comprehensively capture the sequential context information in the sequences and highlight the important features that users care about. We further design ordered preference drift learning (OPDL) and unordered purchase motivation learning (UPML) to obtain user’s the process of preference drift and purchase motivation respectively. With combining the users’ dynamic preference and current motivation, it considers not only sequential dependencies between items but also flexible dependencies and models the user purchase intention more accurately from ordered and unordered facets respectively. Evaluation results on three real-world datasets demonstrate that our proposed approach achieves better performance than the state-of-the-art sequential recommendation methods achieving improvement of AUC by an average of 2.26%.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78489000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wanyue Xu, Y. Sheng, Zuobai Zhang, Haibin Kan, Zhongzhi Zhang
The mean hitting time from a node i to a node j selected randomly according to the stationary distribution of random walks is called the Kemeny constant, which has found various applications. It was proved that, over all graphs with N vertices, complete graphs have the exact minimum Kemeny constant, which grows linearly with N. Here we study, numerically or analytically, the Kemeny constant of many sparse real-world and model networks with scale-free, small-world topology, and show that their Kemeny constant also grows linearly with N. Thus, sparse networks with scale-free and small-world topology are favorable architectures with optimal scaling of the Kemeny constant. We then present a theoretically guaranteed estimation algorithm, which approximates the Kemeny constant of a graph in nearly linear time with respect to the number of edges. Extensive numerical experiments on model and real networks show that our approximation algorithm is both efficient and accurate.
{"title":"Power-Law Graphs Have Minimal Scaling of Kemeny Constant for Random Walks","authors":"Wanyue Xu, Y. Sheng, Zuobai Zhang, Haibin Kan, Zhongzhi Zhang","doi":"10.1145/3366423.3380093","DOIUrl":"https://doi.org/10.1145/3366423.3380093","url":null,"abstract":"The mean hitting time from a node i to a node j selected randomly according to the stationary distribution of random walks is called the Kemeny constant, which has found various applications. It was proved that over all graphs with N vertices, complete graphs have the exact minimum Kemeny constant, growing linearly with N. Here we study numerically or analytically the Kemeny constant on many sparse real-world and model networks with scale-free small-world topology, and show that their Kemeny constant also behaves linearly with N. Thus, sparse networks with scale-free and small-world topology are favorable architectures with optimal scaling of Kemeny constant. We then present a theoretically guaranteed estimation algorithm, which approximates the Kemeny constant for a graph in nearly linear time with respect to the number of edges. Extensive numerical experiments on model and real networks show that our approximation algorithm is both efficient and accurate.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78786104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning from Cross-Modal Behavior Dynamics with Graph-Regularized Neural Contextual Bandit
Xian Wu, Suleyman Cetintas, Deguang Kong, Miaoyu Lu, Jian Yang, N. Chawla
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380178
Contextual multi-armed bandit algorithms have received significant attention for modeling users' preferences in online personalized recommender systems in a timely manner. While significant progress has been made along this direction, a few major challenges have not been well addressed yet: (i) a vast majority of the literature is based on linear models that cannot capture complex non-linear inter-dependencies of user-item interactions; (ii) existing work largely ignores the latent relations among users and non-recommended items, and hence may not properly reflect users' preferences in the real world; (iii) current solutions are mainly based on historical data and are prone to cold-start problems for new users who have no interaction history. To address the above challenges, we develop a Graph Regularized Cross-modal (GRC) learning model, a general framework that exploits transferable knowledge learned from user-item interactions as well as the external features of users and items in online personalized recommendation. In particular, the GRC framework leverages the non-linearity of neural networks to model the complex inherent structure of user-item interactions. We further augment GRC with a metric learning technique and a graph-constrained embedding module to map units from different dimensions (temporal, social, and semantic) into the same latent space. An extensive set of experiments conducted on two benchmark datasets, as well as a large-scale proprietary dataset from a major search engine, demonstrates the power of the proposed GRC model in effectively capturing users' dynamic preferences under different settings, outperforming all baselines by a large margin.
{"title":"Learning from Cross-Modal Behavior Dynamics with Graph-Regularized Neural Contextual Bandit","authors":"Xian Wu, Suleyman Cetintas, Deguang Kong, Miaoyu Lu, Jian Yang, N. Chawla","doi":"10.1145/3366423.3380178","DOIUrl":"https://doi.org/10.1145/3366423.3380178","url":null,"abstract":"Contextual multi-armed bandit algorithms have received significant attention in modeling users’ preferences for online personalized recommender systems in a timely manner. While significant progress has been made along this direction, a few major challenges have not been well addressed yet: (i) a vast majority of the literature is based on linear models that cannot capture complex non-linear inter-dependencies of user-item interactions; (ii) existing literature mainly ignores the latent relations among users and non-recommended items: hence may not properly reflect users’ preferences in the real-world; (iii) current solutions are mainly based on historical data and are prone to cold-start problems for new users who have no interaction history. To address the above challenges, we develop a Graph Regularized Cross-modal (GRC) learning model, a general framework to exploit transferable knowledge learned from user-item interactions as well as the external features of users and items in online personalized recommendations. In particular, the GRC framework leverage a non-linearity of neural network to model complex inherent structure of user-item interactions. We further augment GRC with the cooperation of the metric learning technique and a graph-constrained embedding module, to map the units from different dimensions (temporal, social and semantic) into the same latent space. An extensive set of experiments are conducted on two benchmark datasets as well as a large scale proprietary dataset from a major search engine demonstrates the power of the proposed GRC model in effectively capturing users’ dynamic preferences under different settings by outperforming all baselines by a large margin.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79906162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
#Outage: Detecting Power and Communication Outages from Social Networks
Udit Paul, Alexander Ermakov, Michael Nekrasov, V. Adarsh, E. Belding-Royer
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380251
Natural disasters are increasing worldwide at an alarming rate. To aid relief operations during and after a disaster, humanitarian organizations rely on various types of situational information, such as missing, trapped, or injured people and damaged infrastructure in an area. Crucial and timely identification of infrastructure and utility damage is critical to properly plan and execute search and rescue operations. However, in the wake of natural disasters, real-time identification of this information becomes challenging. In this research, we investigate the use of tweets posted on the Twitter social media platform to detect power and communication outages during natural disasters. We first curate a dataset of 18,097 tweets based on domain-specific keywords obtained using Latent Dirichlet Allocation. We annotate the gathered dataset to separate the tweets into different types of outage-related events: power outage, communication outage, and combined power and communication outage. We analyze the tweets to identify information such as popular words, length of words, and hashtags, as well as sentiments associated with tweets in these outage-related categories. Furthermore, we apply machine learning algorithms to classify these tweets into their respective categories. Our results show that simple classifiers, such as the boosting algorithm, are able to separate outage-related tweets from unrelated tweets with an F1 score close to 100%. Additionally, we observe that the transfer learning model BERT is able to classify different categories of outage-related tweets with close to 90% accuracy in less than 90 seconds of training and testing time, demonstrating that tweets can be mined in real time to assist first responders during natural disasters.
{"title":"#Outage: Detecting Power and Communication Outages from Social Networks","authors":"Udit Paul, Alexander Ermakov, Michael Nekrasov, V. Adarsh, E. Belding-Royer","doi":"10.1145/3366423.3380251","DOIUrl":"https://doi.org/10.1145/3366423.3380251","url":null,"abstract":"Natural disasters are increasing worldwide at an alarming rate. To aid relief operations during and post disaster, humanitarian organizations rely on various types of situational information such as missing, trapped or injured people and damaged infrastructure in an area. Crucial and timely identification of infrastructure and utility damage is critical to properly plan and execute search and rescue operations. However, in the wake of natural disasters, real-time identification of this information becomes challenging. In this research, we investigate the use of tweets posted on the Twitter social media platform to detect power and communication outages during natural disasters. We first curate a data set of 18,097 tweets based on domain-specific keywords obtained using Latent Dirichlet Allocation. We annotate the gathered data set to separate the tweets into different types of outage-related events: power outage, communication outage and both power-communication outage. We analyze the tweets to identify information such as popular words, length of words and hashtags as well as sentiments that are associated with tweets in these outage-related categories. Furthermore, we apply machine learning algorithms to classify these tweets into their respective categories. Our results show that simple classifiers such as the boosting algorithm are able to classify outage related tweets from unrelated tweets with close to 100% f1-score. Additionally, we observe that the transfer learning model, BERT, is able to classify different categories of outage-related tweets with close to 90% accuracy in less than 90 seconds of training and testing time, demonstrating that tweets can be mined in real-time to assist first responders during natural disasters.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83573510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Sentiment Distributions to Distinguish Figurative From Literal Health Reports on Twitter
Rhys Biddle, Aditya Joshi, Shaowu Liu, Cécile Paris, Guandong Xu
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380198
Harnessing data from social media to monitor health events is a promising avenue for public health surveillance. A key step is the detection of reports of a disease (referred to as ‘health mention classification’) amongst tweets that mention disease words. Prior work shows that figurative usage of disease words may prove challenging for health mention classification. Since the experience of a disease is associated with negative sentiment, we present a method that utilises sentiment information to improve health mention classification. Specifically, our classifier for health mention classification combines pre-trained contextual word representations with sentiment distributions of the words in the tweet. For our experiments, we extend a benchmark dataset of tweets for health mention classification, adding over 14k manually annotated tweets across diseases. We additionally annotate each tweet with a label indicating whether the disease words are used in a figurative sense. Our classifier outperforms current state-of-the-art approaches in detecting both health-related and figurative tweets that mention disease words. We also show that disease words in tweets are used figuratively more often than in a health-related context, which proves challenging for classifiers targeting health-related tweets.
{"title":"Leveraging Sentiment Distributions to Distinguish Figurative From Literal Health Reports on Twitter","authors":"Rhys Biddle, Aditya Joshi, Shaowu Liu, Cécile Paris, Guandong Xu","doi":"10.1145/3366423.3380198","DOIUrl":"https://doi.org/10.1145/3366423.3380198","url":null,"abstract":"Harnessing data from social media to monitor health events is a promising avenue for public health surveillance. A key step is the detection of reports of a disease (referred to as ‘health mention classification’) amongst tweets that mention disease words. Prior work shows that figurative usage of disease words may prove to be challenging for health mention classification. Since the experience of a disease is associated with a negative sentiment, we present a method that utilises sentiment information to improve health mention classification. Specifically, our classifier for health mention classification combines pre-trained contextual word representations with sentiment distributions of words in the tweet. For our experiments, we extend a benchmark dataset of tweets for health mention classification, adding over 14k manually annotated tweets across diseases. We also additionally annotate each tweet with a label that indicates if the disease words are used in a figurative sense. Our classifier outperforms current SOTA approaches in detecting both health-related and figurative tweets that mention disease words. We also show that tweets containing disease words are mentioned figuratively more often than in a health-related context, proving to be challenging for classifiers targeting health-related tweets.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81958135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Rating Elicitation for New Users in Collaborative Filtering
Wonbin Kweon, SeongKu Kang, Junyoung Hwang, Hwanjo Yu
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380042
Recent recommender systems have started to use rating elicitation, which asks new users to rate a small seed itemset so that their preferences can be inferred, to improve the quality of initial recommendations. The key challenge of rating elicitation is to choose the seed items from which new users' preferences can best be inferred. This paper proposes a novel end-to-end Deep learning framework for Rating Elicitation (DRE), which chooses all the seed items at once while taking non-linear interactions into account. To this end, it first defines categorical distributions to sample seed items from the entire itemset, then trains both the categorical distributions and a neural reconstruction network to infer users' preferences on the remaining items from the CF information of the sampled seed items. Through end-to-end training, the categorical distributions learn to select the most representative seed items while reflecting the complex non-linear interactions. Experimental results show that DRE outperforms state-of-the-art approaches in recommendation quality by accurately inferring new users' preferences, and that its seed itemset represents the latent space better than the seed itemsets obtained by other methods.
{"title":"Deep Rating Elicitation for New Users in Collaborative Filtering","authors":"Wonbin Kweon, SeongKu Kang, Junyoung Hwang, Hwanjo Yu","doi":"10.1145/3366423.3380042","DOIUrl":"https://doi.org/10.1145/3366423.3380042","url":null,"abstract":"Recent recommender systems started to use rating elicitation, which asks new users to rate a small seed itemset for inferring their preferences, to improve the quality of initial recommendations. The key challenge of the rating elicitation is to choose the seed items which can best infer the new users’ preference. This paper proposes a novel end-to-end Deep learning framework for Rating Elicitation (DRE), that chooses all the seed items at a time with consideration of the non-linear interactions. To this end, it first defines categorical distributions to sample seed items from the entire itemset, then it trains both the categorical distributions and a neural reconstruction network to infer users’ preferences on the remaining items from CF information of the sampled seed items. Through the end-to-end training, the categorical distributions are learned to select the most representative seed items while reflecting the complex non-linear interactions. Experimental results show that DRE outperforms the state-of-the-art approaches in the recommendation quality by accurately inferring the new users’ preferences and its seed itemset better represents the latent space than the seed itemset obtained by the other methods.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75353863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards IP-based Geolocation via Fine-grained and Stable Webcam Landmarks
Zhihao Wang, Qiang Li, Jinke Song, Haining Wang, Limin Sun
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380216
IP-based geolocation is essential for various location-aware Internet applications, such as online advertisement, content delivery, and online fraud prevention. Achieving accurate geolocation relies heavily on the number of high-quality (i.e., fine-grained and stable over time) landmarks. However, previous efforts to garner landmarks have been impeded by the limited number of visible landmarks on the Internet and the cost of manual effort. In this paper, we leverage the availability of numerous online webcams used to monitor physical surroundings as a rich source of promising high-quality landmarks for serving IP-based geolocation. In particular, we present a new framework called GeoCAM, which is designed to automatically generate qualified landmarks from online webcams, providing IP-based geolocation services with high accuracy and wide coverage. GeoCAM periodically monitors websites hosting live webcams and uses natural language processing techniques to extract the IP addresses and latitude/longitude of webcams, generating landmarks at large scale. We develop a prototype of GeoCAM and conduct real-world experiments to validate its efficacy. Our results show that GeoCAM can detect 282,902 live webcams hosted in webpages with 94.2% precision and 90.4% recall, and then generate 16,863 stable and fine-grained landmarks, two orders of magnitude more than the landmarks used in prior work. Thus, by correlating a large set of landmarks, GeoCAM is able to provide a geolocation service with high accuracy and wide coverage.
{"title":"Towards IP-based Geolocation via Fine-grained and Stable Webcam Landmarks","authors":"Zhihao Wang, Qiang Li, Jinke Song, Haining Wang, Limin Sun","doi":"10.1145/3366423.3380216","DOIUrl":"https://doi.org/10.1145/3366423.3380216","url":null,"abstract":"IP-based geolocation is essential for various location-aware Internet applications, such as online advertisement, content delivery, and online fraud prevention. Achieving accurate geolocation enormously relies on the number of high-quality (i.e., the fine-grained and stable over time) landmarks. However, the previous efforts of garnering landmarks have been impeded by the limited visible landmarks on the Internet and manual time cost. In this paper, we leverage the availability of numerous online webcams that are used to monitor physical surroundings as a rich source of promising high-quality landmarks for serving IP-based geolocation. In particular, we present a new framework called GeoCAM, which is designed to automatically generate qualified landmarks from online webcams, providing IP-based geolocation services with high accuracy and wide coverage. GeoCAM periodically monitors websites that are hosting live webcams and uses the natural language processing technique to extract the IP addresses and latitude/longitude of webcams for generating landmarks at large-scale. We develop a prototype of GeoCAM and conduct real-world experiments for validating its efficacy. Our results show that GeoCam can detect 282,902 live webcams hosted in webpages with 94.2% precision and 90.4% recall, and then generate 16,863 stable and fine-grained landmarks, which are two orders of magnitude more than the landmarks used in prior works. Thus, by correlating a large scale of landmarks, GeoCAM is able to provide a geolocation service with high accuracy and wide coverage.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89543717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Referential Intention with Heterogeneous Contexts
W. Yu, Mengxia Yu, Tong Zhao, Meng Jiang
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380175
Citing, quoting, and forwarding & commenting behaviors are widely seen in academia, news media, and social media. Existing behavior modeling approaches have focused on mining content and describing the preferences of authors, speakers, and users. However, behavioral intention plays an important role in generating content on these platforms. In this work, we propose to identify the referential intention that motivates the action of using referred (e.g., cited, quoted, and retweeted) sources and content to support one's claims. We adopt a theory from sociology to develop a schema of four types of intentions. The challenge lies in the heterogeneity of the observed contextual information surrounding the referential behavior, such as the referred content (e.g., a cited paper), local context (e.g., the sentence citing the paper), neighboring context (e.g., the preceding and following sentences), and network context (e.g., the academic network of authors, affiliations, and keywords). We propose a new neural framework with Interactive Hierarchical Attention (IHA) to identify the intention of referential behavior by properly aggregating these heterogeneous contexts. Experiments demonstrate that the proposed method can effectively identify the type of intention of citing behaviors (on academic data) and retweeting behaviors (on Twitter), and that learning the heterogeneous contexts collectively improves performance. This work opens a door to understanding content generation from a fundamental perspective of behavioral science.
{"title":"Identifying Referential Intention with Heterogeneous Contexts","authors":"W. Yu, Mengxia Yu, Tong Zhao, Meng Jiang","doi":"10.1145/3366423.3380175","DOIUrl":"https://doi.org/10.1145/3366423.3380175","url":null,"abstract":"Citing, quoting, and forwarding & commenting behaviors are widely seen in academia, news media, and social media. Existing behavior modeling approaches focused on mining content and describing preferences of authors, speakers, and users. However, behavioral intention plays an important role in generating content on the platforms. In this work, we propose to identify the referential intention which motivates the action of using the referred (e.g., cited, quoted, and retweeted) source and content to support their claims. We adopt a theory in sociology to develop a schema of four types of intentions. The challenge lies in the heterogeneity of observed contextual information surrounding the referential behavior, such as referred content (e.g., a cited paper), local context (e.g., the sentence citing the paper), neighboring context (e.g., the former and latter sentences), and network context (e.g., the academic network of authors, affiliations, and keywords). We propose a new neural framework with Interactive Hierarchical Attention (IHA) to identify the intention of referential behavior by properly aggregating the heterogeneous contexts. Experiments demonstrate that the proposed method can effectively identify the type of intention of citing behaviors (on academic data) and retweeting behaviors (on Twitter). And learning the heterogeneous contexts collectively can improve the performance. This work opens a door for understanding content generation from a fundamental perspective of behavior sciences.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"144 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77583268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Visual-aware Minimax Ranking Based on Co-purchase Data for Personalized Recommendation
Xiaoya Chong, Qing Li, Howard Leung, Qianhui Men, Xianjin Chao
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380007
Personalized recommendation aims at ranking a set of items according to the learnt preferences of the user. Existing methods optimize the ranking function by treating an item that the user has not bought yet as a negative item and assuming that the user prefers the positive items they have bought to the negative items. The strategy is to exclude irrelevant items from the dataset to narrow down the set of potential positive items and thereby improve ranking accuracy. This conflicts with the goal of recommendation from the seller's point of view, which is to enlarge that set for each user. In this paper, we address this limitation by proposing a novel learning method called Hierarchical Visual-aware Minimax Ranking (H-VMMR), which introduces a new concept of predictive sampling to sample items that are in a close relationship with the positive items (e.g., substitutes and complements). We set up the problem by maximizing the preference discrepancy between positive and negative items, while minimizing the gap between positive and predictive items based on visual features. We also build a hierarchical learning model based on co-purchase data to solve the data sparsity problem. Our method is able to enlarge the set of potential positive items as well as true negative items during ranking. Experimental results show that H-VMMR outperforms state-of-the-art learning methods.
{"title":"Hierarchical Visual-aware Minimax Ranking Based on Co-purchase Data for Personalized Recommendation","authors":"Xiaoya Chong, Qing Li, Howard Leung, Qianhui Men, Xianjin Chao","doi":"10.1145/3366423.3380007","DOIUrl":"https://doi.org/10.1145/3366423.3380007","url":null,"abstract":"Personalized recommendation aims at ranking a set of items according to the learnt preferences of the user. Existing methods optimize the ranking function by considering an item that the user has not bought yet as a negative item and assuming that the user prefers the positive item that he has bought to the negative item. The strategy is to exclude irrelevant items from the dataset to narrow down the set of potential positive items to improve ranking accuracy. It conflicts with the goal of recommendation from the seller’s point of view, which aims to enlarge that set for each user. In this paper, we diminish this limitation by proposing a novel learning method called Hierarchical Visual-aware Minimax Ranking (H-VMMR), in which a new concept of predictive sampling is proposed to sample items in a close relationship with the positive items (e.g., substitutes, compliments). We set up the problem by maximizing the preference discrepancy between positive and negative items, as well as minimizing the gap between positive and predictive items based on visual features. We also build a hierarchical learning model based on co-purchase data to solve the data sparsity problem. Our method is able to enlarge the set of potential positive items as well as true negative items during ranking. The experimental results show that our H-VMMR outperforms the state-of-the-art learning methods.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77685321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}