Xin Liu, T. Murata, Kyoung-Sook Kim, Chatchawan Kotarasu, Chenyi Zhuang
We propose a general view that demonstrates the relationship between network embedding approaches and matrix factorization. Unlike previous works, which establish the equivalence from a skip-gram model perspective, we provide a more fundamental connection from an optimization (objective function) perspective. We demonstrate that matrix factorization is equivalent to optimizing two objectives: one brings the embeddings of similar nodes closer together; the other pushes the embeddings of distant nodes apart. The matrix to be factorized has the general form S − β, where the elements of S indicate pairwise node similarities. These similarities can be based on any user-defined similarity/distance measure or learned from random walks on networks. The shift number β is related to a parameter that balances the two objectives. More importantly, the resulting embeddings are sensitive to β, and we can improve them by tuning β. Experiments show that matrix factorization based on a newly proposed similarity measure and a β-tuning strategy significantly outperforms existing matrix factorization approaches on a range of benchmark networks.
"A General View for Network Embedding as Matrix Factorization." DOI: 10.1145/3289600.3291029. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
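The factor-the-shifted-matrix recipe described above can be sketched generically: build a pairwise similarity matrix S, subtract a candidate shift β, factorize with a truncated SVD, and keep the β that scores best. This is a minimal illustration, not the paper's actual similarity measure or tuning criterion; the toy matrix, the rank, the β grid, and the reconstruction-error selection rule are all assumptions.

```python
import numpy as np

def embed_shifted(S, beta, dim):
    """Rank-`dim` node embeddings from a truncated SVD of the shifted matrix S - beta."""
    U, sigma, _ = np.linalg.svd(S - beta)
    # Keep the top-`dim` components, scaled by the square root of the singular values.
    return U[:, :dim] * np.sqrt(sigma[:dim])

# Toy symmetric similarity matrix over 4 nodes (illustrative values only).
S = np.array([[1.0, 0.8, 0.1, 0.0],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.9],
              [0.0, 0.1, 0.9, 1.0]])

def pick_beta(S, betas, dim=2):
    """Scan candidate shifts and keep the one whose embeddings best reconstruct
    the shifted matrix (a stand-in for a real downstream evaluation metric)."""
    scores = {}
    for b in betas:
        E = embed_shifted(S, b, dim)
        scores[b] = np.linalg.norm((S - b) - E @ E.T)
    return min(scores, key=scores.get)

beta_star = pick_beta(S, [0.0, 0.1, 0.3])
```

In practice the selection criterion would be a downstream task score (e.g. node classification accuracy) rather than reconstruction error.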
Ahsan Morshed, Pei-wei Tsai, P. Jayaraman, T. Sellis, Dimitrios Georgakopoulos, Samuel V. S. Burke, Shane Joachim, Ming-Sheng Quah, Stefan Tsvetkov, Jason Liew, C. Jenkins
Open multidimensional data from existing sources and social media often carries insightful information on social issues. With the increase in high-volume data and the proliferation of visual analytics platforms, users can more easily interact with a large dataset and pick out meaningful information. In this paper, we present VisCrime, a system that uses visual analytics to map out crimes that have occurred in a region or neighbourhood. VisCrime is underpinned by a novel trajectory algorithm that creates trajectories from open data sources that report incidents of crime and from data gathered from social media. Our system can be accessed at http://viscrime.ml/deckmap
"VisCrime: A Crime Visualisation System for Crime Trajectory from Multi-Dimensional Sources." DOI: 10.1145/3289600.3290617. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
Yu Gu, Tianshuo Zhou, Gong Cheng, Ziyang Li, Jeff Z. Pan, Yuzhong Qu
Relevance search over a knowledge graph (KG) has gained much research attention. Given a query entity in a KG, the problem is to find its most relevant entities. However, the relevance function is hidden and dynamic. Different users for different queries may consider relevance from different angles of semantics. The ambiguity in a query is more noticeable in the presence of thousands of types of entities and relations in a schema-rich KG, which has challenged the effectiveness and scalability of existing methods. To meet the challenge, our approach called RelSUE requests a user to provide a small number of answer entities as examples, and then automatically learns the most likely relevance function from these examples. Specifically, we assume the intent of a query can be characterized by a set of meta-paths at the schema level. RelSUE searches a KG for diversified significant meta-paths that best characterize the relevance of the user-provided examples to the query entity. It reduces the large search space of a schema-rich KG using distance and degree-based heuristics, and performs reasoning to deduplicate meta-paths that represent equivalent query-specific semantics. Finally, a linear model is learned to predict meta-path based relevance. Extensive experiments demonstrate that RelSUE outperforms several state-of-the-art methods.
"Relevance Search over Schema-Rich Knowledge Graphs." DOI: 10.1145/3289600.3290970. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
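The final step of the pipeline above, a linear model predicting meta-path-based relevance, can be sketched in isolation. The meta-path names, the feature counts, and the example labels below are invented for illustration; RelSUE's actual feature construction and learning procedure may differ.

```python
import numpy as np

# Hypothetical meta-path features for candidate entities: each column counts
# paths of one schema-level meta-path type from the query entity.
meta_paths = ["actedIn->actedIn^-1", "directed^-1->directed"]  # assumed names
X = np.array([[3.0, 1.0],    # candidate A
              [0.0, 2.0],    # candidate B
              [1.0, 0.0]])   # candidate C
y = np.array([1.0, 1.0, 0.0])  # user-provided examples: A and B are relevant

# Least-squares fit of a linear relevance function over meta-path features.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
scores = X @ w                 # predicted relevance for each candidate
ranking = np.argsort(-scores)  # most relevant candidates first
```

With these made-up numbers, the candidates resembling the positive examples score above the unlabeled one, which is the behavior the learned relevance function is meant to capture.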
Demographics of online users, such as age and gender, play an important role in personalized web applications. However, it is difficult to obtain users' demographic information directly. Fortunately, search queries cover many online users, and queries from users with different demographics usually differ in content and writing style. Thus, search queries can provide useful clues for demographic prediction. In this paper, we study predicting users' demographics from their search queries and propose a neural approach for this task. Since search queries can be very noisy and many of them are not useful, instead of combining all queries into a single user representation, we propose a hierarchical user representation with attention (HURA) model to learn informative user representations from search queries. HURA first learns representations of search queries from words using a word encoder, which consists of a CNN and a word-level attention network that selects important words. We then learn user representations from the query representations using a query encoder, which contains a CNN to capture the local contexts of search queries and a query-level attention network that selects the queries most informative for demographic prediction. Experiments on two real-world datasets validate that our approach effectively improves the performance of search-query-based age and gender prediction and consistently outperforms many baseline methods.
"Neural Demographic Prediction using Search Query." Chuhan Wu, Fangzhao Wu, Junxin Liu, Shaojian He, Yongfeng Huang, Xing Xie. DOI: 10.1145/3289600.3291034. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
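The attention pooling used at both levels of HURA follows a standard pattern: score each element against a context vector, normalize the scores with a softmax, and return the weighted sum. The sketch below shows only that generic pattern; the random vectors stand in for the CNN outputs the abstract describes, and the context vector here is a hypothetical learned parameter.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(element_vecs, context):
    """Attention pooling: score each element vector against a context vector,
    softmax the scores, and return the attention-weighted sum."""
    alpha = softmax(element_vecs @ context)
    return alpha @ element_vecs

# Hypothetical 4-word query with 8-dim word vectors (random stand-ins for
# the word-level CNN outputs).
rng = np.random.default_rng(1)
words = rng.random((4, 8))
context_vec = rng.random(8)            # assumed learned attention parameter
query_vec = attend(words, context_vec) # word level -> query representation
```

The same `attend` call would then pool query representations into a user representation at the query level.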
Session-based recommendations have recently received much attention because user data is often unavailable, e.g., when users are not logged in or tracked. Most session-based methods focus on exploiting abundant historical records of anonymous users but ignore the sparsity problem, where historical data are lacking or insufficient for the items in sessions. In fact, since users' behavior is related across domains, information from different domains is correlated; e.g., a user tends to watch related movies in a movie domain after listening to some movie-themed songs in a music domain (i.e., cross-domain sessions). Therefore, we can learn a more complete item description, and thereby address the sparsity problem, using complementary information from related domains. In this paper, we propose an innovative method, called Cross-Domain Item Embedding based on Co-clustering (CDIE-C), to learn comprehensive cross-domain representations of items by collectively leveraging single-domain and cross-domain sessions within a unified framework. We first extract cluster-level correlations across domains using co-clustering and filter out noise. Then, cross-domain items and clusters are embedded into a unified space by jointly capturing item-level sequence information and cluster-level correlation information. In addition, CDIE-C enhances information exchange across domains by utilizing three types of relations: item-to-context-item, item-to-context-co-cluster, and co-cluster-to-context-item. Finally, we train CDIE-C with two efficient training strategies: joint training and two-stage training. Empirical results show that CDIE-C outperforms state-of-the-art recommendation methods on three cross-domain datasets and effectively alleviates the sparsity problem.
"Solving the Sparsity Problem in Recommendations via Cross-Domain Item Embedding Based on Co-Clustering." Yaqing Wang, Chunyan Feng, Caili Guo, Yunfei Chu, Jenq-Neng Hwang. DOI: 10.1145/3289600.3290973. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
Alois Gruson, Praveen Chandar, C. Charbuillet, James McInerney, Samantha Hansen, Damien Tardieu, Ben Carterette
Evaluating algorithmic recommendations is an important but difficult problem. Evaluations conducted offline, using data collected from user interactions with an online system, often suffer from biases arising from the user interface or the recommendation engine. Online evaluation (A/B testing) can more easily address problems of bias, but depending on the setting it can be time-consuming and risks negatively impacting the user experience, not to mention that it is generally more difficult when access to a large user base cannot be taken for granted. A compromise based on counterfactual analysis is to present some subset of online users with recommendation results that have been randomized or otherwise manipulated, log their interactions, and then use those logs to de-bias offline evaluations on historical data. However, previous work does not offer clear conclusions on how well such methods correlate with, and are able to predict, the results of online A/B tests. Understanding this is crucial to widespread adoption of new offline evaluation techniques in recommender systems. In this work we present a comparison of offline and online evaluation results for a particular recommendation problem: recommending playlists of tracks to a user looking for music. We describe two different ways to think about de-biasing offline collections for more accurate evaluation. Our results show that, contrary to much of the previous work on this topic, properly conducted offline experiments do correlate well with A/B test results, and moreover that we can expect an offline evaluation to identify the best candidate systems for online testing with high probability.
"Offline Evaluation to Make Decisions About Playlist Recommendation Algorithms." DOI: 10.1145/3289600.3291027. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
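A common way to use such randomized logs for counterfactual evaluation is inverse-propensity scoring (IPS): reweight each logged interaction by how much more (or less) likely the candidate system would have been to show that item than the logging system was. The sketch below shows only the generic estimator, not necessarily the de-biasing schemes compared in the paper, and every number in it is made up.

```python
import numpy as np

def ips_estimate(logged_probs, target_probs, rewards):
    """Inverse-propensity-scored estimate of a target policy's average reward
    from logs collected under a (partially randomized) logging policy.

    logged_probs: probability the logging policy assigned to each shown item
    target_probs: probability the target policy would assign to the same item
    rewards:      observed feedback (e.g. 1 = stream, 0 = skip)
    """
    weights = target_probs / logged_probs  # importance weights
    return float(np.mean(weights * rewards))

# Hypothetical log of 5 impressions (all numbers illustrative).
logged = np.array([0.5, 0.25, 0.25, 0.5, 0.2])
target = np.array([0.7, 0.10, 0.30, 0.6, 0.2])
clicks = np.array([1, 0, 1, 1, 0])

est = ips_estimate(logged, target, clicks)  # de-biased reward estimate
```

The estimator is unbiased when the logging propensities are known and nonzero wherever the target policy has mass, which is exactly what the randomized subset of traffic provides.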
In this paper, we demonstrate an intelligent traffic analytics system called T4, which enables intelligent analytics over real-time and historical trajectories from vehicles. At the front end, we visualize the current traffic flow and the resulting trajectories for different types of queries, as well as histograms of traffic flow and traffic lights. At the back end, T4 supports multiple types of common queries over trajectories, with compact storage, efficient indexing, and fast pruning algorithms. The output of these queries can be used for further monitoring and analytics. Moreover, we train deep models for traffic flow prediction and traffic light control to reduce traffic congestion. A preliminary version of T4 is available at https://sites.google.com/site/shengwangcs/torch.
"Intelligent Traffic Analytics: From Monitoring to Controlling." Sheng Wang, Yunzhuang Shen, Z. Bao, X. Qin. DOI: 10.1145/3289600.3290615. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
"Session details: Session 6: Networks and Social Behavior." Huan Liu. DOI: 10.1145/3310346. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
Lu Cheng, Jundong Li, Yasin N. Silva, Deborah L. Hall, Huan Liu
Over the last decade, research has revealed the high prevalence of cyberbullying among youth and raised serious concerns in society. Information on the social media platforms where cyberbullying is most prevalent (e.g., Instagram, Facebook, Twitter) is inherently multi-modal, yet most existing work on cyberbullying identification has focused solely on building generic classification models that rely exclusively on text analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data (e.g., image, video, user profile, time, and location) and thus fail to offer a comprehensive understanding of cyberbullying. Conventionally, when information from different modalities is presented together, it often reveals complementary insights about the application domain and facilitates better learning performance. In this paper, we study the novel problem of cyberbullying detection within a multi-modal context by exploiting social media data in a collaborative way. This task, however, is challenging due to the complex combination of cross-modal correlations among various modalities, structural dependencies between different social media sessions, and the diverse attribute information of the different modalities. To address these challenges, we propose XBully, a novel cyberbullying detection framework that first reformulates multi-modal social media data as a heterogeneous network and then learns node embedding representations upon it. Extensive experimental evaluations on real-world multi-modal social media datasets show that XBully is superior to state-of-the-art cyberbullying detection models.
"XBully: Cyberbullying Detection within a Multi-Modal Context." DOI: 10.1145/3289600.3291037. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
Martin Josifoski, I. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West
There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.
"Crosslingual Document Embedding as Reduced-Rank Ridge Regression." DOI: 10.1145/3289600.3291023. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), 2019-01-30.
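The core factoring idea, constraining a ridge-regression weight matrix to low rank and reading the embedding map off its factors, can be sketched for a single language. The dimensions, the regularizer, and the one-hot concept targets below are illustrative assumptions; Cr5's actual joint multilingual training is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 20 documents, 30-word vocabulary, 5 concept labels.
X = rng.random((20, 30))                      # bag-of-words features
Y = np.eye(5)[rng.integers(0, 5, size=20)]    # one-hot concept targets

# Closed-form ridge solution: W = (X'X + lam*I)^-1 X'Y.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(30), X.T @ Y)

# Constrain W to rank r via truncated SVD: W ~= A @ B, where A maps
# language-specific bags-of-words into the shared low-dimensional space.
r = 3
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]          # vocabulary -> shared embedding space
B = Vt[:r]                    # shared space -> concept scores

doc_embeddings = X @ A        # language-independent document vectors
```

Because the heavy lifting reduces to an SVD of the weight matrix, this factoring step scales well, which matches the scalability claim in the abstract.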