
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management: Latest Papers

Tensor Rank Estimation and Completion via CP-based Nuclear Norm
Qiquan Shi, Haiping Lu, Yiu-ming Cheung
Tensor completion (TC) is a challenging problem of recovering missing entries of a tensor from its partial observation. One main TC approach is based on CP/Tucker decomposition. However, this approach often requires the determination of a tensor rank a priori. This rank estimation problem is difficult in practice. Several Bayesian solutions have been proposed but they often under/over-estimate the tensor rank while being quite slow. To address this problem of rank estimation with missing entries, we view the weight vector of the orthogonal CP decomposition of a tensor to be analogous to the vector of singular values of a matrix. Subsequently, we define a new CP-based tensor nuclear norm as the $L_1$-norm of this weight vector. We then propose Tensor Rank Estimation based on $L_1$-regularized orthogonal CP decomposition (TREL1) for both CP-rank and Tucker-rank. Specifically, we incorporate a regularization with CP-based tensor nuclear norm when minimizing the reconstruction error in TC to automatically determine the rank of an incomplete tensor. Experimental results on both synthetic and real data show that: 1) Given sufficient observed entries, TREL1 can estimate the true rank (both CP-rank and Tucker-rank) of incomplete tensors well; 2) The rank estimated by TREL1 can consistently improve recovery accuracy of decomposition-based TC methods; 3) TREL1 is not sensitive to its parameters in general and more efficient than existing rank estimation methods.
DOI: https://doi.org/10.1145/3132847.3132945 (published 2017-11-06)
Citations: 20
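The TREL1 abstract above defines a CP-based nuclear norm as the $L_1$-norm of the CP weight vector and uses it as a regularizer so that the rank emerges automatically. The following is a minimal sketch of that idea only, not the authors' algorithm: it assumes a weight vector is already available, applies soft-thresholding (the standard proximal step for an $L_1$ penalty), and counts the surviving components. The function names, the threshold `alpha`, and the example weights are all hypothetical.

```python
import numpy as np

def cp_nuclear_norm(weights):
    """L1-norm of a CP weight vector (the CP-based nuclear norm of the abstract)."""
    return float(np.sum(np.abs(weights)))

def soft_threshold(weights, alpha):
    """Proximal operator of alpha * ||w||_1: shrink every weight toward zero."""
    return np.sign(weights) * np.maximum(np.abs(weights) - alpha, 0.0)

def estimated_rank(weights, alpha):
    """Number of components whose weight survives the shrinkage step."""
    return int(np.count_nonzero(soft_threshold(weights, alpha)))

# Hypothetical weights: three strong components and three near-zero ones.
weights = np.array([5.0, 3.0, 2.0, 0.05, 0.02, 0.01])
rank = estimated_rank(weights, alpha=0.1)
```

In this toy setting the three near-zero weights are shrunk to exactly zero, so the estimated rank equals the number of strong components. A full method would alternate such a step with updates of the factor matrices on the observed entries.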
Coupled Sparse Matrix Factorization for Response Time Prediction in Logistics Services
Yuqi Wang, Jiannong Cao, Lifang He, Wengen Li, Lichao Sun, Philip S. Yu
Nowadays, there is an emerging way of connecting logistics orders and van drivers, where it is crucial to predict the order response time. Accurate prediction of order response time would not only facilitate decision making on order dispatching, but also pave the way for applications such as supply-demand analysis and driver scheduling, leading to high system efficiency. In this work, we forecast order response time for the current day by fusing data from order history and drivers' historical locations. Specifically, we propose Coupled Sparse Matrix Factorization (CSMF) to deal with the heterogeneous fusion and data sparsity challenges raised in this problem. CSMF jointly learns from multiple heterogeneous sparse data sources through the proposed weight setting mechanism. Experiments on real-world datasets demonstrate the effectiveness of our approach compared to various baseline methods. The performance of several variants of the proposed method is also presented to show the effectiveness of each component.
DOI: https://doi.org/10.1145/3132847.3132948 (published 2017-11-06)
Citations: 6
Detecting Social Bots by Jointly Modeling Deep Behavior and Content Information
C. Cai, Linjing Li, D. Zeng
Bots are regarded as the most common kind of malware in the era of Web 2.0. In recent years, the Internet has been populated by hundreds of millions of bots, especially on social media. Thus, the demand for effective and efficient bot detection algorithms is more urgent than ever. Existing works have partly satisfied this requirement by way of laborious feature engineering. In this paper, we propose a deep bot detection model that aims to learn an effective representation of social users and then detects social bots by jointly modeling social behavior and content information. The proposed model learns the representation of social behavior by encoding both endogenous and exogenous factors that affect user behavior. As for the representation of content, we treat user content as temporal text data, rather than as plain text as in other existing works, in order to extract semantic information and latent temporal patterns. To the best of our knowledge, this is the first attempt to apply deep learning to modeling social users and detecting social bots. Experiments on a real-world dataset collected from Twitter demonstrate the effectiveness of the proposed model.
DOI: https://doi.org/10.1145/3132847.3133050 (published 2017-11-06)
Citations: 45
Capturing Feature-Level Irregularity in Disease Progression Modeling
Kaiping Zheng, Wei Wang, Jinyang Gao, K. Ngiam, B. Ooi, J. Yip
Disease progression modeling (DPM) analyzes patients' electronic medical records (EMR) to predict the health state of patients, which facilitates accurate prognosis, early detection and treatment of chronic diseases. However, EMR are irregular because patients visit hospitals irregularly, depending on their need for treatment. At each visit, they are typically given different diagnoses and prescribed various medications and lab tests. Consequently, EMR exhibit irregularity at the feature level. To handle this issue, we propose a model based on the Gated Recurrent Unit that decays the effect of previous records using fine-grained feature-level time span information, and learns the decaying parameters for different features to account for their different behaviours, such as decaying speeds, under irregularity. Extensive experimental results on both an Alzheimer's disease dataset and a chronic kidney disease dataset demonstrate that our proposed model of capturing feature-level irregularity can effectively improve the accuracy of DPM.
DOI: https://doi.org/10.1145/3132847.3132944 (published 2017-11-06)
Citations: 31
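The abstract above describes decaying the effect of previous records with a learned, per-feature rate driven by the time elapsed since each feature was last observed. The sketch below illustrates one common way to express such a decay, an exponential of a rectified linear function of the time gap. This functional form is an assumption borrowed from time-aware recurrent models in general; the abstract does not state the paper's exact formula, and the names `feature_decay`, `decayed_value`, `w`, and `b` are hypothetical.

```python
import numpy as np

def feature_decay(delta_t, w, b):
    """Per-feature decay factor in (0, 1]: exp(-max(0, w * delta_t + b)).

    delta_t[k] is the time since feature k was last observed; w[k] and b[k]
    stand in for learned parameters, so each feature can decay at its own speed.
    """
    return np.exp(-np.maximum(0.0, w * delta_t + b))

def decayed_value(prev_value, delta_t, w, b):
    """Scale each feature's previous value by its own decay factor."""
    return feature_decay(delta_t, w, b) * prev_value

# Hypothetical example: feature 0 was just observed, feature 1 ten time units ago.
delta_t = np.array([0.0, 10.0])
w = np.array([0.1, 0.1])
b = np.array([0.0, 0.0])
gamma = feature_decay(delta_t, w, b)
```

A feature observed at the current visit keeps its full weight (factor 1), while a feature last seen long ago contributes less. In a recurrent model these factors would multiply the inputs or hidden state before the usual gate computations.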
Automatic Navbox Generation by Interpretable Clustering over Linked Entities
Chenhao Xie, Lihan Chen, Jiaqing Liang, Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang
Few efforts have been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table in a Wikipedia article page that provides a consistent navigation system for related entities. Navboxes are critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navboxes for Wikipedia articles. Instead of performing information extraction over unstructured natural language text directly, we explore an alternative avenue by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: if we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize the idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new high-quality Navboxes.
DOI: https://doi.org/10.1145/3132847.3132899 (published 2017-11-06)
Citations: 1
Deception Detection: When Computers Become Better than Humans
Rada Mihalcea
Whether we like it or not, deception happens every day and everywhere: thousands of trials taking place daily around the world; little white lies ("I'm busy that day!" even if your calendar is blank); news "with a twist" (a.k.a. fake news) meant to attract readers' attention and get some advertisement clicks on the side; portrayed identities, on dating sites and elsewhere. Can a computer automatically detect deception in written accounts or in video recordings? In this talk, I will describe our work in building linguistic and multimodal algorithms for deception detection, targeting deceptive statements, trial videos, fake news, and identity deceptions, and also going after deception in multiple cultures. I will also show how these algorithms can provide insights into what makes a good lie, and thus teach us how we can spot a liar. As it turns out, computers can be trained to identify lies in many different contexts, and they can do it much better than humans do!
DOI: https://doi.org/10.1145/3132847.3137174 (published 2017-11-06)
Citations: 0
FM-Hawkes: A Hawkes Process Based Approach for Modeling Online Activity Correlations
Sha Li, Xiaofeng Gao, Weiming Bao, Guihai Chen
Understanding and predicting user behavior on online platforms has proved to be of significant value, with applications ranging from targeted advertising, political campaigning, and anomaly detection to user self-monitoring. With the growing functionality and flexibility of online platforms, users can now accomplish a variety of tasks online. This advancement has rendered obsolete many previous works that focus on modeling a single type of activity. In this work, we target this new problem by modeling the interplay between the time series of different types of activities and apply our model to predict future user behavior. Our model, FM-Hawkes, stands for Fourier-based kernel multi-dimensional Hawkes process. Specifically, we model the multiple activity time series as a multi-dimensional Hawkes process. The correlations between different types of activities are then captured by the influence factor. As for the temporal triggering kernel, we observe that the intensity function consists of numerous kernel functions with time shift. Thus, we employ a Fourier transformation based non-parametric estimation. Our model is not bound to any particular platform and explicitly interprets the causal relationship between actions. By applying our model to real-life datasets, we confirm that the mutual excitation effect between different activities prevails among users. Prediction results show our superiority over models that do not consider action types and flexible kernels.
DOI: https://doi.org/10.1145/3132847.3132883 (published 2017-11-06)
Citations: 18
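The FM-Hawkes abstract above models several activity types as a multi-dimensional Hawkes process in which an influence factor captures how one activity type excites another. The sketch below evaluates the intensity of such a process at a given time. It is illustrative only: it assumes a fixed exponential triggering kernel, whereas the paper estimates the kernel non-parametrically via a Fourier transformation, and the names `hawkes_intensity`, `mu`, `A`, and `decay` are hypothetical.

```python
import numpy as np

def hawkes_intensity(t, events, mu, A, decay=1.0):
    """Intensity of every activity type at time t.

    events[j] holds the past event times of type j, mu[i] is the base rate of
    type i, and A[i, j] is the influence of type j on type i. The exponential
    kernel decay * exp(-decay * dt) is an assumption made for illustration.
    """
    lam = np.array(mu, dtype=float)
    for j, times in enumerate(events):
        past = np.asarray([s for s in times if s < t], dtype=float)
        if past.size:
            kernel_sum = np.sum(decay * np.exp(-decay * (t - past)))
            lam += A[:, j] * kernel_sum
    return lam

# Hypothetical two-type example: one type-0 event at t = 1, no type-1 events.
mu = [0.1, 0.2]
A = np.array([[0.5, 0.0],
              [0.3, 0.4]])
events = [[1.0], []]
lam = hawkes_intensity(2.0, events, mu, A)
```

Here the single type-0 event raises the intensity of both types one time unit later, by `A[0, 0]` and `A[1, 0]` times the kernel value respectively, which is the cross-excitation effect the abstract refers to.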
Query and Animate Multi-attribute Trajectory Data
Jianqiu Xu, R. H. Güting
The widespread use of GPS-enabled devices has led to huge amounts of trajectory data. In addition to location and time, trajectories are associated with descriptive attributes representing different aspects of real entities, called multi-attribute trajectories. This comes from the combination of several data sources and enables a range of new applications in which users can find interesting trajectories and discover potential relationships that cannot be determined solely based on GPS data. In this demo, we provide the motivation scenario and introduce a system that is developed to integrate standard trajectories (a sequence of timestamped locations) and attributes into one unified framework. The system is able to answer a range of interesting queries on multi-attribute trajectories that are not handled by standard trajectories. The system supports both standard trajectories and multi-attribute trajectories. We demonstrate how to form queries and animate multi-attribute trajectories in the system. To our knowledge, existing moving objects prototype systems do not support multi-attribute trajectories.
DOI: https://doi.org/10.1145/3132847.3133178 (published 2017-11-06)
Citations: 2
Source Retrieval for Web-Scale Text Reuse Detection
Matthias Hagen, Martin Potthast, Payam Adineh, Ehsan Fatehifar, Benno Stein
The first step of text reuse detection addresses the source retrieval problem: given a suspicious document, a set of candidate sources from which text might have been reused has to be retrieved by querying a search engine. Afterwards, in a second step, the retrieved candidates run through a text alignment with the suspicious document in order to identify reused passages. Obviously, any true source of text reuse that is not retrieved during the source retrieval step reduces the overall recall of a reuse detector. Hence, source retrieval is a recall-oriented task, a fact ignored even by experts: only 3 of the 20 teams participating in the respective task at PAN 2012-2016 managed to find more than half of the sources, the best one achieving a recall of only 0.59. We propose a new approach that reaches a recall of 0.89, a performance gain of 51%.
文本重用检测的第一步解决源检索问题:给定一个可疑文档,必须通过查询搜索引擎检索可能重用文本的一组候选源。然后,在第二步中,检索到的候选文档与可疑文档进行文本对齐,以便识别重用的段落。显然,在源检索步骤中没有检索到的任何真正的文本重用源都会降低重用检测器的总召回率。因此,源检索是一个面向回忆的任务,这一事实甚至被专家们忽视了:在PAN 2012-2016上,参加相应任务的20个团队中,只有3个团队设法找到了一半以上的源,最好的团队实现了约0.59的召回率。我们提出了一种新的方法,召回率达到了0.89,性能提高了51%。
{"title":"Source Retrieval for Web-Scale Text Reuse Detection","authors":"Matthias Hagen, Martin Potthast, Payam Adineh, Ehsan Fatehifar, Benno Stein","doi":"10.1145/3132847.3133097","DOIUrl":"https://doi.org/10.1145/3132847.3133097","url":null,"abstract":"The first step of text reuse detection addresses the source retrieval problem: given a suspicious document, a set of candidate sources from which text might have been reused have to be retrieved by querying a search engine. Afterwards, in a second step, the retrieved candidates run through a text alignment with the suspicious document in order to identify reused passages. Obviously, any true source of text reuse that is not retrieved during the source retrieval step reduces the overall recall of a reuse detector. Hence, source retrieval is a recall-oriented task, a fact ignored even by experts: Only 3 of 20 teams participating in a respective task at PAN 2012-2016 managed to find more than half of the sources, the best one achieving a recall of only~0.59. We propose a new approach that reaches a recall of~0.89---a performance gain of~51%.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77651220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Additional Workshops Co-located with CIKM 2017
M. Winslett
Summary of three workshops co-located with CIKM 2017.
DOI: 10.1145/3132847.3152359 (https://doi.org/10.1145/3132847.3152359). Published 2017-11-06 in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.
Citations: 0