
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Articles

Customer Simulation for Direct Marketing Experiments
Yegor Tkachenko, Mykel J. Kochenderfer, Krzysztof Kluza
Optimization of control policies for corporate customer relationship management (CRM) systems can boost customer satisfaction, reduce attrition, and increase expected lifetime value of the customer base. However, evaluation of these policies is often complicated. Policies can be evaluated with real-life marketing interactions, but such evaluation can be prohibitively expensive and time consuming. Customer simulators learned from data are an inexpensive alternative suitable for rapid campaign tests. We summarize the literature on the evaluation of direct marketing policies through simulation and propose a decomposition of the problem into distinct tasks: (a) generation of the initial client database snapshot and (b) propagation of clients through time in response to company actions. We present open-source simulators trained and validated on two direct marketing data sets of varying size and complexity.
DOI: 10.1109/DSAA.2016.59 (https://doi.org/10.1109/DSAA.2016.59), published 2016-10-01
Citations: 9
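The decomposition in this abstract has two tasks: (a) generating an initial client snapshot and (b) propagating clients through time in response to company actions. It can be illustrated with a toy Python sketch. Everything concrete here is an assumption for illustration. The two client features, the uniform snapshot distribution and the hand-written purchase-probability response model stand in for the learned models the paper describes. None of the names come from the authors' open-source simulators.

```python
import random
from dataclasses import dataclass


@dataclass
class Client:
    recency: int    # periods since last purchase (hypothetical feature)
    frequency: int  # purchases so far (hypothetical feature)


def generate_snapshot(n, rng):
    """Task (a): draw an initial client database snapshot.
    Placeholder uniform distributions, not a model learned from data."""
    return [Client(recency=rng.randint(0, 12), frequency=rng.randint(0, 5))
            for _ in range(n)]


def step(client, action, rng):
    """Task (b): move one client forward one period in response to an action.
    The purchase probability is a hand-written, hypothetical response model."""
    p_buy = 0.05 + 0.02 * min(client.frequency, 5) - 0.01 * client.recency
    if action == "mail":
        p_buy += 0.10
    p_buy = min(max(p_buy, 0.0), 1.0)
    if rng.random() < p_buy:
        return Client(recency=0, frequency=client.frequency + 1), 1
    return Client(recency=client.recency + 1, frequency=client.frequency), 0


def simulate(policy, n_clients=1000, periods=12, seed=0):
    """Run a policy (client -> action); return (total purchases, final clients)."""
    rng = random.Random(seed)
    clients = generate_snapshot(n_clients, rng)
    purchases = 0
    for _ in range(periods):
        next_clients = []
        for client in clients:
            updated, bought = step(client, policy(client), rng)
            next_clients.append(updated)
            purchases += bought
        clients = next_clients
    return purchases, clients
```

Running `simulate(lambda c: "mail")` and `simulate(lambda c: "none")` with the same seed compares two policies under common random numbers. This works because each client-period consumes exactly one random draw regardless of the action taken.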
Sparse Linear Discriminant Analysis in Structured Covariates Space
S. Safo, Q. Long
Classification with high-dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that, for high-dimensional data, classification using Fisher's discriminant is as bad as random guessing, because the many noise features increase the misclassification rate. It is also increasingly recognized that complex biological mechanisms arise from multiple features working together, even though individually these features may contribute to noise accumulation in the data. It is therefore important to perform classification with discriminant vectors that use a subset of important variables while also exploiting prior biological relationships among features. We tackle this problem in this article and propose methods that incorporate variable selection into the classification problem for the identification of important biomarkers. Furthermore, we incorporate prior information on the relationships among variables, encoded as undirected graphs, into the LDA problem in order to identify functionally meaningful biomarkers. We compare our methods to existing sparse LDA approaches via simulation studies and real data analysis.
DOI: 10.1002/sam.11376 (https://doi.org/10.1002/sam.11376), published 2016-10-01
Citations: 4
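The abstract does not give the authors' formulation. The sketch below therefore swaps in a much simpler, well-known way to make a linear discriminant sparse: two-class diagonal LDA whose standardized mean differences are soft-thresholded. This is similar in spirit to nearest shrunken centroids. It shows only how a threshold `lam` (a hypothetical tuning parameter) zeroes out features that do not separate the classes. It does not use the graph-structured prior information that is the paper's contribution.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    return math.copysign(max(abs(value) - lam, 0.0), value)


def fit_sparse_diagonal_lda(X0, X1, lam):
    """Two-class diagonal LDA with soft-thresholded standardized mean differences.
    X0, X1: lists of feature vectors for class 0 and class 1.
    Returns (weights, midpoint between the class means)."""
    p = len(X0[0])
    mu0 = [sum(row[j] for row in X0) / len(X0) for j in range(p)]
    mu1 = [sum(row[j] for row in X1) / len(X1) for j in range(p)]
    dof = len(X0) + len(X1) - 2
    weights = []
    for j in range(p):
        ss = (sum((row[j] - mu0[j]) ** 2 for row in X0)
              + sum((row[j] - mu1[j]) ** 2 for row in X1))
        sd = math.sqrt(ss / dof) or 1e-12  # pooled per-feature std; guard against zero
        d = soft_threshold((mu1[j] - mu0[j]) / sd, lam)
        weights.append(d / sd)  # zero weight means the feature is dropped
    midpoint = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    return weights, midpoint


def predict(x, weights, midpoint):
    """Assign class 1 when the discriminant score is positive (equal priors assumed)."""
    score = sum(w * (xj - m) for w, xj, m in zip(weights, x, midpoint))
    return 1 if score > 0 else 0
```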
Senpy: A Pragmatic Linked Sentiment Analysis Framework
J. F. Sánchez-Rada, C. Iglesias, Ignacio Corcuera, Óscar Araque
Sentiment and emotion analysis technologies have quickly gained momentum in industry and academia. This popularity has spawned a myriad of services and tools. Because there are no common interfaces or models, each of these services imposes its own interfaces and representation models. This heterogeneity makes it costly to integrate different services, evaluate them or switch between them. This work aims to remedy heterogeneity by providing an extensible framework and an API aligned with the NIF service specification. It also includes a reference implementation, a first step towards successful and cost-effective adoption. The specific contributions of this paper are: (i) the Senpy framework, (ii) an architecture for the framework that follows a plug-in approach, (iii) a reference open source implementation of the architecture, and (iv) the use and validation of the framework and architecture in a European big data sentiment analysis project. Our aim is to foster the development of a new generation of emotion-aware services by isolating the development of new algorithms from the representation of results and the deployment of services.
DOI: 10.1109/DSAA.2016.79 (https://doi.org/10.1109/DSAA.2016.79), published 2016-10-01
Citations: 12
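Senpy's idea of decoupling algorithms from result representation through plug-ins can be illustrated with a generic registry pattern. This is a hypothetical sketch, not Senpy's actual API. `AnalysisPlugin`, `register`, `REGISTRY` and the result keys are invented names, and the toy lexicon plugin is only a placeholder algorithm. The output is a plain dict rather than NIF.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: plugin name -> plugin instance.
REGISTRY = {}


class AnalysisPlugin(ABC):
    name = "base"

    @abstractmethod
    def analyse(self, text: str) -> str:
        """Return a polarity label: 'positive', 'negative' or 'neutral'."""


def register(plugin_cls):
    """Class decorator: instantiate the plugin and make it available by name."""
    REGISTRY[plugin_cls.name] = plugin_cls()
    return plugin_cls


@register
class LexiconPlugin(AnalysisPlugin):
    """Toy algorithm: count words from two tiny placeholder word lists."""
    name = "lexicon"
    POS = {"good", "great", "love"}
    NEG = {"bad", "awful", "hate"}

    def analyse(self, text):
        words = text.lower().split()
        score = sum(w in self.POS for w in words) - sum(w in self.NEG for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def analyse(text, algorithm):
    """Single entry point: every plugin's output is wrapped in the same result schema."""
    label = REGISTRY[algorithm].analyse(text)
    return {"input": text, "algorithm": algorithm, "polarity": label}
```

Adding a new algorithm means writing another decorated subclass. Callers keep using `analyse(text, algorithm)` and receive the same result shape.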
Data-Driven Sales Leads Prediction for Everything-as-a-Service in the Cloud
Chul Sung, Bo Zhang, Chunhui Y. Higgins, Y. Choe
A cloud platform website offering a catalog of services operates under a freemium or free-trial business model and markets aggressively to customers who have previously visited. In such a cloud platform or service business, accurate identification of high-profile customers is central to the success of the business. However, existing approaches are limited by the following challenges: (1) heavy customer traffic flows, (2) noise in user behaviors, (3) a lack of collaboration across stakeholders, (4) class-imbalanced customer data (few paying customers vs. large numbers of freemium or trial customers), and (5) unpredictable business environments. In this paper, we propose a data-driven, iterative sales lead prediction framework for cloud everything-as-a-service (XaaS), including a cloud platform or software. In this framework, we follow a BizDevOps process to extract business insights collaboratively from multiple business stakeholders. From these insights, we calculate service usage scores using our RFDL (Recency, Frequency, Duration, and Lifetime) analysis and predict sales leads from the usage scores in a supervised manner. Our framework adapts to a continuously changing environment by iterating the whole process, maintains its sales lead prediction performance, and finally shares the prediction results effectively with the sales or marketing team. A three-month pilot implementation of the framework led to more than 300 paying customers and a revenue increase of more than $200K. We expect our scalable, iterative sales lead prediction approach to be widely applicable to online or cloud business domains with a constant flux of customer traffic.
DOI: 10.1109/DSAA.2016.83 (https://doi.org/10.1109/DSAA.2016.83), published 2016-10-01
Citations: 5
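The abstract names RFDL (Recency, Frequency, Duration, Lifetime) usage scores but does not give their formulas. The sketch below assumes one plausible reading:

- recency is the number of days since the last session;
- frequency is the session count;
- duration is the total number of session minutes;
- lifetime is the number of days since the first session;
- the score is an equally weighted sum of min-max-normalized features, with recency inverted.

The event format, the normalization and the weights are all hypothetical. The supervised lead-prediction step is omitted.

```python
from collections import defaultdict


def rfdl_features(events, today):
    """events: iterable of (user_id, day, minutes). Returns {user: (R, F, D, L)}."""
    days = defaultdict(list)
    minutes = defaultdict(float)
    for user, day, mins in events:
        days[user].append(day)
        minutes[user] += mins
    return {u: (today - max(d), len(d), minutes[u], today - min(d))
            for u, d in days.items()}


def usage_scores(features, weights=(0.25, 0.25, 0.25, 0.25)):
    """Min-max normalize each feature across users; invert recency (recent = high)."""
    users = list(features)
    columns = list(zip(*(features[u] for u in users)))
    normalized = []
    for i, col in enumerate(columns):
        lo, hi = min(col), max(col)
        scaled = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
        if i == 0:  # recency: fewer days since last visit is better
            scaled = [1.0 - s for s in scaled]
        normalized.append(scaled)
    return {u: sum(w * normalized[i][k] for i, w in enumerate(weights))
            for k, u in enumerate(users)}
```

In a lead-prediction setting, these four features (or the combined score) would be inputs to a classifier trained on historical conversions.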
Efficient Sampling-Based ADMM for Distributed Data
Jun-Kun Wang, Shou-de Lin
This paper presents two strategies to speed up the alternating direction method of multipliers (ADMM) for distributed data. In the first method, inspired by stochastic gradient descent, each machine uses only a subset of its data in the first few iterations, which makes those iterations faster. A key result is a proof that, despite this approximation, the method enjoys the same convergence rate in terms of the number of iterations as standard ADMM and is therefore faster overall. The second method also samples a subset of the data to update the model before each round of communication. It converts the objective to an approximate dual form and performs ADMM on the dual. This method turns out to be a distributed variant of the recently proposed SDCA-ADMM. Compared with a straightforward distributed implementation of SDCA-ADMM, however, it requires less frequent communication between machines, uses memory more efficiently, and has lighter computational demands. Experiments demonstrate the effectiveness of both strategies.
DOI: 10.1109/DSAA.2016.41 (https://doi.org/10.1109/DSAA.2016.41), published 2016-10-01
Citations: 0
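The first strategy is to subsample in early iterations and then use all the data. It can be shown on a toy consensus problem: estimating the global mean of data split across machines, written as least squares so that each local ADMM update has a closed form. This is a minimal sketch under our own assumptions, not the paper's algorithm. The warm-up length, the sample size and the rescaling of the subsample sum are hypothetical choices, and machines are simulated as Python lists.

```python
import random


def sampled_consensus_admm(shards, rho=1.0, iters=200, warmup=10, sample_size=10, seed=0):
    """Consensus ADMM for min_z sum_i sum_j 0.5 * (z - a_ij)^2 over data shards.

    For the first `warmup` iterations each shard estimates its data sum from a
    random subsample (rescaled by n / sample_size); afterwards it uses all data."""
    rng = random.Random(seed)
    k = len(shards)
    x = [0.0] * k  # local estimates, one per machine
    u = [0.0] * k  # scaled dual variables
    z = 0.0        # consensus variable
    for t in range(iters):
        for i, data in enumerate(shards):
            n = len(data)
            if t < warmup and sample_size < n:
                s = sum(rng.sample(data, sample_size)) * n / sample_size
            else:
                s = sum(data)
            # closed-form x-update: argmin_x 0.5*sum_j (x - a_ij)^2 + rho/2 * (x - z + u_i)^2
            x[i] = (s + rho * (z - u[i])) / (n + rho)
        z = sum(x[i] + u[i] for i in range(k)) / k  # z-update (averaging step)
        for i in range(k):
            u[i] += x[i] - z                          # dual update
    return z
```

Because the subsampled iterations are followed by exact ones, the returned `z` converges to the same optimum, the global mean, as plain consensus ADMM would.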
On the Role of Mentions on Tweet Virality
Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Sumanth Bandi, Anand Kumar, Jean-Loup Guillaume, Bivas Mitra
In this paper, we investigate the role of mentions in tweet propagation. We propose a novel tweet propagation model, SIR_MF, based on a multiplex network framework, which allows us to analyze the effect of mentions on the final retweet count. The building blocks of this model are supported by a comprehensive study of multiple real datasets, and simulations of the model agree well with empirically observed tweet popularity. Studies and experiments also reveal that follower count, retweet rate, and profile similarity are important factors in tweet popularity, and they help explain the impact of mention strategies on the retweet count. Interestingly, we analytically identify a critical retweet rate that regulates the effect of mentions on tweet popularity. Finally, our data-driven simulation demonstrates that the proposed mention recommendation heuristic "Easy-Mention" outperforms the benchmark "Whom-To-Mention" algorithm.
DOI: 10.1109/DSAA.2016.28 (https://doi.org/10.1109/DSAA.2016.28), published 2016-10-01
Citations: 12
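A mention-aware propagation model can be illustrated with a two-layer toy. Retweets spread along a follower layer, and the author exposes mentioned users directly through a mention layer. The sketch below is an independent-cascade (SIR-like) simulation under simplifying assumptions: a single retweet probability and one decision per exposed user. It is not the paper's SIR_MF model. Factors such as follower count and profile similarity are not modeled here.

```python
import random
from collections import deque


def simulate_cascade(followers, author, mentions=(), p_retweet=0.3, seed=0):
    """Count retweets in an independent-cascade toy with a mention layer.

    followers: dict user -> users who see that user's (re)tweets (follower layer).
    mentions: users the author mentions; they are exposed directly (mention layer).
    Each exposed user decides once: retweet with probability p_retweet, in which
    case they expose their own followers, or ignore the tweet. No one is exposed
    twice, mirroring the no-reinfection assumption of SIR models."""
    rng = random.Random(seed)
    seen = {author}
    frontier = deque([author])
    retweets = 0
    while frontier:
        user = frontier.popleft()
        audience = list(followers.get(user, ()))
        if user == author:
            audience += list(mentions)  # mention edges exist only for the author here
        for v in audience:
            if v in seen:
                continue
            seen.add(v)
            if rng.random() < p_retweet:
                retweets += 1
                frontier.append(v)
    return retweets
```

With `p_retweet=1.0`, the count equals the number of users reachable through both layers. This makes the added reach from mentioning a user outside the author's follower graph easy to see.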
Temporal Network Change Detection Using Network Centralities
Yoshitaro Yonamoto, K. Morino, K. Yamanishi
In this paper, we propose a novel change detection method for temporal networks. In typical change detection algorithms, change scores are generated from an observed time series, and when a change score reaches a threshold, an alert is raised to declare a change. Our method aggregates these change scores and alerts based on network centralities. Many types of change in a network can be discovered from changes to the network structure, so nodes and links should be monitored in order to recognize changes. However, it is difficult to focus on the appropriate nodes and links when little information about the dataset is available. Network centralities such as PageRank measure the importance of nodes in a network according to certain criteria, so it is natural to apply them to improve the accuracy of change detection. Our analysis reveals how and when network centralities work well for change detection. Based on this understanding, we propose an aggregation algorithm that emphasizes the appropriate network centralities. In our evaluation, the proposed aggregation algorithm produced highly accurate predictions on one artificial dataset and two real datasets. Our method extends the field of change detection in temporal networks by utilizing network centralities.
DOI: 10.1109/DSAA.2016.13 (https://doi.org/10.1109/DSAA.2016.13), published 2016-10-01
Citations: 2
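The aggregation idea can be sketched as follows: weight per-node change scores by network centrality, then raise an alert when the total crosses a threshold. All of the following are our own assumptions:

- the per-node change score is the number of links a node gained or lost between consecutive snapshots;
- the weights are PageRank values from the earlier snapshot;
- the threshold is fixed.

The paper's algorithm chooses which centralities to emphasize; this sketch does not.

```python
def neighbours(nodes, edges):
    """Undirected adjacency sets for one snapshot."""
    nbrs = {v: set() for v in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def pagerank(nodes, nbrs, damping=0.85, iters=50):
    """Plain power-iteration PageRank on an undirected snapshot."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            if nbrs[v]:
                share = damping * rank[v] / len(nbrs[v])
                for w in nbrs[v]:
                    new[w] += share
            else:  # isolated node: spread its mass uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


def detect_changes(nodes, snapshots, threshold):
    """Score each transition t-1 -> t; alert when the score exceeds threshold."""
    scores, alerts = [], []
    for t in range(1, len(snapshots)):
        prev = neighbours(nodes, snapshots[t - 1])
        cur = neighbours(nodes, snapshots[t])
        weight = pagerank(nodes, prev)
        # per-node change = links gained or lost, weighted by centrality
        score = sum(weight[v] * len(prev[v] ^ cur[v]) for v in nodes)
        scores.append(score)
        if score > threshold:
            alerts.append(t)
    return scores, alerts
```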
A Distributed Decision Tree Algorithm and Its Implementation on Big Data Platforms
Jingxiang Chen, Tao Wang, Ralph Abbey, J. Pingenot
Decision tree algorithms are very popular in the field of data mining. This paper proposes a distributed decision tree algorithm and shows examples of its implementation on big data platforms. The major contribution of this paper is the novel KS-Tree algorithm which builds a decision tree in a distributed environment. KS-Tree is applied to some real world data mining problems and compared with state-of-the-art decision tree techniques that are implemented in R and Apache Spark. The results show that KS-Tree can achieve better results, especially with large data sets. Furthermore, we demonstrate that KS-Tree can be applied to various data mining tasks, such as variable selection.
DOI: 10.1109/DSAA.2016.64 (https://doi.org/10.1109/DSAA.2016.64), published 2016-10-01
Citations: 5
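The abstract does not describe how KS-Tree chooses splits. Purely as an assumption suggested by the name, the sketch below scores candidate thresholds on one numeric feature by the two-sample Kolmogorov-Smirnov distance between the class-conditional distributions. The distance is computed from per-partition histograms that workers send to a driver, a common pattern for split finding on distributed data. The bin edges, the function names and the binary labels are hypothetical. This should not be read as the KS-Tree algorithm.

```python
from bisect import bisect_left
from collections import Counter


def partition_histogram(rows, bins):
    """Worker side: count (bin, label) pairs for one data partition.
    rows: (value, label) pairs with label in {0, 1}; bins: sorted upper bin edges.
    Bin b holds values in (bins[b-1], bins[b]]; values above the last edge go to len(bins)."""
    hist = Counter()
    for value, label in rows:
        hist[(bisect_left(bins, value), label)] += 1
    return hist


def best_ks_split(histograms, n_bins):
    """Driver side: merge histograms, scan cumulative class CDFs, and return
    (best bin index, KS distance); the split is value <= bins[best bin]."""
    total = sum(histograms, Counter())
    n0 = sum(c for (_, y), c in total.items() if y == 0)
    n1 = sum(c for (_, y), c in total.items() if y == 1)
    if not n0 or not n1:
        return None, 0.0
    c0 = c1 = 0
    best_bin, best_ks = None, 0.0
    for b in range(n_bins):
        c0 += total[(b, 0)]
        c1 += total[(b, 1)]
        ks = abs(c0 / n0 - c1 / n1)  # gap between the two empirical CDFs at this edge
        if ks > best_ks:
            best_bin, best_ks = b, ks
    return best_bin, best_ks
```

Only the small histograms cross the network, which keeps communication independent of the number of rows per partition.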
Behavior-Oriented Time Segmentation for Mining Individualized Rules of Mobile Phone Users
Iqbal H. Sarker, A. Colman, M. A. Kabir, Jun Han
Mobile or cellular phones can record various types of context data related to a user's phone call activities. In this paper, we present an approach to discovering individualized behavior rules for mobile users from their phone call records, based on the temporal context in which a user accepts, rejects or misses a call. One determinant of an individual's phone behavior is the activities they undertake at different times of the day and on different days of the week, and in many cases this behavior follows temporal patterns. Currently, researchers who model user behavior with temporal context statically segment time into arbitrary categories (e.g., morning, evening) or fixed periods (e.g., 1 hour). However, such categories do not necessarily map to the patterns of an individual user's activity and subsequent behavior. We therefore propose a behavior-oriented time segmentation (BOTS) technique that dynamically identifies diverse time segments for an individual user's behavior based on their phone call records. Experiments on real datasets show that our technique better captures the user's dominant call response behavior at various times of the day and week, enabling more appropriate rules to be created for automated handling of incoming calls in an intelligent call interruption management system.
DOI: 10.1109/DSAA.2016.60 (https://doi.org/10.1109/DSAA.2016.60), published 2016-10-01
Citations: 19
Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks
Adrian Taylor, Sylvain P. Leblanc, N. Japkowicz
Security researchers have shown that modern automobiles are vulnerable to hacking. By exploiting vulnerabilities in the car's external interfaces, such as Wi-Fi, Bluetooth, and physical connections, they can gain access to the car's controller area network (CAN) bus. Commands sent on the CAN bus can control the car, for example cutting the brakes or stopping the engine. While securing the car's interfaces to the outside world is an important part of mitigating this threat, the last line of defence is detecting malicious behaviour on the CAN bus. We propose an anomaly detector based on a Long Short-Term Memory neural network to detect CAN bus attacks. The detector learns to predict the next data word originating from each sender on the bus; bits in the actual next word that the model finds highly surprising are flagged as anomalies. We evaluate the detector on anomalies synthesized by modifying CAN bus data, designed to mimic attacks reported in the literature. We show that the detector detects the synthesized anomalies with low false alarm rates. Additionally, the bit-level granularity of the predictions can give forensic investigators clues about the nature of flagged anomalies.
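The abstract describes flagging "highly surprising bits" in the next data word. One common way to quantify surprise is per-bit surprisal, -log2 P(observed bit), computed from the predicted probability that each bit is 1. The Python sketch below shows only that scoring step under this assumption: it takes the predicted probabilities as given (in the paper they would come from the LSTM, which is not implemented here), and the 5-bit threshold and function names are hypothetical choices, not values from the paper.

```python
import math
from typing import List, Tuple


def bit_surprisal(pred_probs: List[float], actual_bits: List[int]) -> List[float]:
    """Per-bit surprisal -log2 P(observed bit), given predicted P(bit == 1) for each bit."""
    if len(pred_probs) != len(actual_bits):
        raise ValueError("prediction and word must have the same number of bits")
    eps = 1e-12  # avoid log(0) when the model is fully confident and wrong
    scores = []
    for p_one, bit in zip(pred_probs, actual_bits):
        p_observed = p_one if bit == 1 else 1.0 - p_one
        scores.append(-math.log2(max(p_observed, eps)))
    return scores


def flag_anomalous_bits(
    pred_probs: List[float], actual_bits: List[int], threshold_bits: float = 5.0
) -> Tuple[bool, List[int]]:
    """Return (word_is_anomalous, indices of bits whose surprisal exceeds the threshold)."""
    scores = bit_surprisal(pred_probs, actual_bits)
    flagged = [i for i, s in enumerate(scores) if s > threshold_bits]
    return bool(flagged), flagged


if __name__ == "__main__":
    # The model is confident bit 3 will be 1 (p = 0.98), but the observed bit is 0.
    print(flag_anomalous_bits([0.99, 0.01, 0.5, 0.98], [1, 0, 1, 0]))
```

In this example only bit 3 is flagged (surprisal of about 5.6 bits), while the uncertain bit 2 (p = 0.5) costs just 1 bit. Reporting which bit positions were flagged, rather than a single score per word, is what could give an investigator a hint about which field of the message was tampered with.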
DOI: 10.1109/DSAA.2016.20 (https://doi.org/10.1109/DSAA.2016.20) · Published: 2016-10-01 · 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Citations: 268
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)