
Proceedings of the 3rd IKDD Conference on Data Science, 2016: Latest Publications

On the Dynamics of Username Changing Behavior on Twitter
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888452
Paridhi Jain, P. Kumaraguru
People extensively use usernames to look up users, their profiles and tweets that mention them via the Twitter search engine. Often, the searched username is outdated due to a recent username change and no longer refers to the user of interest. Searching by the user's old username results in a failed attempt to reach the user's profile, thereby making others falsely believe that the account has been deactivated. Such a search can also redirect to a different user who later picks up the old username, thereby reaching a different person altogether. Past studies show that a substantial section of Twitter users change their username over time. We observe similar trends when tracking 8.7 million users on Twitter over a duration of two months. To this point, little is known about how and why these users change their username, given the consequences of unreachability. To answer this, we analyze the username changing behavior of carefully selected users on Twitter and find that users change usernames frequently within short time intervals (a day) and choose new usernames unrelated to the old ones. A few favor a particular username, choosing it repeatedly. We explore a few of the many reasons that may have caused username changes. We believe that studying username changing behavior can help correctly find the user of interest, in addition to learning username creation strategies and uncovering plausibly malicious intentions behind a username change.
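As an illustration of the kind of analysis the abstract describes, the sketch below derives username-change events, inter-change gaps, and old/new handle similarity from periodic profile snapshots. The snapshot layout (user_id, username, observed_at) and the use of difflib string similarity are assumptions for illustration, not the authors' pipeline.

```python
# A minimal sketch (not the authors' pipeline) of detecting username changes
# from periodic profile snapshots keyed by the immutable Twitter user ID.
from collections import defaultdict
from datetime import datetime
from difflib import SequenceMatcher

def username_changes(snapshots):
    """snapshots: iterable of (user_id, username, observed_at) tuples,
    where observed_at is a datetime. Returns one event per detected change."""
    history = defaultdict(list)
    for user_id, username, observed_at in sorted(snapshots, key=lambda s: s[2]):
        history[user_id].append((observed_at, username))

    events = []
    for user_id, records in history.items():
        for (t_old, old), (t_new, new) in zip(records, records[1:]):
            if old != new:
                events.append({
                    "user_id": user_id,
                    "old": old,
                    "new": new,
                    "gap_days": (t_new - t_old).days,
                    # crude proxy for how related the new handle is to the old one
                    "similarity": SequenceMatcher(None, old.lower(), new.lower()).ratio(),
                })
    return events

changes = username_changes([
    (1, "alice_2015", datetime(2016, 1, 1)),
    (1, "wanderlust_a", datetime(2016, 1, 2)),
    (1, "alice_2015", datetime(2016, 1, 20)),  # reverting to a favored handle
])
print(changes)
```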
Citations: 16
Learning DTW-Shapelets for Time-Series Classification
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888456
Mit Shah, Josif Grabocka, Nicolas Schilling, Martin Wistuba, L. Schmidt-Thieme
Shapelets are discriminative patterns in time series that best predict the target variable when their distances to the respective time series are used as features for a classifier. Since a shapelet is simply any time series of length less than or equal to that of the shortest time series in the data set, there is an enormous number of possible shapelets present in the data. Initially, shapelets were found by extracting numerous candidates and evaluating them for their prediction quality. Grabocka et al. [2] then proposed a novel approach to learning time-series shapelets, called LTS. A new mathematical formalization of the task via a classification objective function was proposed, and a tailored stochastic gradient learning procedure was applied. It enabled learning near-optimal shapelets without the overhead of trying out many candidates. The Euclidean distance was used as the distance metric in that approach. As a limitation, it is not able to learn a single shapelet that can be representative of different subsequences of time series which are merely warped along the time axis. To cover these cases, we propose to use Dynamic Time Warping (DTW) as the distance measure in the LTS framework. The proposed approach was evaluated on 11 real-world data sets from the UCR repository and a synthetic data set created by ourselves. The experimental results show that the proposed approach outperforms the existing methods on these data sets.
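The sketch below illustrates the distance feature at the heart of shapelet methods: the DTW distance between a shapelet and the best-matching subsequence of a series. It shows only the feature computation; the paper's contribution, learning the shapelet itself through a differentiable classification objective, is not reproduced here.

```python
# Classic dynamic-programming DTW plus the minimum-distance-to-any-subsequence
# feature used by shapelet classifiers. Illustrative only; quadratic cost.
import numpy as np

def dtw(a, b):
    """O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

def shapelet_feature(series, shapelet):
    """Minimum DTW distance of the shapelet to any equal-length subsequence."""
    L = len(shapelet)
    return min(dtw(series[i:i + L], shapelet) for i in range(len(series) - L + 1))

series = np.sin(np.linspace(0, 6.28, 60))
shapelet = np.sin(np.linspace(0, 3.14, 20))   # a half-period "bump"
print(shapelet_feature(series, shapelet))      # small: the pattern occurs, possibly warped
```

Because DTW allows elastic alignment, the same shapelet matches warped occurrences of the pattern that a plain Euclidean sliding-window distance would miss, which is the motivation the abstract gives.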
Citations: 43
Competing Algorithm Detection from Research Papers
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888473
S. Ganguly, Vikram Pudi
We propose an unsupervised approach to extract all competing algorithms present in a given scholarly article. The algorithm names are treated as named entities, and natural language processing techniques are used to extract them. All extracted entity names are linked with their respective original papers in the reference section by our novel entity-citation linking algorithm. These entity-citation pairs are then ranked based on the number of comparison-related cue words present in the entity-citation context. We manually annotated a small subset of DBLP Computer Science conference papers and report both qualitative and quantitative results of our algorithm on it.
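A minimal sketch of the ranking step only, assuming a toy cue-word list and treating the sentence where an algorithm name and a citation co-occur as the entity-citation context; the named-entity extraction and citation-linking stages are not reproduced.

```python
# Score each (algorithm entity, citation) pair by counting comparison cue words
# in its context sentences, then rank pairs by that score.
import re

CUE_WORDS = {"outperforms", "compared", "baseline", "versus", "better", "worse"}

def rank_entity_citation_pairs(contexts):
    """contexts: list of (sentence_text, entity, citation_key) triples."""
    scores = {}
    for text, entity, citation in contexts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        scores[(entity, citation)] = scores.get((entity, citation), 0) + len(tokens & CUE_WORDS)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_entity_citation_pairs([
    ("Our method outperforms the LTS baseline [2] on all datasets.", "LTS", "[2]"),
    ("LTS [2] was introduced for shapelet learning.", "LTS", "[2]"),
])
print(ranked)   # pairs with more comparison language rank higher
```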
Citations: 5
Using Sort-Union to Enhance Economically-Efficient Sentiment Stream Analysis
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888468
Prateek Goel, Manajit Chakraborty, C. R. Chowdary
Sentiment drifts, caused by people changing their opinions instantly on microblogs such as Twitter, are a major challenge in sentiment analysis. In this paper, we develop a method that selects the most frequent messages from a relevant message set constructed using state-of-the-art sampling approaches. Our proposed technique increases the robustness of the classifier against sentiment drifts. Experiments conducted on three publicly available standard Twitter datasets reveal that the modified version performs better in terms of reduction in training resources, error minimization, and execution time.
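Since the abstract does not spell out the Sort-Union procedure, the sketch below only approximates the selection idea it describes: merge the relevant messages returned by several sampling strategies and keep the most frequent ones as training data.

```python
# Frequency-based selection from a pool merged across sampling approaches.
# Illustrative approximation, not the paper's Sort-Union algorithm.
from collections import Counter

def select_training_messages(sampled_sets, k):
    """sampled_sets: list of message lists, one per sampling approach."""
    counts = Counter(msg for sample in sampled_sets for msg in sample)
    return [msg for msg, _ in counts.most_common(k)]

pool = select_training_messages(
    [["great phone", "battery dies fast", "great phone"],
     ["great phone", "screen cracked"]],
    k=2,
)
print(pool)   # the messages seen most often across the samples
```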
Citations: 0
Audience Prism: Segmentation and Early Classification of Visitors Based on Reading Interests
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888459
Lilly Kumari, Sunny Dhamnani, Akshat Bhatnagar, Atanu R. Sinha, R. Sinha
The largest Media and Entertainment (M&E) web portals today cater to more than 100 million unique visitors every month. In Customer Relationship Management, customer segmentation plays an important role, with the goal of targeting different products to different segments. Marketers segment their customers based on customer attributes. In the non-subscription media business, the customer is analogous to the visitor, the product to the content, and a purchase to consumption. Knowing which segment an audience member belongs to enables better engagement. In this work, we address two problems: 1) How can we segment audience members of an M&E web property based on their media consumption interests? 2) When a new visitor arrives, how can we classify them into one of the segments defined above (without having to wait for consumption history)? We apply our proposed solution to a real-world dataset and show that we can achieve coherent clusters and can predict cluster membership with a high level of accuracy. We also build a tool that editors can use to better understand their audience.
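A minimal sketch of the two tasks posed above, using k-means as a stand-in clusterer: segment visitors by their reading-interest vectors, then assign a new visitor to the nearest segment from whatever early signal is available. The feature layout (per-topic article counts) is an assumption.

```python
# Segment visitors with k-means, then do early classification of a new visitor
# by nearest centroid. Stand-in illustration, not the paper's model.
import numpy as np
from sklearn.cluster import KMeans

# rows = visitors, columns = articles read per topic (sports, politics, tech)
visits = np.array([
    [12, 0, 1],
    [10, 1, 0],
    [0, 9, 2],
    [1, 11, 0],
    [2, 1, 14],
])
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit(visits)

# early classification: a new visitor who has read only two articles so far
new_visitor = np.array([[1, 0, 1]])
print(segments.predict(new_visitor))   # index of the closest segment centroid
```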
Citations: 1
Smart filters for social retrieval
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888457
Balaji Vasan Srinivasan, Tanya Goyal, N. M. Nainani, Kartik K. Sreenivasan
Social media platforms are increasingly becoming a rich source of information for capturing the views and opinions of online customers. Major brands listen to social streams to understand the general pulse of their online community. The foremost task here is to construct a "filter" to fetch brand-relevant data from the social streams. Due to the nature of social platforms, simple filters/queries for retrieval yield a lot of noise, leading to a need for complicated filters. Constructing such complicated filters is a non-trivial task and requires significant time investment from a social marketer. In this paper, we propose a method to automate this task by expanding a seed set of watch keywords to maximize the number of relevant social feeds retrieved around the brand and combining them appropriately into a social query. We show the strengths and weaknesses of the proposed approach in the light of real-world social feeds for various brands.
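A minimal sketch of the filter-construction idea: expand a seed set of watch keywords with terms that co-occur in feeds already matching the seeds, then combine everything into one boolean query. The co-occurrence scoring, the tiny stopword list, and the query syntax are illustrative assumptions rather than the paper's method.

```python
# Seed-keyword expansion by co-occurrence, combined into an OR query string.
import re
from collections import Counter

STOPWORDS = {"the", "is", "my", "too", "a", "new", "just"}   # tiny illustrative list

def expand_seed_keywords(feeds, seeds, top_n=3):
    seeds = {s.lower() for s in seeds}
    cooc = Counter()
    for feed in feeds:
        tokens = re.findall(r"[a-z0-9#@_]+", feed.lower())
        if seeds & set(tokens):                       # feed already matches a seed
            cooc.update(t for t in tokens if t not in seeds and t not in STOPWORDS)
    expanded = seeds | {term for term, _ in cooc.most_common(top_n)}
    return " OR ".join(sorted(expanded))

query = expand_seed_keywords(
    ["Loving my new acme phone, the camera is great",
     "acme phone battery drains too fast",
     "Just adopted a cat"],
    seeds=["acme"],
)
print(query)   # the seed plus the strongest co-occurring terms
```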
Citations: 1
An Approach to Allocate Advertisement Slots for Banner Advertising
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888472
V. Kavya, P. Reddy
In the banner advertising scenario, an advertiser aims to reach the maximum number of potential visitors, while a publisher tries to meet the requests of an increasing number of advertisers to maximize revenue. In the literature, a model was introduced to extract knowledge of coverage patterns from a transactional database. In this paper, we propose an ad slot allocation approach that extends the notion of coverage patterns to select distinct sets of ad slots meeting the requests of multiple advertisers. Preliminary experimental results on a real-world dataset show that the proposed approach meets the requests of a larger number of advertisers when compared with the baseline allocation approach.
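A greedy stand-in (not the coverage-pattern mining of the paper) for the allocation problem being described: give each advertiser a disjoint set of ad slots whose combined visitors meet the requested coverage.

```python
# Greedy allocation of disjoint ad-slot sets to advertisers by visitor coverage.
def allocate_slots(slot_visitors, requests):
    """slot_visitors: {slot: set of visitor ids}; requests: {advertiser: min coverage}."""
    free_slots = dict(slot_visitors)
    allocation = {}
    for advertiser, needed in sorted(requests.items(), key=lambda kv: -kv[1]):
        chosen, covered = [], set()
        # repeatedly pick the free slot adding the most new visitors
        while len(covered) < needed and free_slots:
            slot = max(free_slots, key=lambda s: len(free_slots[s] - covered))
            covered |= free_slots[slot]
            chosen.append(slot)
            del free_slots[slot]
        if len(covered) >= needed:
            allocation[advertiser] = chosen
        else:  # request cannot be met; return the slots to the pool
            for slot in chosen:
                free_slots[slot] = slot_visitors[slot]
    return allocation

print(allocate_slots(
    {"banner_top": {1, 2, 3}, "sidebar": {3, 4}, "footer": {5, 6}},
    {"adv_A": 3, "adv_B": 2},
))   # each advertiser gets a disjoint slot set covering its request
```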
Citations: 2
Consensus Clustering Approach for Discovering Overlapping Nodes in Social Networks
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888471
D. Shankar, S. Bhavani
Community discovery is an important problem that has been addressed in social networks from multiple perspectives. Most of these algorithms discover disjoint communities and yield widely varying results with regard to both the number of communities and community membership. We utilize this information positively by interpreting the results as the opinions of different algorithms regarding the membership of a node in a community. A novel approach to discovering overlapping nodes is proposed based on consensus clustering, and we design two algorithms, namely core-consensus and periphery-consensus. The algorithms are implemented on LFR networks, which are synthetic benchmark data sets created for community discovery, and comparative performance is presented. It is shown that overlapping nodes are detected with a high recall of above 96% and an average F-measure of nearly 75% for dense networks and 65% for sparse networks, which is on par with high-performing algorithms in the literature.
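A minimal sketch of the consensus idea, under the assumption that a node is a candidate overlapping node when the input partitions disagree about many of its co-memberships; the disagreement thresholds are arbitrary, and the paper's core-consensus and periphery-consensus algorithms are not reproduced.

```python
# Treat each algorithm's disjoint partition as one "opinion" about co-membership
# and flag nodes involved in many ambiguous pairs as candidate overlapping nodes.
from itertools import combinations
from collections import defaultdict

def overlapping_candidates(partitions, low=0.25, high=0.75):
    """partitions: list of dicts mapping node -> community label (same node set)."""
    nodes = sorted(partitions[0])
    together = defaultdict(int)
    for part in partitions:
        for u, v in combinations(nodes, 2):
            together[(u, v)] += part[u] == part[v]
    ambiguous = defaultdict(int)
    for (u, v), count in together.items():
        frac = count / len(partitions)
        if low <= frac <= high:          # the algorithms disagree about this pair
            ambiguous[u] += 1
            ambiguous[v] += 1
    return sorted(ambiguous, key=ambiguous.get, reverse=True)

parts = [
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},   # the algorithms disagree about c
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
]
print(overlapping_candidates(parts))   # 'c' ranks first
```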
Citations: 1
Feature Creation based Slicing for Privacy Preserving Data Mining
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888462
R. Priyadarsini, M. Valarmathi, S. Sivakumari
In the digital era, vast amounts of data are collected and shared for the purposes of research and analysis. These data contain sensitive information about people and organizations that needs to be protected during the process of data mining. This work proposes the Feature Creation Based Slicing (FCBS) algorithm for preserving privacy such that sensitive data are not exposed during data mining in a Multi Trust Level (MTL) environment. The proposed algorithm applies three layers of privacy preservation using both perturbation and non-perturbation techniques and creates new features from the existing attribute vector. Experiments are performed on real-life and benchmark datasets, and the results are compared with the existing slicing and L-diversity algorithms. The results show that privacy-preserved datasets generated using the proposed algorithm yield negligible hiding failure while protecting sensitive patterns during association mining, and give comparable utility during classification. Due to the feature creation process in the proposed algorithm, linking and known-background attacks are prevented. Also, the variance values of the proposed privacy-preserved datasets show that they can prevent diversity attacks.
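A generic sketch of two ingredients the abstract mentions, noise perturbation of a sensitive numeric attribute and releasing a created (derived) feature instead of the raw attributes; it is not the FCBS algorithm itself, whose three privacy layers are not specified in the abstract, and the income/expenses attributes are hypothetical.

```python
# Additive-noise perturbation plus feature creation as a stand-in illustration
# of perturbation and non-perturbation privacy techniques.
import numpy as np

rng = np.random.default_rng(0)

def perturb(column, scale=0.1):
    """Additive Gaussian noise scaled to the column's standard deviation."""
    return column + rng.normal(0.0, scale * column.std(), size=column.shape)

def create_feature(income, expenses):
    """Release a derived ratio instead of the two raw attributes."""
    return income / np.maximum(expenses, 1.0)

income = np.array([52_000.0, 87_000.0, 43_000.0])
expenses = np.array([31_000.0, 54_000.0, 29_000.0])
released = perturb(create_feature(income, expenses))
print(released)   # the published column: a noisy derived feature, not the originals
```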
Citations: 2
Proceedings of the 3rd IKDD Conference on Data Science, 2016
Pub Date : 2016-03-13 DOI: 10.1145/2888451
M. Marathe, M. Mohania, Prateek Jain
This volume contains the papers presented at CoDS 2016: Third IKDD Conference on Data Sciences held on March 13-16, 2016 in Pune.
Citations: 0