Title: Interactive hierarchical tag clouds for summarizing spatiotemporal social contents
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816707
W. Kang, A. Tung, Feng Zhao, Xinyu Li
In recent years, much effort has been invested in analyzing social network data. However, it remains a great challenge to support interactive exploration of such huge amounts of data. In this paper, we propose Vesta, a system that enables visual exploration of social network data via tag clouds. Under Vesta, users can interactively explore and extract summaries of social network contents published in a certain spatial region during a certain period of time. These summaries are represented using a novel concept called hierarchical tag clouds, which allows users to zoom in/out to explore more specific/general tag summaries. In Vesta, the spatiotemporal data is split into partitions. A novel biclustering approach is applied for each partition to extract summaries, which are then used to construct a hierarchical latent Dirichlet allocation model to generate a topic hierarchy. At runtime, the topic hierarchies in the relevant partitions of the user-specified region are merged in a probabilistic manner to form tag hierarchies, which are used to construct interactive hierarchical tag clouds for visualization. The result of an extensive experimental study verifies the efficiency and effectiveness of Vesta.
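As a rough illustration of the kind of spatio-temporal partitioning the abstract mentions (the actual grid resolution, time buckets, biclustering, and hierarchical LDA steps are defined in the paper and not reproduced here), the following Python sketch maps a geo-tagged, time-stamped post to a partition key; posts sharing a key would fall into the same partition, whose topic hierarchy is built offline:

    from datetime import datetime

    def partition_key(lat, lon, ts, cell_deg=0.5, bucket_hours=6):
        # Map a geo-tagged, time-stamped post to a spatio-temporal partition.
        # The 0.5-degree grid and 6-hour buckets are illustration values only,
        # not the settings used by Vesta.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        bucket = int(ts.timestamp() // (bucket_hours * 3600))
        return cell, bucket

    # Posts with the same key share a partition; at query time the topic
    # hierarchies of the partitions overlapping the user's region are merged.
    print(partition_key(40.71, -74.00, datetime(2014, 5, 19, 10, 30)))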
{"title":"Interactive hierarchical tag clouds for summarizing spatiotemporal social contents","authors":"W. Kang, A. Tung, Feng Zhao, Xinyu Li","doi":"10.1109/ICDE.2014.6816707","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816707","url":null,"abstract":"In recent years, much effort has been invested in analyzing social network data. However, it remains a great challenge to support interactive exploration of such huge amounts of data. In this paper, we propose Vesta, a system that enables visual exploration of social network data via tag clouds. Under Vesta, users can interactively explore and extract summaries of social network contents published in a certain spatial region during a certain period of time. These summaries are represented using a novel concept called hierarchical tag clouds, which allows users to zoom in/out to explore more specific/general tag summaries. In Vesta, the spatiotemporal data is split into partitions. A novel biclustering approach is applied for each partition to extract summaries, which are then used to construct a hierarchical latent Dirichlet allocation model to generate a topic hierarchy. At runtime, the topic hierarchies in the relevant partitions of the user-specified region are merged in a probabilistic manner to form tag hierarchies, which are used to construct interactive hierarchical tag clouds for visualization. The result of an extensive experimental study verifies the efficiency and effectiveness of Vesta.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117107664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: iCoDA: Interactive and exploratory data completeness analysis
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816747
Ruilin Liu, Guan Wang, Wendy Hui Wang, Flip Korn
The completeness of data is vital to data quality. In this demo, we present iCoDA, a system that supports interactive, exploratory data completeness analysis. iCoDA provides algorithms and tools to generate tableau patterns that concisely summarize the incomplete data under various configuration settings. During the demo, the audience can use iCoDA to interactively explore the tableau patterns generated from incomplete data, with the flexibility of filtering and navigating through different granularities of these patterns. iCoDA also offers the audience various visualization methods for displaying tableau patterns. Overall, we will demonstrate that iCoDA provides sophisticated analysis of data completeness.
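To make the notion of tableau patterns concrete, here is a toy Python sketch (for illustration only; iCoDA's actual pattern semantics, generation algorithms, and ranking are defined in the paper). It counts, for each attribute-value pattern, how many incomplete rows the pattern covers, so patterns with high counts summarize where data is missing:

    from itertools import combinations
    from collections import Counter

    def incompleteness_patterns(rows, attrs, max_width=2):
        # A pattern fixes the values of a few attributes and wildcards the
        # rest, e.g. {'state': 'CA'}; it covers every row with at least one
        # missing value whose fixed attributes match.
        counts = Counter()
        for row in rows:
            if not any(row.get(a) is None for a in attrs):
                continue  # complete row, contributes to no pattern
            present = [a for a in attrs if row.get(a) is not None]
            for width in range(1, max_width + 1):
                for combo in combinations(present, width):
                    counts[tuple((a, row[a]) for a in combo)] += 1
        return counts.most_common()

    rows = [
        {'state': 'CA', 'year': 2013, 'sales': None},
        {'state': 'CA', 'year': 2014, 'sales': None},
        {'state': 'NY', 'year': 2014, 'sales': 7},
    ]
    print(incompleteness_patterns(rows, ['state', 'year', 'sales']))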
{"title":"iCoDA: Interactive and exploratory data completeness analysis","authors":"Ruilin Liu, Guan Wang, Wendy Hui Wang, Flip Korn","doi":"10.1109/ICDE.2014.6816747","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816747","url":null,"abstract":"The completeness of data is vital to data quality. In this demo, we present iCoDA, a system that supports interactive, exploratory data completeness analysis. iCoDA provides algorithms and tools to generate tableau patterns that concisely summarize the incomplete data under various configuration settings. During the demo, the audience can use iCoDA to interactively explore the tableau patterns generated from incomplete data, with the flexibility of filtering and navigating through different granularity of these patterns. iCoDA supports various visualization methods to the audience for the display of tableau patterns. Overall, we will demonstrate that iCoDA provides sophisticated analysis of data completeness.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115807205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Ranking item features by mining online user-item interactions
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816673
Sofiane Abbar, Habibur Rahman, Saravanan Thirumuruganathan, Carlos Castillo, Gautam Das
We assume a database of items in which each item is described by a set of attributes, some of which could be multi-valued. We refer to each of the distinct attribute values as a feature. We also assume that we have information about the interactions (such as visits or likes) between a set of users and those items. In this paper, we aim to rank the features of an item using user-item interactions. For instance, if the items are movies, features could be actors, directors, or genres, and a user-item interaction could be a user liking the movie. This information could be used to identify the most important actors for each movie. While users are drawn to an item due to a subset of its features, a user-item interaction only provides an expression of user preference over the entire item, not its component features. We design algorithms to rank the features of an item depending on whether interaction information is available at aggregated or individual-level granularity, and extend them to rank composite features (sets of features). Our algorithms are based on constrained least squares, network flow, and non-trivial adaptations of non-negative matrix factorization. We evaluate our algorithms using both real-world and synthetic datasets.
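For the aggregated-granularity case, the constrained-least-squares idea can be illustrated with a minimal sketch (an illustration only, not the paper's exact formulation; the network-flow and NMF-based variants are not shown): represent each item as a 0/1 vector over its features, take its total interaction count as the observed signal, and fit non-negative feature weights so that an item's count is approximately the sum of its features' weights.

    import numpy as np
    from scipy.optimize import nnls

    # Items x features matrix (e.g. movies x actors): 1 if the item has the feature.
    A = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 1]], dtype=float)
    y = np.array([90.0, 60.0, 40.0, 100.0])   # aggregated likes per item

    weights, residual = nnls(A, y)             # solve min ||A w - y||, w >= 0
    for f, w in enumerate(weights):
        print(f"feature {f}: importance {w:.1f}")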
{"title":"Ranking item features by mining online user-item interactions","authors":"Sofiane Abbar, Habibur Rahman, Saravanan Thirumuruganathan, Carlos Castillo, Gautam Das","doi":"10.1109/ICDE.2014.6816673","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816673","url":null,"abstract":"We assume a database of items in which each item is described by a set of attributes, some of which could be multi-valued. We refer to each of the distinct attribute values as a feature. We also assume that we have information about the interactions (such as visits or likes) between a set of users and those items. In our paper, we would like to rank the features of an item using user-item interactions. For instance, if the items are movies, features could be actors, directors or genres, and user-item interaction could be user liking the movie. These information could be used to identify the most important actors for each movie. While users are drawn to an item due to a subset of its features, a user-item interaction only provides an expression of user preference over the entire item, and not its component features. We design algorithms to rank the features of an item depending on whether interaction information is available at aggregated or individual level granularity and extend them to rank composite features (set of features). Our algorithms are based on constrained least squares, network flow and non-trivial adaptations to non-negative matrix factorization. We evaluate our algorithms using both real-world and synthetic datasets.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125136179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A tool for Internet-scale cardinality estimation of XPath queries over distributed semistructured data
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816758
V. Slavov, A. Katib, P. Rao
We present a novel tool called XGossip for Internet-scale cardinality estimation of XPath queries over distributed XML data. XGossip relies on the principle of gossip, is scalable, decentralized, and can cope with network churn and failures. It employs a novel divide-and-conquer strategy for load balancing and reducing the overall network bandwidth consumption. It has a strong theoretical underpinning and provides provable guarantees on the accuracy of cardinality estimates, the number of messages exchanged, and the total bandwidth usage. In this demonstration, users will experience three engaging scenarios: In the first scenario, they can set up, configure, and deploy XGossip on Amazon Elastic Compute Cloud (EC2). In the second scenario, they can execute XGossip, pose XPath queries, observe in real-time the convergence speed of XGossip, the accuracy of cardinality estimates, the bandwidth usage, and the number of messages exchanged. In the third scenario, they can introduce network churn and failures during the execution of XGossip and observe how these impact the behavior of XGossip.
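The gossip principle XGossip builds on can be seen in a small push-sum simulation (a generic sketch, not XGossip itself: the divide-and-conquer teams, XPath signature handling, and provable accuracy and bandwidth guarantees are in the paper). Each peer repeatedly pushes half of its (sum, weight) state to a random peer, and every local estimate sum/weight converges to the network-wide total:

    import random

    def push_sum(values, rounds=30):
        n = len(values)
        s = list(map(float, values))
        w = [1.0] + [0.0] * (n - 1)        # weight 1 at a single peer -> estimates the sum
        for _ in range(rounds):
            inbox_s, inbox_w = [0.0] * n, [0.0] * n
            for i in range(n):
                target = random.randrange(n)
                for j in (i, target):      # keep half, push half to a random peer
                    inbox_s[j] += s[i] / 2
                    inbox_w[j] += w[i] / 2
            s, w = inbox_s, inbox_w
        return [si / wi if wi > 0 else float('nan') for si, wi in zip(s, w)]

    print(push_sum([10, 20, 30, 40]))      # every peer's estimate approaches 100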
{"title":"A tool for Internet-scale cardinality estimation of XPath queries over distributed semistructured data","authors":"V. Slavov, A. Katib, P. Rao","doi":"10.1109/ICDE.2014.6816758","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816758","url":null,"abstract":"We present a novel tool called XGossip for Internet-scale cardinality estimation of XPath queries over distributed XML data. XGossip relies on the principle of gossip, is scalable, decentralized, and can cope with network churn and failures. It employs a novel divide-and-conquer strategy for load balancing and reducing the overall network bandwidth consumption. It has a strong theoretical underpinning and provides provable guarantees on the accuracy of cardinality estimates, the number of messages exchanged, and the total bandwidth usage. In this demonstration, users will experience three engaging scenarios: In the first scenario, they can set up, configure, and deploy XGossip on Amazon Elastic Compute Cloud (EC2). In the second scenario, they can execute XGossip, pose XPath queries, observe in real-time the convergence speed of XGossip, the accuracy of cardinality estimates, the bandwidth usage, and the number of messages exchanged. In the third scenario, they can introduce network churn and failures during the execution of XGossip and observe how these impact the behavior of XGossip.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132494008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: IQ-METER - An evaluation tool for data-transformation systems
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816745
G. Mecca, Paolo Papotti, Donatello Santoro
We call a data-transformation system any system that maps, translates, and exchanges data across different representations. Nowadays, data architects are faced with a large variety of transformation tasks, and a huge number of different approaches and systems have been conceived to solve them. As a consequence, it is very important to be able to evaluate such alternative solutions in order to pick the right one for the problem at hand. To this end, we introduce IQ-Meter, the first comprehensive tool for the evaluation of data-transformation systems. IQ-Meter can be used to benchmark, test, and even learn the best usage of data-transformation tools. It builds on a number of novel algorithms to measure the quality of outputs and the human effort required by a given system, and ultimately measures “how much intelligence” the system brings to the solution of a data-translation task.
{"title":"IQ-METER - An evaluation tool for data-transformation systems","authors":"G. Mecca, Paolo Papotti, Donatello Santoro","doi":"10.1109/ICDE.2014.6816745","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816745","url":null,"abstract":"We call a data-transformation system any system that maps, translates and exchanges data across different representations. Nowadays, data architects are faced with a large variety of transformation tasks, and there is huge number of different approaches and systems that were conceived to solve them. As a consequence, it is very important to be able to evaluate such alternative solutions, in order to pick up the right ones for the problem at hand. To do this, we introduce IQ-Meter, the first comprehensive tool for the evaluation of data-transformation systems. IQ-Meter can be used to benchmark, test, and even learn the best usage of data-transformation tools. It builds on a number of novel algorithms to measure the quality of outputs and the human effort required by a given system, and ultimately measures “how much intelligence” the system brings to the solution of a data-translation task.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132618215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Exploiting hardware transactional memory in main-memory databases
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816683
Viktor Leis, A. Kemper, Thomas Neumann
So far, transactional memory, although a promising technique, has suffered from the absence of an efficient hardware implementation. The upcoming Haswell microarchitecture from Intel introduces hardware transactional memory (HTM) in mainstream CPUs. HTM allows for efficient concurrent, atomic operations, which is also highly desirable in the context of databases. On the other hand, HTM has several limitations that, in general, prevent a one-to-one mapping of database transactions to HTM transactions. In this work we devise several building blocks that can be used to exploit HTM in main-memory databases. We show that HTM makes it possible to achieve nearly lock-free processing of database transactions by carefully controlling the data layout and the access patterns. The HTM component is used to detect the (infrequent) conflicts, which allows for an optimistic, and thus very low-overhead, execution of concurrent transactions.
{"title":"Exploiting hardware transactional memory in main-memory databases","authors":"Viktor Leis, A. Kemper, Thomas Neumann","doi":"10.1109/ICDE.2014.6816683","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816683","url":null,"abstract":"So far, transactional memory-although a promising technique-suffered from the absence of an efficient hardware implementation. The upcoming Haswell microarchitecture from Intel introduces hardware transactional memory (HTM) in mainstream CPUs. HTM allows for efficient concurrent, atomic operations, which is also highly desirable in the context of databases. On the other hand HTM has several limitations that, in general, prevent a one-to-one mapping of database transactions to HTM transactions. In this work we devise several building blocks that can be used to exploit HTM in main-memory databases. We show that HTM allows to achieve nearly lock-free processing of database transactions by carefully controlling the data layout and the access patterns. The HTM component is used for detecting the (infrequent) conflicts, which allows for an optimistic, and thus very low-overhead execution of concurrent transactions.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130202737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: L2AP: Fast cosine similarity search with prefix L-2 norm bounds
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816700
D. Anastasiu, G. Karypis
The All-Pairs similarity search, or self-similarity join problem, finds all pairs of vectors in a high dimensional sparse dataset with a similarity value higher than a given threshold. The problem has been classically solved using a dynamically built inverted index. The search time is reduced by early pruning of candidates using size and value-based bounds on the similarity. In the context of cosine similarity and weighted vectors, leveraging the Cauchy-Schwarz inequality, we propose new ℓ2-norm bounds for reducing the inverted index size, candidate pool size, and the number of full dot-product computations. We tighten previous candidate generation and verification bounds and introduce several new ones to further improve our algorithm's performance. Our new pruning strategies enable significant speedups over baseline approaches, most times outperforming even approximate solutions. We perform an extensive evaluation of our algorithm, L2AP, and compare against state-of-the-art exact and approximate methods, AllPairs, MMJoin, and BayesLSH, across a variety of real-world datasets and similarity thresholds.
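The core Cauchy-Schwarz pruning idea can be sketched as follows (a toy, dense-vector illustration only; L2AP itself operates on a sparse inverted index and combines many more bounds). After L2-normalizing the rows, the contribution of the not-yet-processed features p..d-1 to a dot product is at most the product of the two suffix norms, so a pair can be abandoned as soon as the partial dot product plus that bound drops below the threshold:

    import numpy as np

    def suffix_pruned_pairs(X, t):
        # Return all pairs (i, j) with cosine similarity >= t (exact, no misses).
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        # suffix_norm[i, p] = ||X[i, p:]||
        suffix_norm = np.sqrt(np.cumsum(X[:, ::-1] ** 2, axis=1))[:, ::-1]
        n, d = X.shape
        result = []
        for i in range(n):
            for j in range(i + 1, n):
                dot = 0.0
                for p in range(d):
                    # Cauchy-Schwarz: the remaining dot product is bounded by
                    # ||X[i, p:]|| * ||X[j, p:]||, so prune once it cannot reach t.
                    if dot + suffix_norm[i, p] * suffix_norm[j, p] < t:
                        break
                    dot += X[i, p] * X[j, p]
                else:
                    if dot >= t:
                        result.append((i, j))
        return result

    print(suffix_pruned_pairs(np.random.rand(5, 8), 0.9))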
{"title":"L2AP: Fast cosine similarity search with prefix L-2 norm bounds","authors":"D. Anastasiu, G. Karypis","doi":"10.1109/ICDE.2014.6816700","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816700","url":null,"abstract":"The All-Pairs similarity search, or self-similarity join problem, finds all pairs of vectors in a high dimensional sparse dataset with a similarity value higher than a given threshold. The problem has been classically solved using a dynamically built inverted index. The search time is reduced by early pruning of candidates using size and value-based bounds on the similarity. In the context of cosine similarity and weighted vectors, leveraging the Cauchy-Schwarz inequality, we propose new ℓ2-norm bounds for reducing the inverted index size, candidate pool size, and the number of full dot-product computations. We tighten previous candidate generation and verification bounds and introduce several new ones to further improve our algorithm's performance. Our new pruning strategies enable significant speedups over baseline approaches, most times outperforming even approximate solutions. We perform an extensive evaluation of our algorithm, L2AP, and compare against state-of-the-art exact and approximate methods, AllPairs, MMJoin, and BayesLSH, across a variety of real-world datasets and similarity thresholds.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130134378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Mercury: A memory-constrained spatio-temporal real-time search on microblogs
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816649
A. Magdy, M. Mokbel, S. Elnikety, Suman Nath, Yuxiong He
This paper presents Mercury, a system for real-time support of top-k spatio-temporal queries on microblogs, where users are able to browse recent microblogs near their locations. With high arrival rates of microblogs, Mercury ensures real-time query response within a tight memory-constrained environment. Mercury bounds its search space to include only those microblogs that have arrived within certain spatial and temporal boundaries, within which only the top-k microblogs, according to a spatio-temporal ranking function, are returned in the search results. Mercury employs: (a) a scalable dynamic in-memory index structure that is capable of digesting all incoming microblogs, (b) an efficient query processor that exploits the in-memory index through spatio-temporal pruning techniques that reduce the number of visited microblogs needed to return the final answer, (c) an index size tuning module that dynamically finds and adjusts the minimum index size to ensure that incoming queries will be answered accurately, and (d) a load shedding technique that trades a slight decrease in query accuracy for significant storage savings. Extensive experimental results based on a real-time Twitter Firehose feed and actual locations of Bing search queries show that Mercury supports high arrival rates of up to 64K microblogs/second with an average query latency of 4 msec.
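A minimal sketch of a top-k spatio-temporal ranking step is given below; the weighted linear combination of spatial proximity and recency is an assumed form for illustration, since Mercury's actual ranking function, index structure, and pruning bounds are defined in the paper:

    import heapq, math, time

    def topk_microblogs(microblogs, q_lat, q_lon, k=10, alpha=0.5,
                        max_dist_km=50.0, max_age_s=3600.0):
        # Each microblog is a dict with 'lat', 'lon', 'ts' (epoch seconds), 'text'.
        now = time.time()

        def score(m):
            dist_km = math.hypot(m['lat'] - q_lat, m['lon'] - q_lon) * 111.0  # rough km per degree
            spatial = max(0.0, 1.0 - dist_km / max_dist_km)
            temporal = max(0.0, 1.0 - (now - m['ts']) / max_age_s)
            return alpha * spatial + (1 - alpha) * temporal

        return heapq.nlargest(k, microblogs, key=score)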
{"title":"Mercury: A memory-constrained spatio-temporal real-time search on microblogs","authors":"A. Magdy, M. Mokbel, S. Elnikety, Suman Nath, Yuxiong He","doi":"10.1109/ICDE.2014.6816649","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816649","url":null,"abstract":"This paper presents Mercury; a system for real-time support of top-k spatio-temporal queries on microblogs, where users are able to browse recent microblogs near their locations. With high arrival rates of microblogs, Mercury ensures real-time query response within a tight memory-constrained environment. Mercury bounds its search space to include only those microblogs that have arrived within certain spatial and temporal boundaries, in which only the top-k microblogs, according to a spatio-temporal ranking function, are returned in the search results. Mercury employs: (a) a scalable dynamic in-memory index structure that is capable of digesting all incoming microblogs, (b) an efficient query processor that exploits the in-memory index through spatio-temporal pruning techniques that reduce the number of visited microblogs to return the final answer, (c) an index size tuning module that dynamically finds and adjusts the minimum index size to ensure that incoming queries will be answered accurately, and (d) a load shedding technique that trades slight decrease in query accuracy for significant storage savings. Extensive experimental results based on a real-time Twitter Firehose feed and actual locations of Bing search queries show that Mercury supports high arrival rates of up to 64K microblogs/second and average query latency of 4 msec.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133459913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: The Vertica Query Optimizer: The case for specialized query optimizers
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816727
Nga Tran, Andrew Lamb, Lakshmikant Shrinivas, Sreenath Bodagala, J. Dave
The Vertica SQL Query Optimizer was written from the ground up for the Vertica Analytic Database. Its design and the tradeoffs we encountered during its implementation argue that the full power of novel database systems can only be realized with a carefully crafted custom Query Optimizer written specifically for the system in which it operates.
{"title":"The Vertica Query Optimizer: The case for specialized query optimizers","authors":"Nga Tran, Andrew Lamb, Lakshmikant Shrinivas, Sreenath Bodagala, J. Dave","doi":"10.1109/ICDE.2014.6816727","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816727","url":null,"abstract":"The Vertica SQL Query Optimizer was written from the ground up for the Vertica Analytic Database. Its design and the tradeoffs we encountered during its implementation argue that the full power of novel database systems can only be realized with a carefully crafted custom Query Optimizer written specifically for the system in which it operates.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123901674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: GLog: A high level graph analysis system using MapReduce
Pub Date: 2014-05-19 | DOI: 10.1109/ICDE.2014.6816680
Jun Gao, Jiashuai Zhou, Chang Zhou, J. Yu
With the rapid growth of graphs in different applications, it is inevitable to leverage existing distributed data processing frameworks to manage large graphs. Although these frameworks reduce development cost, it is still cumbersome and error-prone for developers to implement complex graph analysis tasks in distributed environments. Additionally, developers have to learn the details of these frameworks quite well, which is key to improving the performance of distributed jobs. This paper introduces a high-level query language called GLog and proposes its evaluation method to overcome these limitations. Specifically, we first design an RG (Relational-Graph) data model to mix relational data and graph data, and extend Datalog to GLog on RG tables to support various graph analysis tasks. Second, we define operations on RG tables and show translation templates that convert a GLog query into a sequence of MapReduce jobs. Third, we propose two strategies, namely rule merging and iteration rewriting, to optimize the translated jobs. The final experiments show that GLog can not only express various graph analysis tasks more succinctly, but also achieve better performance than Pig, another high-level dataflow system, on most graph analysis tasks.
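To illustrate the kind of job such a rule is translated into (GLog's actual syntax and translation templates are defined in the paper; this is a generic sketch in plain Python), one iteration of a simple iterative graph rule, single-source reachability over an edge table, can be written as a map pass and a reduce pass and repeated until a fixpoint is reached:

    from collections import defaultdict

    def map_phase(edges, reached):
        # Emit (dst, True) for every edge whose source is already reached,
        # and re-emit the current facts so they survive the round.
        for src, dst in edges:
            if src in reached:
                yield dst, True
        for node in reached:
            yield node, True

    def reduce_phase(mapped):
        # Group by node; any emitted value marks the node as reached.
        groups = defaultdict(list)
        for node, flag in mapped:
            groups[node].append(flag)
        return {node for node, flags in groups.items() if any(flags)}

    edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
    reached = {1}
    while True:                        # iterate until no new nodes are reached
        new_reached = reduce_phase(map_phase(edges, reached))
        if new_reached == reached:
            break
        reached = new_reached
    print(sorted(reached))             # -> [1, 2, 3, 4]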
{"title":"GLog: A high level graph analysis system using MapReduce","authors":"Jun Gao, Jiashuai Zhou, Chang Zhou, J. Yu","doi":"10.1109/ICDE.2014.6816680","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816680","url":null,"abstract":"With the rapid growth of graphs in different applications, it is inevitable to leverage existing distributed data processing frameworks in managing large graphs. Although these frameworks ease the developing cost, it is still cumbersome and error-prone for developers to implement complex graph analysis tasks in distributed environments. Additionally, developers have to learn the details of these frameworks quite well, which is a key to improve the performance of distributed jobs. This paper introduces a high level query language called GLog and proposes its evaluation method to overcome these limitations. Specifically, we first design a RG (Relational-Graph) data model to mix relational data and graph data, and extend Datalog to GLog on RG tables to support various graph analysis tasks. Second, we define operations on RG tables, and show translation templates to convert a GLog query into a sequence of MapReduce jobs. Third, we propose two strategies, namely rule merging and iteration rewriting, to optimize the translated jobs. The final experiments show that GLog can not only express various graph analysis tasks in a more succinct way, but also achieve a better performance for most of the graph analysis tasks than Pig, another high level dataflow system.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123370546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}