
Proceedings. International Database Engineering and Applications Symposium: Latest Publications

CMOA: continuous moving object anonymization
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351486
Tsubasa Takahashi, Shinya Miyakawa
This paper proposes a continuous anonymization method for trajectory streams. In today's mobile environment, the positions of moving objects are frequently sensed and collected, and trajectory streams have attracted a lot of attention for real-time movement-pattern analyses of people and automobiles. However, trajectory streams lead to sensitive locations, such as homes and hospitals, and a set of spatio-temporal data points might identify a user within a trajectory stream. Therefore, publishing original trajectory streams may cause critical breaches of privacy. To protect users' privacy, we need a mechanism that makes it difficult to identify users within crowds of trajectory streams. Several techniques for anonymizing trajectories have been proposed; anonymized trajectories can be published without concern about privacy issues. However, existing trajectory anonymization methods are not suitable for the continuous publishing of trajectory streams, because they anonymize the overall trajectories all at once. If the existing methods are applied to continuous publishing, the resolution of the anonymized trajectories is severely degraded or traceability is lost. In this paper, we propose an anonymization technique for trajectory streams. The method continuously anonymizes trajectory streams one by one and dynamically reforms the anonymized trajectory streams to improve their resolution. Experiments showed that our method keeps the resolution at a constant level.
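The abstract does not give the algorithm itself; as a rough illustration of the general idea of k-anonymizing one time step of a trajectory stream, here is a minimal sketch (the grid-based grouping, the k threshold, and all names are assumptions for illustration, not the authors' method):

```python
from collections import defaultdict

def anonymize_window(positions, k=2):
    """Generalize each group of >= k nearby objects to a shared bounding box.

    positions: dict of object_id -> (x, y) for one time step.
    Returns dict of object_id -> (min_x, min_y, max_x, max_y), or None for
    objects that cannot be k-anonymized in this step (suppressed).
    """
    # Cluster by coarse grid cell (a stand-in for a real spatial clustering).
    cells = defaultdict(list)
    for oid, (x, y) in positions.items():
        cells[(int(x) // 10, int(y) // 10)].append(oid)

    published = {}
    for members in cells.values():
        if len(members) >= k:
            xs = [positions[m][0] for m in members]
            ys = [positions[m][1] for m in members]
            box = (min(xs), min(ys), max(xs), max(ys))
            for m in members:
                published[m] = box  # all k members share one region
        else:
            for m in members:
                published[m] = None  # group too small: suppress
    return published
```

Publishing the shared box instead of exact points is what trades resolution for anonymity; the paper's contribution is doing this continuously while keeping the resolution stable.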
{"title":"CMOA: continuous moving object anonymization","authors":"Tsubasa Takahashi, Shinya Miyakawa","doi":"10.1145/2351476.2351486","DOIUrl":"https://doi.org/10.1145/2351476.2351486","url":null,"abstract":"This paper proposes a continuous anonymization method for a trajectory stream. In today's mobile environment, positions of moving objects are frequently sensed and collected. For real-time movement pattern analyses of people and automobiles, trajectory streams have attracted a lot of attention. Trajectory streams lead to sensitive locations, such as homes and personal hospitals. Additionally, a set of spatio-temporal data might identify a user from a trajectory stream. Therefore, publishing original trajectory streams may cause critical breaches of privacy. To protect privacy of users, we need a mechanism which makes it difficult to identify users from crowds of trajectory streams. Several techniques for anonymizing trajectories have been proposed. Anonymized trajectories can be published without concerning about privacy issues. However, for the continuous publishing of trajectory streams, existing trajectory anonymization methods are not suitable because they anonymize the overall trajectories at a time. If the existing methods are applied in the continuous publishing, the resolution of anonymized trajectory is hugely degraded or trace-ability is lost. In this paper, we propose an anonymization technique for a trajectory stream. The method continuously anonymizes trajectory streams one by one, and dynamically reforms anonymized trajectory streams to improve the resolution. The experiments showed that our method could keep the resolution at a constant level.","PeriodicalId":93615,"journal":{"name":"Proceedings. 
International Database Engineering and Applications Symposium","volume":"53 1","pages":"81-90"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81898122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
On the semantics of ST4SQL, a multidimensional spatio-temporal query language
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351504
G. Pozzani, C. Combi
In earlier work, Pozzani and Combi proposed ST4SQL, an SQL-based query language that extends SQL with new constructs for querying spatio-temporal data. In particular, ST4SQL deals with different temporal and spatial semantics, allowing one to specify how the system has to manage the temporal and spatial dimensions when evaluating queries. Moreover, the query language introduces new constructs for grouping data with respect to the temporal and spatial dimensions. All proposed constructs take into account data qualified with granularities [2]. In this paper we briefly present ST4SQL and illustrate, also through some examples, its semantics with respect to the standard SQL one.
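As a rough illustration of what granularity-aware grouping means (this is a generic Python sketch, not ST4SQL syntax; the helper and granularity keys are assumptions):

```python
from datetime import datetime

def group_by_granularity(rows, granularity="month"):
    """Group timestamped rows by a temporal granularity, in the spirit of a
    granularity-aware GROUP BY. rows: list of (timestamp, value) pairs."""
    keyers = {
        "year": lambda t: (t.year,),
        "month": lambda t: (t.year, t.month),
        "day": lambda t: (t.year, t.month, t.day),
    }
    key = keyers[granularity]
    groups = {}
    for ts, value in rows:
        groups.setdefault(key(ts), []).append(value)
    return groups

rows = [(datetime(2012, 8, 8), 10), (datetime(2012, 8, 9), 5),
        (datetime(2012, 9, 1), 7)]
by_month = group_by_granularity(rows, "month")
```

Choosing "year" instead of "month" coarsens the groups; ST4SQL makes this choice, and its spatial analogue, explicit in the query semantics.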
{"title":"On the semantics of ST4SQL, a multidimensional spatio-temporal query language","authors":"G. Pozzani, Combi Carlo","doi":"10.1145/2351476.2351504","DOIUrl":"https://doi.org/10.1145/2351476.2351504","url":null,"abstract":"In Pozzani and Combi proposed ST4SQL, an SQL-based query language extending SQL with new constructs for querying spatio-temporal data. In particular ST4SQL deals with different temporal and spatial semantics, allowing one to specify how the system has to manage temporal and spatial dimensions for evaluating queries. Moreover, the query language introduces new constructs for grouping data with respect to temporal and spatial dimensions. All proposed constructs take into account data qualified with granularities [2]. In this paper we briefly present ST4SQL and we present, also through some examples, its semantics with respect to the standard SQL one.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"27 1","pages":"222-229"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84503240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SciQL: bridging the gap between science and relational DBMS
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076639
Y. Zhang, M. Kersten, M. Ivanova, N. Nes
Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of data-intensive research fields and current DBMS technologies, we propose SciQL (pronounced as 'cycle'), the first SQL-based query language for scientific applications with both tables and arrays as first-class citizens. It provides a seamless symbiosis of array, set, and sequence interpretations. A key innovation is the extension of the value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between element positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates them using time-series concepts.
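The distinction between value-based and structural grouping can be illustrated outside SQL; a minimal Python sketch (function names are hypothetical, not SciQL syntax):

```python
def structural_group(array, size):
    """Structural grouping: partition by element position, not value."""
    return [array[i:i + size] for i in range(0, len(array), size)]

def value_group(array):
    """Classic SQL-style value-based grouping, for contrast."""
    groups = {}
    for v in array:
        groups.setdefault(v, []).append(v)
    return groups

series = [3, 1, 4, 1, 5, 9]
windows = structural_group(series, 2)          # fixed-size positional groups
averages = [sum(w) / len(w) for w in windows]  # per-window aggregate
```

A value-based GROUP BY on `series` would merge the two 1s into one group regardless of where they occur; the structural grouping keeps them in separate windows, which is what window-style time-series queries need.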
{"title":"SciQL: bridging the gap between science and relational DBMS","authors":"Y. Zhang, M. Kersten, M. Ivanova, N. Nes","doi":"10.1145/2076623.2076639","DOIUrl":"https://doi.org/10.1145/2076623.2076639","url":null,"abstract":"Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we propose SciQL (pronounced as 'cycle'), the first SQL-based query language for scientific applications with both tables and arrays as first class citizens. It provides a seamless symbiosis of array-, set- and sequence-interpretations. A key innovation is the extension of value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between elements positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates it using time-series concepts.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"6 1","pages":"124-133"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87175077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 81
Answering complex structured queries over the deep web
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076638
Fan Wang, G. Agrawal
A large part of the data on the World Wide Web resides in the deep web. Most deep web data sources only support simple text interfaces for querying them, which are easy to use but have limited expressive power. Therefore, processing complex structured queries over the deep web currently involves a large amount of manual work. Our work focuses on the gap between users' need to express and execute complex structured queries over the deep web and the simple, limited input interfaces of existing deep web data sources. This paper presents a query planning problem formulation, with novel algorithms and optimizations, for supporting a high-level and highly expressive query language over deep web data sources. We particularly target three types of complex queries: select-project-join queries, aggregation queries, and nested queries. We have developed query planning algorithms to generate query plans for each of these, and propose several optimization techniques to further speed up query plan execution. In our experiments, we show that our algorithms have good scalability; furthermore, for over 90% of the experimental queries, the execution time and result quality of the query plans generated by our algorithms are very close to those of the optimal plans generated by an exhaustive search algorithm. Our optimization techniques also outperform an existing optimization method in terms of both reduction in transmitted data records and query execution speedups.
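As a rough illustration of why such queries need planning at all: a select-project-join over two sources that each accept only single-key lookups must be executed as a dependent join, probing one source and feeding its results into the other. The sources, records, and field names below are hypothetical stand-ins, not the paper's data sources:

```python
# Hypothetical single-key lookup interfaces over two deep-web sources.
GENES = {"BRCA1": [{"gene": "BRCA1", "join_key": "P38398"}]}
PROTEINS = {"P38398": [{"protein_id": "P38398", "length": 1863}]}

def join_over_sources(keys):
    """Dependent join: probe the first source, then feed each result's
    join key into the second source's restricted interface."""
    results = []
    for k in keys:
        for a in GENES.get(k, []):
            for b in PROTEINS.get(a["join_key"], []):
                results.append({**a, **b})  # merged, projected record
    return results

matches = join_over_sources(["BRCA1"])
```

The planner's job is choosing the probe order and batching so that chains like this minimize transmitted records, which is what the paper's optimizations target.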
{"title":"Answering complex structured queries over the deep web","authors":"Fan Wang, G. Agrawal","doi":"10.1145/2076623.2076638","DOIUrl":"https://doi.org/10.1145/2076623.2076638","url":null,"abstract":"A large part of the data on the World Wide Web resides in the deep web. Most deep web data sources only support simple text interfaces for querying them, which are easy to use but have limited expressive power. Therefore, processing complex structured queries over the deep web currently involves a large amount of manual work. Our work focuses on addressing the existing gap between users' need of expressing and executing complex structured queries over the deep web, and the simple and limited input interfaces of the existing deep web data sources.\u0000 This paper presents a query planning problem formulation, with novel algorithms and optimizations, for enabling a high-level and highly expressive query language to be supported over deep web data sources. We particularly target three types of complex queries, which are select-project-join queries, aggregation queries, and nested queries. We have developed query planning algorithms to generate query plans for each of these, and propose several optimization techniques to further speedup query plan execution.\u0000 In our experiments, we show our algorithm has good scalability and furthermore, for over 90% of the experimental queries, the execution time and result quality of the query plans generated by our algorithms are very close to the optimal plans generated by an exhaustive search algorithm. Furthermore, our optimization techniques outperform an existing optimization method in terms of both reduction in transmitted data records and query execution speedups.","PeriodicalId":93615,"journal":{"name":"Proceedings. 
International Database Engineering and Applications Symposium","volume":"16 1","pages":"115-123"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73769689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
M-TOP: multi-target operator placement of query graphs for data streams
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076631
N. Cipriani, O. Schiller, B. Mitschang
Nowadays, many applications process stream-based data, such as financial market analysis, network intrusion detection, or visualization applications. To process stream-based data in an application-independent manner, distributed stream processing systems emerged. They typically translate a query into an operator graph, place the operators on stream processing nodes, and execute them to process the streamed data. Operator placement is crucial in such systems, as it deeply influences query execution. Different stream-based applications often require dedicated placement of query graphs according to their specific objectives, e.g., bandwidth not less than 500 MBit/s and costs not more than 1 cost unit. This constrains operator placement. Existing approaches do not take application-specific objectives into account and thus do not reflect application-specific placement decisions. As objectives might conflict with each other, operator placement is subject to delicate trade-offs; for example, bandwidth maximization may be more important than cost reduction. Thus, the challenge is to find a solution that considers the application-specific objectives and their trade-offs. We present M-TOP, a QoS-aware multi-target operator placement framework for data stream systems. In particular, we propose an operator placement strategy that considers application-specific targets consisting of objectives, their respective trade-off specifications, bottleneck conditions, and ranking schemes to compute a suitable placement. We integrated M-TOP into NexusDS, our distributed data stream processing middleware, and provide an experimental evaluation showing the effectiveness of M-TOP.
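A toy sketch of the placement pattern described above: enumerate candidate assignments, discard those violating the bottleneck conditions (the 500 MBit/s and 1-cost-unit examples from the abstract), and rank the rest by a trade-off score. The exhaustive enumeration, the scoring function, and its weight are assumptions for illustration, not M-TOP's actual strategy:

```python
from itertools import product

def best_placement(operators, nodes, bw, cost, min_bw=500.0, max_cost=1.0):
    """Pick an operator->node assignment that satisfies the bottleneck
    conditions and ranks best under a bandwidth-vs-cost trade-off.

    bw[n], cost[n]: bandwidth (MBit/s) and cost of node n.
    A placement's bandwidth is its slowest node; its cost is the sum.
    """
    best, best_score = None, float("-inf")
    for assignment in product(nodes, repeat=len(operators)):
        p_bw = min(bw[n] for n in assignment)
        p_cost = sum(cost[n] for n in assignment)
        if p_bw < min_bw or p_cost > max_cost:
            continue  # violates a bottleneck condition
        score = p_bw - 1000.0 * p_cost  # illustrative trade-off weighting
        if score > best_score:
            best, best_score = dict(zip(operators, assignment)), score
    return best

nodes = ["n1", "n2"]
bw = {"n1": 600.0, "n2": 400.0}
cost = {"n1": 0.4, "n2": 0.3}
placement = best_placement(["filter", "join"], nodes, bw, cost)
```

Here n2 is cheaper but falls below the bandwidth bottleneck, so both operators land on n1; changing the weight in `score` flips which objective dominates, which is the trade-off specification M-TOP lets applications express.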
{"title":"M-TOP: multi-target operator placement of query graphs for data streams","authors":"N. Cipriani, O. Schiller, B. Mitschang","doi":"10.1145/2076623.2076631","DOIUrl":"https://doi.org/10.1145/2076623.2076631","url":null,"abstract":"Nowadays, many applications processes stream-based data, such as financial market analysis, network intrusion detection, or visualization applications. To process stream-based data in an application-independent manner, distributed stream processing systems emerged. They typically translate a query to an operator graph, place the operators to stream processing nodes, and execute them to process the streamed data. The operator placement is crucial in such systems, as it deeply influences query execution. Often, different stream-based applications require dedicated placement of query graphs according to their specific objectives, e.g. bandwidth not less than 500 MBit/s and costs not more that 1 cost unit. This fact constraints operator placement. Existing approaches do not take into account application-specific objectives, thus not reflecting application-specific placement decisions. As objectives might conflict among each other, operator placement is subject to delicate trade-offs, such as bandwidth maximization is more important than cost reduction. Thus, the challenge is to find a solution which considers the application-specific objectives and their trade-offs.\u0000 We present M-TOP, an QoS-aware multi-target operator placement framework for data stream systems. Particularly, we propose an operator placement strategy considering application-specific targets consisting of objectives, their respective trade-offs specifications, bottleneck conditions, and ranking schemes to compute a suitable placement. We integrated M-TOP into NexusDS, our distributed data stream processing middleware, and provide an experimental evaluation to show the effectiveness of M-TOP.","PeriodicalId":93615,"journal":{"name":"Proceedings. 
International Database Engineering and Applications Symposium","volume":"88 1","pages":"52-60"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80264455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
ParallelGDB: a parallel graph database based on cache specialization
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076643
Luis Barguñó, V. Muntés-Mulero, David Dominguez-Sal, P. Valduriez
The need to manage massive attributed graphs is becoming common in many areas, such as recommendation systems, proteomics analysis, social network analysis, and bibliographic analysis. This makes it necessary to move towards parallel systems that can manage graph databases containing millions of vertices and edges. Previous work on distributed graph databases has focused on finding ways to partition the graph to reduce network traffic and execution time. However, partitioning a graph and keeping the information regarding the location of vertices might be unrealistic for massive graphs. In this paper, we propose ParallelGDB, a new system based on specializing the local caches of each node in the system, providing a better cache hit ratio. ParallelGDB uses a random graph partitioning, avoiding complex partitioning methods based on the graph topology, which usually require managing extra data structures. The proposed system provides an efficient environment for distributed graph databases.
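Random (hash-based) partitioning, as opposed to topology-aware partitioning, can be sketched in a few lines: every node can recompute a vertex's owner locally, so no global location directory is needed (a generic sketch of the idea, not ParallelGDB's implementation):

```python
import hashlib

def partition(vertex_id, num_nodes):
    """Hash-based random vertex partitioning: deterministic, uniform, and
    computable on any node without consulting a location directory."""
    digest = hashlib.sha1(str(vertex_id).encode()).hexdigest()
    return int(digest, 16) % num_nodes
```

The cost of this simplicity is that neighboring vertices land on arbitrary nodes, which is why ParallelGDB recovers locality through specialized per-node caches instead of through the partitioning itself.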
{"title":"ParallelGDB: a parallel graph database based on cache specialization","authors":"Luis Barguñó, V. Muntés-Mulero, David Dominguez-Sal, P. Valduriez","doi":"10.1145/2076623.2076643","DOIUrl":"https://doi.org/10.1145/2076623.2076643","url":null,"abstract":"The need for managing massive attributed graphs is becoming common in many areas such as recommendation systems, proteomics analysis, social network analysis or bibliographic analysis. This is making it necessary to move towards parallel systems that allow managing graph databases containing millions of vertices and edges. Previous work on distributed graph databases has focused on finding ways to partition the graph to reduce network traffic and improve execution time. However, partitioning a graph and keeping the information regarding the location of vertices might be unrealistic for massive graphs. In this paper, we propose Parallel-GDB, a new system based on specializing the local caches of any node in this system, providing a better cache hit ratio. ParallelGDB uses a random graph partitioning, avoiding complex partition methods based on the graph topology, that usually require managing extra data structures. This proposed system provides an efficient environment for distributed graph databases.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"17 1","pages":"162-169"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89390647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
A statically typed query language for property graphs
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076653
Norbert Tausch, M. Philippsen, Josef Adersberger
Applications that work on network-oriented data often use property graph models. Although their graph data is represented by an object-oriented model, current approaches cannot define statically typed vertex and edge sets. Thus, custom graph operations use untyped input and output sets and cannot exploit crucial concepts like polymorphism. Not only do illegal calling contexts or arguments result in runtime errors or unexpected query results, but the resulting code also tends to be error-prone, unclear, and thus hard to maintain. To solve these problems, we extend the property graph model with typed graph classes and open it up to polymorphism. Our approach is an internal domain-specific language for graph traversals based on the object-oriented and functional programming language Scala. A case study emphasizes the usability of our framework.
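The paper's DSL is embedded in Scala; as a loose analogue, here is a Python sketch of a vertex set whose element type is tracked generically, so a type checker can follow traversal results instead of treating them as untyped sets (all class and method names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

@dataclass
class Vertex:
    name: str

@dataclass
class Person(Vertex):  # a vertex subtype, enabling polymorphism
    age: int = 0

V = TypeVar("V", bound=Vertex)

@dataclass
class TypedVertexSet(Generic[V]):
    """A vertex set whose element type is known statically, so filters
    return a set of the same element type rather than an untyped set."""
    items: list[V] = field(default_factory=list)

    def where(self, pred) -> "TypedVertexSet[V]":
        return TypedVertexSet([v for v in self.items if pred(v)])

people = TypedVertexSet([Person("ada", 36), Person("max", 17)])
adults = people.where(lambda p: p.age >= 18)  # still TypedVertexSet[Person]
```

With untyped sets, `p.age` in the predicate could fail at runtime on a non-Person vertex; with the typed set, a checker rejects such a query before it runs, which is the class of errors the paper targets.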
{"title":"A statically typed query language for property graphs","authors":"Norbert Tausch, M. Philippsen, Josef Adersberger","doi":"10.1145/2076623.2076653","DOIUrl":"https://doi.org/10.1145/2076623.2076653","url":null,"abstract":"Applications that work on network-oriented data often use property graph models. Although their graph data is represented by an object-oriented model, current approaches cannot define statically typed vertex and edge sets. Thus, custom graph operations use untyped input and output sets and cannot exploit crucial concepts like polymorphism. Not only do illegal calling contexts or arguments result in runtime errors or unexpected query results, but also the resulting code tends to be error prone, unclear, and thus hard to maintain. To solve these problems, we extend the property graph model with typed graph classes and open it up to polymorphism. Our approach is an internal domain specific language for graph traversals based on the object-oriented and functional programming language Scala. A case study emphasizes the usability of our framework.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"40 1","pages":"219-225"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86328920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Chimera: data sharing flexibility, shared nothing simplicity
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076642
U. F. Minhas, D. Lomet, C. Thekkath
The current database market is fairly evenly split between shared nothing and data sharing systems. While shared nothing systems are easier to build and scale, data sharing systems have advantages in load balancing. In this paper we explore adding data sharing functionality as an extension to a shared nothing database system. Our approach isolates the data sharing functionality from the rest of the system and relies on well-studied, robust techniques to provide the data sharing extension. This reduces the difficulty in providing data sharing functionality, yet provides much of the flexibility of a data sharing system. We present the design and implementation of Chimera -- a hybrid database system, targeted at load balancing for many workloads, and scale-out for read-mostly workloads. The results of our experiments demonstrate that we can achieve almost linear scalability and effective load balancing with less than 2% overhead during normal operation.
{"title":"Chimera: data sharing flexibility, shared nothing simplicity","authors":"U. F. Minhas, D. Lomet, C. Thekkath","doi":"10.1145/2076623.2076642","DOIUrl":"https://doi.org/10.1145/2076623.2076642","url":null,"abstract":"The current database market is fairly evenly split between shared nothing and data sharing systems. While shared nothing systems are easier to build and scale, data sharing systems have advantages in load balancing. In this paper we explore adding data sharing functionality as an extension to a shared nothing database system. Our approach isolates the data sharing functionality from the rest of the system and relies on well-studied, robust techniques to provide the data sharing extension. This reduces the difficulty in providing data sharing functionality, yet provides much of the flexibility of a data sharing system. We present the design and implementation of Chimera -- a hybrid database system, targeted at load balancing for many workloads, and scale-out for read-mostly workloads. The results of our experiments demonstrate that we can achieve almost linear scalability and effective load balancing with less than 2% overhead during normal operation.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"27 1","pages":"152-161"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80018595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Flexible approximate counting
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076655
S. Mitchell, D. Day
Approximate counting [18] is useful for data stream and database summarization. It can help in many settings that allow only one pass over the data, require low memory usage, and can accept some relative error. Approximate counters use fewer bits; we focus on 8-bit counters, but our results are general. These small counters represent a sparse sequence of larger numbers, and are incremented probabilistically based on the spacing between the numbers they represent. Our contributions are a customized distribution of counter values and efficient strategies for deciding when to increment them. At run-time, users may independently select the spacing (accuracy) of the approximate counter for small, medium, and large values. We allow the user to select the maximum number to count up to, and our algorithm selects the exponential base of the spacing. This provides additional flexibility over both classic and Csűrös's [4] floating-point approximate counting, and additional structure, a useful schema for users, over Kruskal and Greenberg [13]. We describe two new and efficient strategies for incrementing approximate counters: using a deterministic countdown, or sampling from a geometric distribution. In Csűrös's scheme all increments are powers of two, so random bits rather than full random numbers can be used. We also provide the option to use powers of two while retaining flexibility. We show when each strategy is fastest in our implementation.
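The classic probabilistic increment that all of these schemes build on is the Morris counter: the small counter c is incremented with probability that shrinks exponentially in c, so c tracks roughly the logarithm of the true count. A minimal base-2 sketch (this is the generic scheme, not the paper's customized distribution or its countdown/geometric increment strategies):

```python
import random

def approx_increment(c, base=2.0):
    """Morris-style update: bump the small counter c with probability
    base**-c, so c stays near log_base of the true event count."""
    if random.random() < base ** -c:
        c += 1
    return c

def estimate(c, base=2.0):
    """Unbiased estimate of the true count from counter value c:
    E[base**c] = 1 + n*(base - 1) after n events starting from c = 0."""
    return (base ** c - 1) / (base - 1)

random.seed(0)
c = 0
for _ in range(100_000):
    c = approx_increment(c)  # one probabilistic update per event
```

After 100,000 events, c lands near log2(100,000) ≈ 17, so an 8-bit counter has enormous headroom; choosing a base closer to 1 tightens the spacing (better accuracy, lower maximum count), which is the accuracy/range trade-off the paper exposes to users.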
{"title":"Flexible approximate counting","authors":"S. Mitchell, D. Day","doi":"10.1145/2076623.2076655","DOIUrl":"https://doi.org/10.1145/2076623.2076655","url":null,"abstract":"Approximate counting [18] is useful for data stream and database summarization. It can help in many settings that allow only one pass over the data, want low memory usage, and can accept some relative error. Approximate counters use fewer bits; we focus on 8-bits but our results are general. These small counters represent a sparse sequence of larger numbers. Counters are incremented probabilistically based on the spacing between the numbers they represent. Our contributions are a customized distribution of counter values and efficient strategies for deciding when to increment them. At run-time, users may independently select the spacing (accuracy) of the approximate counter for small, medium, and large values. We allow the user to select the maximum number to count up to, and our algorithm will select the exponential base of the spacing. These provide additional flexibility over both classic and Csűrös's [4] floating-point approximate counting. These provide additional structure, a useful schema for users, over Kruskal and Greenberg [13]. We describe two new and efficient strategies for incrementing approximate counters: use a deterministic countdown or sample from a geometric distribution. In Csűrös all increments are powers of two, so random bits rather than full random numbers can be used. We also provide the option to use powers-of-two but retain flexibility. We show when each strategy is fastest in our implementation.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"14 1","pages":"233-239"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73540828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 6
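The abstract's second increment strategy, sampling from a geometric distribution, can be illustrated with the classic Morris counter [18] that the paper builds on. The sketch below is not the paper's implementation (which customizes the counter-value distribution and spacing); the class and function names are illustrative, and the estimator 2^c - 1 is the textbook Morris scheme, with a geometric draw replacing a per-event coin flip:

```python
import math
import random

def geometric(p):
    """Sample K ~ Geometric(p): the number of Bernoulli(p) trials up to
    and including the first success (inverse-transform method)."""
    if p >= 1.0:
        return 1
    return int(math.log(random.random()) / math.log(1.0 - p)) + 1

class MorrisCounter:
    """Classic Morris approximate counter. The small register c
    represents roughly 2**c - 1 events, and E[estimate()] equals the
    true event count."""

    def __init__(self):
        self.c = 0
        # Geometric-sample strategy: instead of flipping a coin with
        # probability 2**-c on every event, draw once how many events
        # to skip before the next register increment. With c == 0 the
        # draw is always 1, so the first event increments the register.
        self.skip = geometric(2.0 ** -self.c)

    def add(self):
        """Count one event."""
        self.skip -= 1
        if self.skip == 0:
            self.c += 1
            self.skip = geometric(2.0 ** -self.c)

    def estimate(self):
        """Unbiased estimate of the number of add() calls."""
        return 2 ** self.c - 1
```

Because a geometric draw is statistically identical to waiting for the first success of repeated Bernoulli(2^-c) flips, this variant needs one random number per register increment rather than one per counted event, which is the efficiency argument behind sampling the countdown instead of flipping per event.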
Databases on the web: national web domain survey
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076646
Denis Shestakov
The deep Web, the part of the Web consisting of web pages filled with information from myriads of online databases, is to date relatively unexplored. Even its basic characteristics such as, for instance, the number of searchable databases on the Web are disputable. In this paper, we address the problem of accurate estimation of the deep Web by sampling one national web domain. We report some of our results obtained when surveying the Russian Web. The survey findings, namely the size estimates of the deep Web, could be useful for further studies to handle data in the deep Web.
{"title":"Databases on the web: national web domain survey","authors":"Denis Shestakov","doi":"10.1145/2076623.2076646","DOIUrl":"https://doi.org/10.1145/2076623.2076646","url":null,"abstract":"The deep Web, the part of the Web consisting of web pages filled with information from myriads of online databases, is to date relatively unexplored. Even its basic characteristics such as, for instance, the number of searchable databases on the Web are disputable. In this paper, we address the problem of accurate estimation of the deep Web by sampling one national web domain. We report some of our results obtained when surveying the Russian Web. The survey findings, namely the size estimates of the deep Web, could be useful for further studies to handle data in the deep Web.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"6 1","pages":"179-184"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74210991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 30