
Proceedings. International Database Engineering and Applications Symposium: Latest Articles

A data masking technique for data warehouses
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076632
R. Santos, Jorge Bernardino, M. Vieira
Data warehouses (DWs) are an enterprise's most valuable asset with respect to critical business information, making them an appealing target for attackers. Packaged database encryption solutions are generally considered the best way to protect sensitive data. However, given the volume of data typically processed by DW queries, existing encryption solutions heavily increase storage space and, because of decryption costs, introduce very large overheads in query response time. In many cases, this performance degradation makes encryption infeasible for use in DWs. In this paper we propose a transparent data masking solution for numerical values in DWs based on the mathematical modulus operator, which can be used without changing user applications or DBMS source code. Our solution provides strong data security while introducing small overheads in both storage space and database performance. Several experimental evaluations using the TPC-H decision support benchmark and a real-world DW are included. The results show the overall efficiency of our proposal, demonstrating that it is a valid alternative to existing standard encryption routines for enforcing data confidentiality in DWs.
Pages: 61-69
Citations: 28
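As a rough illustration of how a modulus-based mask for numeric columns can remain reversible, the sketch below adds a key-dependent offset to each value and wraps the result with `%`. The formula, `MODULUS`, `SECRET_KEY`, and the use of a row identifier are all assumptions made for this example; the abstract does not state the authors' actual masking formula.

```python
# Illustrative only: a reversible additive mask built on the modulus operator.
# The constants and the formula are assumptions, not taken from the paper.
MODULUS = 10**9            # hypothetical exclusive upper bound on column values
SECRET_KEY = 123_456_789   # hypothetical per-column secret


def mask(value: int, row_id: int) -> int:
    """Hide `value` (0 <= value < MODULUS) behind a key-dependent offset."""
    offset = (SECRET_KEY * (row_id + 1)) % MODULUS
    return (value + offset) % MODULUS


def unmask(masked: int, row_id: int) -> int:
    """Invert mask(); Python's % always yields a non-negative result."""
    offset = (SECRET_KEY * (row_id + 1)) % MODULUS
    return (masked - offset) % MODULUS
```

Under these assumptions the masked value stays within the same numeric range as the original, so a scheme of this kind would not need a wider column type. It is only meant to show the arithmetic, not to make a security claim.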
Query language constructs for provenance
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076661
Murali Mani, M. Alawa, A. Kalyanasundaram
Provenance, which records the derivation history of data, is useful for a wide variety of applications, including those where an audit trail must be provided and those where the sources, and the trust level attributed to them, help determine the trust level of the results. There have been various efforts to represent provenance information, the most notable being the Open Provenance Model (OPM). OPM defines structures for representing provenance information as a graph with nodes and edges, and also specifies inference queries. Our work builds on these by proposing query language constructs that users will find useful for manipulating provenance information. Rather than specifying a query language, we define two classes of algebraic constructs: content-based operators, which operate on the content of nodes and edges, and structure-based operators, which operate on the structure of the provenance graph. The content-based and structure-based constructs can be combined to express a wide variety of interesting queries on provenance data that go well beyond the simple inference queries expressible in Datalog/SQL.
Pages: 254-255
Citations: 2
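To make the distinction between content-based and structure-based operators more concrete, here is a minimal sketch over a toy provenance graph. The operator names `select_nodes` and `ancestors`, the attribute names, and the edge direction (from a derived item to what it was derived from) are hypothetical choices for this example and are not the algebra defined in the paper.

```python
def select_nodes(nodes, predicate):
    """Content-based: keep the nodes whose attributes satisfy `predicate`."""
    return {name for name, attrs in nodes.items() if predicate(attrs)}


def ancestors(edges, start):
    """Structure-based: everything `start` was transitively derived from."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


# Toy graph: each edge points from a derived item to what it was derived from.
nodes = {
    "report": {"type": "artifact"},
    "clean": {"type": "process"},
    "raw": {"type": "artifact", "trust": 0.4},
}
edges = [("report", "clean"), ("clean", "raw")]

# Combining both kinds of operator: artifacts in the lineage of "report".
lineage_artifacts = ancestors(edges, "report") & select_nodes(
    nodes, lambda attrs: attrs["type"] == "artifact"
)
```

Composing the two operators with a set intersection is one simple way to express a query that filters on both graph structure and node content.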
Trajectory data analysis using complex networks
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076627
Igo Ramalho Brilhante, J. Macêdo, C. Renso, M. Casanova
A massive amount of data on moving object trajectories is available today. However, processing this information to explain moving object interactions, which could help reveal non-trivial behavioral patterns, remains a major challenge. To that end, we consider a complex-network-based representation of trajectory data. Nodes represent trajectories, and frequent encounters among moving objects (trajectory encounters) create the network edges. A real trajectory dataset of vehicles moving within the City of Milan allows us to study the structure of vehicle interactions and validate our method. We create seven networks and compute their clustering coefficients and average shortest path lengths, comparing them with those of the Erdős-Rényi model. Our analysis shows that all computed trajectory networks exhibit the small-world effect and the scale-free property, similar to the Internet and biological networks. Finally, we discuss how these results can be interpreted in light of the traffic application domain.
Pages: 17-25
Citations: 5
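The two network measures named in the abstract can be computed with a few lines of standard-library Python. The sketch below uses a made-up four-node encounter network; the node names and edges are assumptions for illustration, not data from the Milan dataset.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy encounter network: nodes are trajectories, edges are frequent encounters.
encounters = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t3"},
}
cc = clustering_coefficient(encounters)
aspl = average_shortest_path_length(encounters)
```

A network is usually described as small-world when its clustering coefficient is much higher than that of an Erdős-Rényi random graph with the same number of nodes and edges while its average shortest path length stays comparably short.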
A landmark-model based system for mining frequent patterns from uncertain data streams
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076659
C. Leung, Fan Jiang, Y. Hayduk
Sensors generate huge volumes of streaming data for applications such as environment surveillance. Partly because of the inherent limitations of sensors, these continuous data streams can be uncertain. Over the past few years, algorithms have been proposed that apply the sliding window or time-fading window model to mine frequent patterns from streams of uncertain data. However, other models for processing data streams also exist. In this paper, we propose a landmark-model based system for mining frequent patterns from streams of uncertain data.
Pages: 249-250
Citations: 18
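The sketch below shows one way a landmark model can be combined with uncertain data: every batch that has arrived since the landmark is counted with equal weight, and each itemset accumulates its expected support, taken here as the product of the existential probabilities of its items in a transaction. The class name `LandmarkMiner`, the brute-force enumeration, and the independence assumption between items are choices made for this example and are not taken from the system described in the paper.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


class LandmarkMiner:
    """Accumulates expected support over all batches since the landmark."""

    def __init__(self, max_len=2):
        self.max_len = max_len                      # largest itemset size kept
        self.expected_support = defaultdict(float)  # itemset -> expected support

    def add_batch(self, batch):
        # Each transaction maps an item to its existential probability.
        for transaction in batch:
            items = sorted(transaction)
            for size in range(1, self.max_len + 1):
                for itemset in combinations(items, size):
                    # Items are assumed independent within a transaction.
                    self.expected_support[itemset] += prod(
                        transaction[item] for item in itemset
                    )

    def frequent(self, minsup):
        """Itemsets whose expected support reaches `minsup`."""
        return {s: v for s, v in self.expected_support.items() if v >= minsup}


miner = LandmarkMiner()
miner.add_batch([{"a": 0.9, "b": 0.5}])
miner.add_batch([{"a": 0.8, "b": 1.0}, {"b": 0.4}])
patterns = miner.frequent(minsup=1.5)
```

Unlike a sliding window, nothing is ever expired here, and unlike a time-fading window, old batches are not down-weighted; that is the defining property of the landmark model.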
A clustering-based visualization of colocation patterns
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076633
Elise Desmier, Frédéric Flouvat, D. Gay, Nazha Selmaoui-Folcher
Extracting interesting colocations from geo-referenced data is one of the major tasks in spatial pattern mining. The goal is to find sets of spatial object types with instances located in the same neighborhood. In this context, the main difficulty is the visualization and interpretation of the extracted patterns by domain experts. Indeed, the common textual representation of colocations loses important spatial information such as the position, orientation, or spatial distribution of the patterns. To overcome this problem, we propose a new clustering-based visualization technique deeply integrated into the colocation mining algorithm. This simple, concise, and intuitive cartographic visualization takes both spatial information and expert practices into account. The proposal has been integrated into a Geographic Information System and tested on a real-world geological data set. Domain experts confirm the added value of this visualization approach.
Pages: 70-78
Citations: 10
Cache-conscious data placement in an in-memory key-value store
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076640
Christian Tinnefeld, A. Zeier, H. Plattner
Key-value stores that keep their data entirely in main memory can serve applications whose performance criteria cannot be met by disk-based key-value stores. This paper evaluates the performance implications of cache-conscious data placement in an in-memory key-value store by examining how many values have to be stored consecutively in blocks in order to fully exploit memory locality during bandwidth-bound operations. Our contributions are a random block traversal main memory access pattern, a description of the corresponding memory access costs, and a formal and experimental derivation of the correlation between block size and throughput. Our calculations and experiments vary the value and block sizes as well as their placement in memory, and derive their impact on cache misses across the levels of the memory hierarchy, on the ability to prefetch data, and on the number of CPU cycles needed to perform a given set of data operations. The paper closes with the insight that block-wise grouping of relatively few key-value pairs increases throughput by up to a factor of six, and with a discussion of the implications that block-wise grouping of data has for the system design of a key-value store.
Pages: 134-142
Citations: 3
Extend core UDF framework for GPU-enabled analytical query evaluation
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076641
Qiming Chen, R. Wu, M. Hsu, Bin Zhang
To achieve scalable data-intensive analytics, we investigate methods for integrating general-purpose analytic computation into a query pipeline using User Defined Functions (UDFs). However, an existing UDF cannot act as a block operator with chunk-wise input along the tuple-wise query processing pipeline. It is therefore unable to handle application semantics defined on a set of incoming tuples that represent a single object or fall in a time window, and unable to leverage external computation engines for efficient batch processing. To enable a data-intensive computation pipeline, we introduce a new kind of UDF called the Set-In Set-Out (SISO) UDF. A SISO UDF is a block operator that processes the input tuples and returns the resulting tuples chunk by chunk. Operating in the query processing pipeline, a SISO UDF pools a chunk of input tuples, dispatches them in batch to GPUs or an analytic engine, materializes the results, and then streams them out. This behavior differentiates SISO UDFs from all existing ones and makes efficient integration of analytic computation and data management feasible. We have implemented the SISO UDF framework by extending the PostgreSQL query engine, and have further demonstrated the use of SISO UDFs in GPU-enabled analytical query evaluation. Our experiments show that the proposed approach is scalable and efficient.
Pages: 143-151
Citations: 1
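The pool, dispatch, and stream-out behavior described in the abstract can be mimicked outside a database with a Python generator. This is only a conceptual sketch: `siso_udf`, `normalize_chunk`, and the chunk size are hypothetical names and values, the batch function is a plain-Python stand-in for a GPU kernel, and none of it reflects the PostgreSQL extension the authors built.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[float, ...]


def siso_udf(
    rows: Iterable[Row],
    batch_fn: Callable[[List[Row]], List[Row]],
    chunk_size: int,
) -> Iterator[Row]:
    """Pool input tuples into chunks, process each chunk in batch, stream results."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))  # pool a chunk of input tuples
        if not chunk:
            return
        yield from batch_fn(chunk)                  # dispatch, then stream out


def normalize_chunk(chunk: List[Row]) -> List[Row]:
    """Stand-in for a batch kernel: subtract the chunk mean from column 0."""
    mean = sum(row[0] for row in chunk) / len(chunk)
    return [(row[0] - mean,) for row in chunk]


result = list(siso_udf([(1.0,), (3.0,), (10.0,)], normalize_chunk, chunk_size=2))
```

Because the operator is a generator, downstream consumers still see one tuple at a time, while the batch function only ever sees whole chunks.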
Top-k query processing for combinatorial objects using Euclidean distance
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076651
Takanobu Suzuki, A. Takasu, J. Adachi
Conventional search techniques are mainly designed to return a ranked list of single objects that are relevant to a given query. However, they are not suited to retrieving a combination of objects that is close to the query. This paper presents top-k query processing in which Euclidean distance is used as the scoring function for combinatorial objects. We also propose a clustering-based pruning method that selects object combinations efficiently by pruning clusters that contain no potential candidates for the top-k results. We compared the proposed method with a baseline that enumerates all the combinatorial objects and calculates their distance to the query. Experimental results show that the proposed method improves processing efficiency by up to about 95%.
Pages: 209-213
Citations: 3
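The exhaustive baseline mentioned in the abstract, which enumerates every combination and measures its distance to the query, might look like the sketch below. How a combinatorial object is turned into a single vector is not stated in the abstract; summing the feature vectors of its members is an assumption made here, and the function name and sample data are hypothetical. The clustering-based pruning proposed in the paper is not shown.

```python
import heapq
from itertools import combinations
from math import dist


def top_k_combinations(objects, query, size, k):
    """Brute force: score every `size`-combination by Euclidean distance."""

    def combined(vectors):
        # Assumption: a combination is represented by the sum of its vectors.
        return tuple(sum(component) for component in zip(*vectors))

    scored = (
        (dist(combined([objects[name] for name in names]), query), names)
        for names in combinations(sorted(objects), size)
    )
    return heapq.nsmallest(k, scored)


objects = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (2.0, 2.0)}
best = top_k_combinations(objects, query=(1.0, 1.0), size=2, k=1)
```

The number of combinations grows combinatorially with the number of objects, which is why a pruning step that discards whole clusters of candidates can pay off.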
Mining semantic data for solving first-rater and cold-start problems in recommender systems
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076662
M. García, S. Segrera, V. F. L. Batista, María Dolores Muñoz Vicente, Angel L. Sánchez
Recommender systems have become very popular in recent years, mainly on e-commerce sites, although they are growing in importance in other areas such as e-learning, tourism, and news pages. These systems are endowed with intelligent mechanisms for personalizing recommendations about products or services. However, they have some serious drawbacks that affect user satisfaction. The first-rater and cold-start problems are two important drawbacks that arise, respectively, when new products or new users are introduced into the system. The lack of ratings for these products or from these users prevents the system from making recommendations. Traditional collaborative filtering methods have been replaced by web mining techniques in order to deal with scalability and performance problems, but the first-rater and cold-start problems require a different strategy. In this work, we propose a methodology that combines data mining techniques with semantic data in order to overcome these two important shortcomings.
Pages: 256-257
Citations: 5
Scrubbing query results from probabilistic databases
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076634
Jianwen Chen, Ling Feng, Wenwei Xue
Queries over probabilistic databases lead to probabilistic results. Because these results are derived from the underlying data probabilities, we believe that involving a user in the query processing loop, and leveraging the user's personal knowledge to deal with uncertain data, will enable the system to scrub (correct) and tailor its probabilistic query results toward better quality from the perspective of that specific user. In this paper, we propose to open the black box of a probabilistic database query engine and explain to the user how the engine arrives at the probabilistic query result, as well as which uncertain tuples in the database the result is derived from. In this way, users can draw on their knowledge of the uncertain information not only to decide how much confidence to place in the query engine, but also to help clarify some of the uncertain information so that the query engine can regenerate an improved query result. Two particular issues associated with such a probabilistic database query framework are addressed: (i) how to interact with a user for answer explanation and uncertainty clarification without placing much burden on the user, and (ii) how to scrub/correct the query result without incurring much computational overhead for the query engine. Our performance study demonstrates the accuracy, effectiveness, and computational efficiency achieved by the proposed framework.
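The explain-and-scrub loop described in the abstract above can be illustrated with a minimal sketch. This is not the authors' framework: it assumes a simple tuple-independent probabilistic table and a single existence query, and every name in it (`UncertainTuple`, `exists_hot`, `clarify`, `READINGS`) is hypothetical. It shows a query returning both a probability and the uncertain tuples it was derived from, and then the result being recomputed after a user settles one of those tuples.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UncertainTuple:
    tid: str     # tuple identifier shown to the user in explanations
    room: str
    temp: float
    prob: float  # probability that this tuple really holds (assumed independent)


def exists_hot(tuples, room, threshold):
    """Answer 'does some reading in `room` exceed `threshold`?'.

    Returns (probability, lineage), where lineage lists the ids of the
    uncertain tuples the answer is derived from.
    """
    lineage = [t for t in tuples if t.room == room and t.temp > threshold]
    # Under independence: P(at least one holds) = 1 - product of (1 - p_i).
    none_hold = 1.0
    for t in lineage:
        none_hold *= 1.0 - t.prob
    return 1.0 - none_hold, [t.tid for t in lineage]


def clarify(tuples, tid, is_true):
    """Apply user knowledge: fix one tuple's probability to 1.0 or 0.0."""
    settled = 1.0 if is_true else 0.0
    return [replace(t, prob=settled) if t.tid == tid else t for t in tuples]


# Hypothetical sample data, invented for illustration only.
READINGS = [
    UncertainTuple("t1", "A", 35.0, 0.6),
    UncertainTuple("t2", "A", 32.0, 0.5),
    UncertainTuple("t3", "B", 40.0, 0.9),
]

if __name__ == "__main__":
    p, why = exists_hot(READINGS, "A", 30.0)
    print("before clarification:", round(p, 3), "derived from", why)
    # The user knows that reading t1 was a sensor glitch and rejects it.
    scrubbed = clarify(READINGS, "t1", False)
    p, why = exists_hot(scrubbed, "A", 30.0)
    print("after clarification:", round(p, 3), "derived from", why)
```

Only the tuples named in the lineage need to be shown to the user, which is one simple way to keep the interaction burden low; a real engine would also have to handle correlated tuples and more general queries, which this sketch does not attempt.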
Proceedings. International Database Engineering and Applications Symposium, pp. 79-87
Citations: 3