Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management: latest articles

A case study in optimizing continuous queries using the magic update technique

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618285

Andreas Behrend, Gereon Schüller

The evaluation of continuous queries over data streams often becomes difficult as soon as static context data must be combined with dynamic stream data. This is especially the case if the context data is organized in the form of view hierarchies and thus computed from some base facts. In this scenario, typical algebraic optimization strategies fail to provide a well-optimized query evaluation plan that effectively combines the stream and classical view subparts of the given query. The Magic Update method represents a possible solution to this problem, as it allows new selection conditions to be generated dynamically from the data stream and pushed into the view hierarchy of context data. In this paper we present a case study that shows the performance gain of this technique when optimizing anomaly detection views in an air-traffic surveillance scenario.

Citations: 9

Data perturbation for outlier detection ensembles

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618257

A. Zimek, R. Campello, J. Sander

Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Building an ensemble requires learning diverse models and combining them in an appropriate way. We propose data perturbation as a new technique to induce diversity in individual outlier detectors, as well as a rank accumulation method for combining the individual outlier rankings, in order to construct an outlier detection ensemble. In an extensive evaluation, we study the impact, potential, and shortcomings of this new approach for outlier detection ensembles. We show that this ensemble can significantly improve over weakly performing base methods.
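The idea of perturbing the data to diversify detectors and then accumulating ranks can be illustrated with a small sketch. The abstract does not specify the perturbation, the base detector, or the accumulation rule, so everything here is an assumption chosen for illustration: Gaussian noise as the perturbation, distance to the k-th nearest neighbor as the base outlier score, and a plain sum of per-member ranks as the combination. The function names are hypothetical.

```python
import math
import random


def knn_distance_scores(points, k):
    """Base detector (assumed): score = distance to the k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def ranks_from_scores(scores):
    """Turn scores into ranks; rank 0 is the most outlying (largest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def perturbed_ensemble(points, k=3, members=10, noise=0.1, seed=0):
    """Return point indices ordered from most to least outlying.

    Each ensemble member sees a noisy copy of the data (assumed Gaussian
    perturbation); the per-member ranks are summed (assumed accumulation).
    """
    rng = random.Random(seed)
    total = [0] * len(points)
    for _ in range(members):
        noisy = [tuple(x + rng.gauss(0.0, noise) for x in p) for p in points]
        member_ranks = ranks_from_scores(knn_distance_scores(noisy, k))
        total = [t + r for t, r in zip(total, member_ranks)]
    # A low accumulated rank means the point was ranked as outlying often.
    return sorted(range(len(points)), key=lambda i: total[i])
```

Calling `perturbed_ensemble(points)` on a list of coordinate tuples returns the indices sorted by accumulated rank, so the first few entries are the ensemble's strongest outlier candidates.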

Citations: 40

SLACID - sparse linear algebra in a column-oriented in-memory database system

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618254

D. Kernert, F. Köhler, Wolfgang Lehner

Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of primary storage from disk into memory, it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches to storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is, moreover, amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure, and we evaluate its performance for dynamic sparse matrix workloads using workflows from nuclear science and network graphs.
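The combination of a compressed sparse row (CSR) main structure with a write-optimized delta can be sketched as follows. This is only an illustration of the general main/delta idea, not SLACID's actual layout: the class name, the dictionary used as the delta, and the merge policy are all assumptions, and the columnar storage and dictionary encoding mentioned in the abstract are left out.

```python
class DeltaCSRMatrix:
    """Read-optimized CSR main plus a dict-based delta (illustrative only)."""

    def __init__(self, n_rows, n_cols, triples=()):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.delta = {}  # (row, col) -> value; a value of 0 marks a deletion
        self._build_main(triples)

    def _build_main(self, triples):
        # CSR arrays: values and col_idx hold the nonzeros row by row;
        # row_ptr[r]..row_ptr[r + 1] delimits the entries of row r.
        self.values, self.col_idx = [], []
        self.row_ptr = [0] * (self.n_rows + 1)
        for r, c, v in sorted(triples):
            self.values.append(v)
            self.col_idx.append(c)
            self.row_ptr[r + 1] += 1
        for r in range(self.n_rows):
            self.row_ptr[r + 1] += self.row_ptr[r]

    def _main_get(self, r, c):
        for i in range(self.row_ptr[r], self.row_ptr[r + 1]):
            if self.col_idx[i] == c:
                return self.values[i]
        return 0.0

    def set(self, r, c, v):
        """Online insert, update, or delete (v == 0) goes to the delta only."""
        self.delta[(r, c)] = v

    def get(self, r, c):
        if (r, c) in self.delta:
            return self.delta[(r, c)]
        return self._main_get(r, c)

    def matvec(self, x):
        """Compute y = A x from the main, then correct it with the delta."""
        y = [0.0] * self.n_rows
        for r in range(self.n_rows):
            for i in range(self.row_ptr[r], self.row_ptr[r + 1]):
                y[r] += self.values[i] * x[self.col_idx[i]]
        for (r, c), v in self.delta.items():
            y[r] += (v - self._main_get(r, c)) * x[c]
        return y

    def merge(self):
        """Fold the delta into a freshly built main and empty the delta."""
        entries = {}
        for r in range(self.n_rows):
            for i in range(self.row_ptr[r], self.row_ptr[r + 1]):
                entries[(r, self.col_idx[i])] = self.values[i]
        entries.update(self.delta)
        self.delta = {}
        self._build_main([(r, c, v) for (r, c), v in entries.items() if v != 0])
```

Writes stay cheap because they never touch the CSR arrays; reads pay a small correction cost that grows with the delta, which is why a periodic `merge()` is part of the sketch.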

Citations: 19

A provable algorithmic approach to product selection problems for market entry and sustainability

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618250

Silei Xu, Yishi Lin, Hong Xie, John C.S. Lui

Given the globalized economy, how to process heterogeneous web data so as to extract customers' purchase behavior is crucial to manufacturers who want to enter or sustain themselves in a competitive market. To maximize sales, manufacturers not only need to decide which products to produce to meet diverse customers' requirements but must, at the same time, compete with competitors' products. In this paper, we present a general framework for the following product selection problems: (1) the k-BSP problem, in which a manufacturer enters a competitive market, and (2) the k-BBP problem, in which a manufacturer sustains itself in a competitive market. We propose several product adoption models to describe the complex purchase behavior of customers, and formally show that these problems are NP-hard in general. To tackle these problems, we propose computationally efficient greedy-based approximation algorithms. Based on a submodularity analysis, we prove that our algorithms guarantee a (1 - 1/e)-approximation ratio relative to the optimal solutions. We perform large-scale data analysis to show the efficiency and accuracy of our framework. In our experiments, we observe speedups of 1,300 to 250,000 times compared to exhaustive algorithms, and our solutions achieve on average 96% of the solution quality of the optimal solutions. Finally, we apply our algorithms to a web dataset to show the impact of customers' different purchase behavior on the results of product selection.
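The greedy scheme behind a (1 - 1/e) guarantee can be sketched on a simplified objective. The abstract does not define the k-BSP or k-BBP objectives or the adoption models, so this sketch assumes a plain coverage model: each candidate product "wins" a fixed set of customers, and the goal is to pick k products that win as many distinct customers as possible. Coverage is monotone and submodular, which is the property the classical greedy guarantee relies on. The function name and input format are hypothetical.

```python
def greedy_select(candidates, k):
    """Pick up to k products by largest marginal gain in covered customers.

    candidates: dict mapping a product name to the set of customers it
    would win (an assumed stand-in for the paper's adoption models).
    Returns the chosen products in selection order and the covered set.
    """
    chosen, covered = [], set()
    for _ in range(k):
        remaining = [p for p in candidates if p not in chosen]
        if not remaining:
            break
        # Marginal gain = customers this product adds beyond those covered.
        best = max(remaining, key=lambda p: len(candidates[p] - covered))
        if not candidates[best] - covered:
            break  # no remaining product adds any new customer
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered
```

An exhaustive search would evaluate every k-subset of products, while the greedy loop evaluates each remaining product once per round, which is where the large speedups of greedy methods over exhaustive ones come from in general.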

Citations: 2

Protection of sensitive trajectory datasets through spatial and temporal exchange

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618278

Elham Naghizade, L. Kulik, E. Tanin

Privacy concerns are a great impediment to publishing and/or exchanging trajectory data across companies and institutions. This has urged researchers to address privacy issues prior to trajectory data release. Currently, privacy-preserving solutions distort the original data unnecessarily and hence degrade data utility, making such data less useful for third parties. We consider a trajectory as a sequence of stops and moves, and propose an approach that exploits features of a trajectory as a means of preserving privacy while maintaining a high level of utility. We introduce the concept of sensitivity for stops based on the assumption that they are more vulnerable to privacy threats. We propose an efficient algorithm that either substitutes sensitive stop points of a trajectory with moves from the same trajectory or introduces a minimal detour if a less sensitive stop cannot be found on the same route. Our experiments show that our method balances user privacy and data utility: it protects privacy by preventing an adversary from making inferences about sensitive stops while maintaining a high level of data similarity to the original dataset.

Citations: 16

On efficiently generating realistic social media timeline structures

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618272

Chengcheng Yu, Fan Xia, Weining Qian, Aoying Zhou, Jianlong Chang

This paper proposes a framework for a synthetic data generator that produces social media timeline structures, which is useful for benchmarking query processing over social media data and for validating hypotheses about users' behavior. The generator is flexible enough to produce synthetic data with different distributions. With the help of its asynchronous parallel processing model and delayed update strategy, it outputs timeline structures efficiently and with high throughput. We show in experiments that our method can generate realistic social media timeline structures efficiently.

Citations: 2

A system for efficient and simultaneous processing of moving K nearest neighbor and spatial keyword queries

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618290

Chongsheng Zhang

We study the efficient, generic processing of moving K nearest neighbor (MKNN) and top-K spatial keyword (MKSK) queries. Such generic processing is attractive during high query loads. We propose GridVoronoi, an index that enables users to find the spatial nearest neighbor (NN) in uniformly distributed datasets in almost O(1) time. GridVoronoi is based on the Voronoi diagram, which has proven to be highly efficient in exploring the local neighborhood of a given Voronoi cell. However, a Voronoi diagram needs a method to promptly determine which Voronoi cell contains the query point, so we add a virtual (i.e., conceptual) grid to the Voronoi diagram. For any query point, GridVoronoi first uses the grid to compute which Voronoi cell contains the query and then uses the Voronoi diagram to quickly find the NN and KNN (i.e., K nearest neighbors) of the query. On top of GridVoronoi we introduce the UniSpatial framework, which can process MKNN and MKSK queries simultaneously. For each keyword, UniSpatial builds a GridVoronoi index that enables fast retrieval of the spatial Web objects containing this keyword. UniSpatial employs the same method to process MKNN and MKSK queries, but for MKSK queries it needs to rank the retrieved objects by their proximity to the query location and their textual relevance to the input keywords. In the demo, we will use real datasets to show the functionality and performance of UniSpatial.
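The role of the grid, jumping straight to the neighborhood of the query point instead of scanning all objects, can be illustrated with a simplified sketch. This is not the GridVoronoi index: it builds no Voronoi diagram and simply buckets sites into uniform grid cells, then searches rings of cells outward until no closer site can exist. The class name, the `cell_size` parameter, and the ring search are assumptions made for illustration. For uniformly distributed sites and a suitable cell size, only a few cells are typically inspected per query.

```python
import math
from collections import defaultdict


class GridNN:
    """Uniform grid over 2-D sites; nearest() searches rings of cells outward."""

    def __init__(self, sites, cell_size):
        self.cell = cell_size
        self.buckets = defaultdict(list)
        for site in sites:
            self.buckets[self._key(site)].append(site)

    def _key(self, p):
        # Integer coordinates of the grid cell that contains point p.
        return (math.floor(p[0] / self.cell), math.floor(p[1] / self.cell))

    def nearest(self, q):
        """Return the site closest to q, or None if the index is empty."""
        if not self.buckets:
            return None
        cx, cy = self._key(q)
        best, best_d = None, math.inf
        ring = 0
        # Every site in ring r lies more than (r - 1) * cell away from q,
        # so the search stops once that bound exceeds the best distance.
        while best is None or (ring - 1) * self.cell <= best_d:
            for dx in range(-ring, ring + 1):
                for dy in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy)) != ring:
                        continue  # visit only the border cells of this ring
                    for site in self.buckets.get((cx + dx, cy + dy), ()):
                        d = math.dist(site, q)
                        if d < best_d:
                            best, best_d = site, d
            ring += 1
        return best
```

In the design described in the abstract, the grid lookup is followed by a walk over neighboring Voronoi cells to obtain the K nearest neighbors; the ring search here is a stand-in for that second step.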

Citations: 0

Managing evolving shapes in sensor networks

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618264

Besim Avci, Goce Trajcevski, P. Scheuermann

This work addresses the problem of efficient distributed detection and tracking of mobile and evolving/deformable spatial shapes in Wireless Sensor Networks (WSN). The shapes correspond to contiguous regions bounding the locations of sensors whose readings satisfy a particular threshold-based criterion related to the values of the physical phenomenon that they measure. We formalize the predicates representing the shapes in such settings and present detection algorithms. In addition, we provide a lightweight protocol and aggregation methods for energy-efficient distributed execution of those algorithms. Another contribution of this work is a set of efficient techniques for detecting the co-occurrence of shapes within a given proximity of each other. Our experiments demonstrate that, when compared to centralized techniques (that is, predicates being detected at a dedicated sink) as well as to distributed periodic contour construction, our methodologies yield significant energy/communication savings.
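The notion of a shape as a contiguous region of sensors whose readings satisfy a threshold can be made concrete with a small sketch. It is deliberately centralized, so it resembles the sink-based baseline the abstract compares against rather than the paper's distributed protocol. The function name, the input format, the `>=` threshold test, and the rule that two sensors are adjacent when they lie within `radius` of each other are all assumptions for illustration.

```python
import math


def detect_shapes(sensors, threshold, radius):
    """Group above-threshold sensors into contiguous regions.

    sensors: dict mapping a sensor id to (x, y, reading).
    Two sensors count as adjacent if they are at most `radius` apart
    (an assumed stand-in for communication or sensing range).
    Returns a list of sets of sensor ids, one set per detected shape.
    """
    hot = {sid for sid, (_, _, value) in sensors.items() if value >= threshold}
    shapes, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        # Depth-first search over adjacent above-threshold sensors.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            component.add(current)
            cx, cy, _ = sensors[current]
            for other in hot:
                if other in seen:
                    continue
                ox, oy, _ = sensors[other]
                if math.hypot(cx - ox, cy - oy) <= radius:
                    seen.add(other)
                    stack.append(other)
        shapes.append(component)
    return shapes
```

Running this at a sink requires every reading to be shipped across the network, which is the communication cost that in-network detection and aggregation aim to avoid.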

Citations: 10

Toward efficient and reliable genome analysis using main-memory database systems

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618276

Sebastian Dorok, S. Breß, H. Läpple, G. Saake

Improvements in DNA sequencing technologies make it possible to sequence complete human genomes in a short time and at acceptable cost. Hence, the vision of genome analysis as a standard procedure to support and improve medical treatment comes within reach. In this vision paper, we describe important data-management challenges that have to be met to make this vision come true. Besides genome-analysis performance, data-management capabilities such as data provenance and data integrity become increasingly important to enable comprehensible and reliable genome analysis. We argue that these challenges can be met by using main-memory database technologies, which combine fast processing capabilities with extensive data-management capabilities. Finally, we discuss possibilities for integrating genome-analysis tasks into DBMSs and derive new research questions.

Citations: 6

Exploring subspace clustering for recommendations

Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618283

Katharina Rausch, Eirini Ntoutsi, K. Stefanidis, H. Kriegel

Typically, recommendations are computed by considering users similar to the user in question. However, scanning the whole database of users to locate similar users is expensive. Existing approaches build user profiles by employing full-dimensional clustering to find sets of similar users. As the datasets we deal with are high-dimensional and incomplete, full-dimensional clustering is not the best option. To this end, we explore the fault-tolerant subspace clustering approach, which detects clusters of similar users in subspaces of the original feature space and also allows for missing values. Our experiments on real movie datasets show that the diversification of similar users through subspace clustering results in better recommendations compared to traditional collaborative filtering and full-dimensional clustering approaches.

Citations: 8
