Xuntao Cheng, Bingsheng He, Mian Lu, C. Lau, Huynh Phung Huynh, R. Goh
Recently, Intel Xeon Phi has emerged as a many-core processor with up to 61 x86 cores. In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi, as a case study for parallel database performance on future many-core processors. With the trend towards many-core architectures, query operator optimization and efficient query scheduling on such architectures remain challenging issues. This motivates us to redesign and evaluate query processors. In PhiDB, we apply Xeon Phi-aware optimizations to query operators to exploit the hardware features of Xeon Phi, and design a heuristic algorithm that schedules the concurrent execution of query operators for better performance, demonstrating the performance impact of Xeon Phi-aware optimizations. We have also developed a user interface that lets users explore the underlying performance impacts of hardware-conscious optimizations and scheduling plans.
{"title":"Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor","authors":"Xuntao Cheng, Bingsheng He, Mian Lu, C. Lau, Huynh Phung Huynh, R. Goh","doi":"10.1145/2882903.2899407","DOIUrl":"https://doi.org/10.1145/2882903.2899407","url":null,"abstract":"Recently, Intel Xeon Phi is emerging as a many-core processor with up to 61 x86 cores. In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi as a case study for parallel database performance on future many-core processors. With the trend towards many-core architectures, query operator optimizations, and efficient query scheduling on such many-core architectures remain as challenging issues. This motivates us to redesign and evaluate query processors. In PhiDB, we apply Xeon Phi aware optimizations on query operators to exploit hardware features of Xeon Phi, and design a heuristic algorithm to schedule the concurrent execution of query operators for better performance, to demonstrate the performance impact of Xeon Phi aware optimizations. We have also developed a user interface for users to explore the underlying performance impacts of hardware-conscious optimizations and scheduling plans.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77356105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Searchlight enables interactive search and exploration of large, multi-dimensional data sets. It allows users to explore by specifying rich constraints on the "objects" they are interested in identifying. Constraints can express a variety of properties, including the shape of the object (e.g., a waveform interval of length 10-100ms), its aggregate properties (e.g., the average amplitude of the signal over the interval is greater than 10), and its similarity to another object (e.g., the distance between the interval's waveform and the query waveform is less than 5). Searchlight allows users to specify an arbitrary number of such constraints, mixing different types of constraints in the same query. Searchlight enhances the query execution engine of an array DBMS (currently SciDB) with the ability to perform sophisticated search using the power of Constraint Programming (CP). This allows an existing CP solver from OR-Tools (an open-source suite of operations research tools from Google) to directly access data inside the DBMS without the need to extract and transform it. This demo will illustrate the rich search and exploration capabilities of Searchlight, and its innovative technical features, using the real-world MIMIC II data set, which contains waveform data from multi-parameter recordings of ICU patients, such as ABP (arterial blood pressure) and ECG (electrocardiogram). Users will be able to search for interesting waveform intervals by specifying aggregate properties of the corresponding signals. In addition, they will be able to search for intervals similar to those already found, where similarity is defined as a distance between the signal sequences.
{"title":"Interactive Search and Exploration of Waveform Data with Searchlight","authors":"A. Kalinin, U. Çetintemel, S. Zdonik","doi":"10.1145/2882903.2899404","DOIUrl":"https://doi.org/10.1145/2882903.2899404","url":null,"abstract":"Searchlight enables search and exploration of large, multi-dimensional data sets interactively. It allows users to explore by specifying rich constraints for the \"objects\" they are interested in identifying. Constraints can express a variety of properties, including a shape of the object (e.g., a waveform interval of length 10-100ms), its aggregate properties (e.g., the average amplitude of the signal over the interval is greater than 10), and similarity to another object (e.g., the distance between the interval's waveform and the query waveform is less than 5). Searchlight allows users to specify an arbitrary number of such constraints, with mixing different types of constraints in the same query. Searchlight enhances the query execution engine of an array DBMS (currently SciDB) with the ability to perform sophisticated search using the power of Constraint Programming (CP). This allows an existing CP solver from Or-Tools (an open-source suite of operations research tools from Google) to directly access data inside the DBMS without the need to extract and transform it. This demo will illustrate the rich search and exploration capabilities of Searchlight, and its innovative technical features, by using the real-world MIMIC II data set, which contains waveform data for multi-parameter recordings of ICU patients, such as ABP (Arterial Blood Pressure) and ECG (electrocardiogram). Users will be able to search for interesting waveform intervals by specifying aggregate properties of the corresponding signals. In addition, they will be able to search for intervals similar to already found, where similarity is defined as a distance between the signal sequences.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84292933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
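The three constraint types named in the abstract (interval shape, aggregate properties, similarity to a query waveform) can be illustrated with a solver-free, brute-force sketch. Searchlight itself hands this search to an OR-Tools CP solver running inside SciDB; the function name, signal, and thresholds below are invented for the example.

```python
# Brute-force sketch of Searchlight-style interval constraints (illustrative
# only; the real system uses constraint propagation, not enumeration).

def find_intervals(signal, min_len, max_len, min_avg, query=None, max_dist=None):
    """Return (start, length) pairs satisfying all specified constraints."""
    results = []
    for start in range(len(signal)):
        for length in range(min_len, max_len + 1):
            if start + length > len(signal):
                break
            window = signal[start:start + length]
            # Aggregate constraint: average amplitude over the interval.
            if sum(window) / length < min_avg:
                continue
            # Similarity constraint: L1 distance to a query waveform (optional).
            if query is not None:
                if length != len(query):
                    continue
                dist = sum(abs(a - b) for a, b in zip(window, query))
                if dist > max_dist:
                    continue
            results.append((start, length))
    return results

signal = [1, 12, 14, 11, 2, 1, 13, 15, 1]
# Shape + aggregate: length exactly 3, average amplitude at least 10.
hits = find_intervals(signal, min_len=3, max_len=3, min_avg=10.0)
```

Here `hits` contains only `(1, 3)`, the window `[12, 14, 11]`; every other length-3 window has an average below 10.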
Huaijie Zhu, Xiaochun Yang, Bin Wang, Wang-Chien Lee
In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search. A natural generalization of continuous obstructed nearest-neighbor (CONN) search, an RONN query retrieves the obstructed nearest neighbor for every point in a specified range. To process RONN queries, we first propose a CONN-Based (CONNB) algorithm as our baseline, which reduces an RONN query to a range query and four CONN queries processed using an R-tree. To address the shortcomings of the CONNB algorithm, we then propose a new RONN by R-tree Filtering (RONN-RF) algorithm, which explores effective filtering, also using an R-tree. Next, we propose a new index, called the O-tree, dedicated to indexing objects in the obstructed space. The novelty of the O-tree lies in the idea of dividing the obstructed space into non-obstructed subspaces, aiming to efficiently retrieve highly qualified candidates for RONN processing. We develop an O-tree construction algorithm and propose a space division scheme, called the optimal obstacle balance (OOB) scheme, to address the tree balance problem. Accordingly, we propose an efficient algorithm, called RONN by O-tree Acceleration (RONN-OA), which exploits the O-tree to accelerate RONN query processing. In addition, we extend the O-tree to index polygons. Finally, we conduct a comprehensive performance evaluation using both real and synthetic datasets to validate our ideas and the proposed algorithms. The experimental results show that the RONN-OA algorithm significantly outperforms the two R-tree based algorithms. Moreover, we show that the OOB scheme achieves the best tree balance in the O-tree and outperforms two baseline schemes.
{"title":"Range-based Obstructed Nearest Neighbor Queries","authors":"Huaijie Zhu, Xiaochun Yang, Bin Wang, Wang-Chien Lee","doi":"10.1145/2882903.2915234","DOIUrl":"https://doi.org/10.1145/2882903.2915234","url":null,"abstract":"In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search. A natural generalization of continuous obstructed nearest-neighbor (CONN), an RONN query retrieves the obstructed nearest neighbor for every point in a specified range. To process RONN, we first propose a CONN-Based (CONNB) algorithm as our baseline, which reduces the RONN query into a range query and four CONN queries processed using an R-tree. To address the shortcomings of the CONNB algorithm, we then propose a new RONN by R-tree Filtering (RONN-RF) algorithm, which explores effective filtering, also using R-tree. Next, we propose a new index, called O-tree, dedicated for indexing objects in the obstructed space. The novelty of O-tree lies in the idea of dividing the obstructed space into non-obstructed subspaces, aiming to efficiently retrieve highly qualified candidates for RONN processing. We develop an O-tree construction algorithm and propose a space division scheme, called optimal obstacle balance (OOB) scheme, to address the tree balance problem. Accordingly, we propose an efficient algorithm, called RONN by O-tree Acceleration (RONN-OA), which exploits O-tree to accelerate query processing of RONN. In addition, we extend O-tree for indexing polygons. At last, we conduct a comprehensive performance evaluation using both real and synthetic datasets to validate our ideas and the proposed algorithms. The experimental result shows that the RONN-OA algorithm outperforms the two R-tree based algorithms significantly. Moreover, we show that the OOB scheme achieves the best tree balance in O-tree and outperforms two baseline schemes.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84605111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Kharlamov, S. Brandt, Ernesto Jiménez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. Özçep, C. Pinkel, C. Svingos, D. Zheleznyakov, Ian Horrocks, Y. Ioannidis, R. Möller
Real-time processing of data coming from multiple heterogeneous data streams and static databases is a typical task in many industrial scenarios, such as diagnostics of large machines. A complex diagnostic task may require a collection of up to hundreds of queries over such data. Although many of these queries retrieve data of the same kind, such as temperature measurements, they access structurally different data sources. In this work we show how Semantic Technologies implemented in our system Optique can simplify such complex diagnostics by providing an abstraction layer---an ontology---that integrates heterogeneous data. In a nutshell, Optique allows complex diagnostic tasks to be expressed with just a few high-level semantic queries. The system can then automatically enrich these queries, translate them into a large collection of low-level data queries, and finally optimise and efficiently execute that collection in a heavily distributed environment. We will demo the benefits of Optique on a real-world scenario from Siemens.
{"title":"Ontology-Based Integration of Streaming and Static Relational Data with Optique","authors":"E. Kharlamov, S. Brandt, Ernesto Jiménez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. Özçep, C. Pinkel, C. Svingos, D. Zheleznyakov, Ian Horrocks, Y. Ioannidis, R. Möller","doi":"10.1145/2882903.2899385","DOIUrl":"https://doi.org/10.1145/2882903.2899385","url":null,"abstract":"Real-time processing of data coming from multiple heterogeneous data streams and static databases is a typical task in many industrial scenarios such as diagnostics of large machines. A complex diagnostic task may require a collection of up to hundreds of queries over such data. Although many of these queries retrieve data of the same kind, such as temperature measurements, they access structurally different data sources. In this work we show how Semantic Technologies implemented in our system optique can simplify such complex diagnostics by providing an abstraction layer---ontology---that integrates heterogeneous data. In a nutshell, optique allows complex diagnostic tasks to be expressed with just a few high-level semantic queries. The system can then automatically enrich these queries, translate them into a collection with a large number of low-level data queries, and finally optimise and efficiently execute the collection in a heavily distributed environment. We will demo the benefits of optique on a real world scenario from Siemens.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82531060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
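The enrich-and-translate step can be pictured with a toy, hand-written mapping table in which one ontology term fans out into several source-specific queries. This is only a sketch of the idea: Optique's real pipeline uses OBDA mappings and a full query-rewriting stack, and the table names, SQL strings, and the `MAPPINGS` dict below are all hypothetical.

```python
# Toy sketch of ontology-based query expansion: each high-level semantic term
# maps to one or more low-level queries over structurally different sources.
# The mappings and SQL here are invented for illustration.

MAPPINGS = {
    "Temperature": [
        "SELECT ts, val FROM sensor_a WHERE kind = 'temp'",
        "SELECT time, celsius FROM legacy_readings",
    ],
    "Vibration": [
        "SELECT ts, rms FROM sensor_b",
    ],
}

def expand(semantic_terms):
    """Translate high-level ontology terms into a list of source queries."""
    low_level = []
    for term in semantic_terms:
        low_level.extend(MAPPINGS.get(term, []))
    return low_level

queries = expand(["Temperature", "Vibration"])
```

A diagnostic task phrased as two semantic terms thus becomes three concrete source queries, which a real system would further optimise and distribute.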
Histograms are implemented and used in virtually every database system, and are usually defined on a single column of a database table. However, among the most desired statistics in such systems are statistics on the correlation among columns. In this paper we present a novel construction algorithm that accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a join histogram over the result of the join operation on these attributes. The join histogram is built only from the input histograms, without accessing the base data or computing the join relation. Under certain restrictions, a q-error guarantee can be placed on the produced join histogram. It is possible to construct adversarial input histograms that produce arbitrarily large q-error in the resulting join histogram, but across several experiments, this type of input occurs in neither randomly generated data nor real-world data. Our construction algorithm runs in linear time with respect to the size of the input histograms, and produces a join histogram that is at most as large as the sum of the sizes of the input histograms. These join histograms can be used to efficiently and accurately estimate the cardinality of join queries.
{"title":"Constructing Join Histograms from Histograms with q-error Guarantees","authors":"Kaleb Alway, A. Nica","doi":"10.1145/2882903.2914828","DOIUrl":"https://doi.org/10.1145/2882903.2914828","url":null,"abstract":"Histograms are implemented and used in any database system, usually defined on a single-column of a database table. However, one of the most desired statistical data in such systems are statistics on the correlation among columns. In this paper we present a novel construction algorithm for building a join histogram that accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes. The join histogram is built only from the input histograms without accessing the base data or computing the join relation. Under certain restrictions, a q-error guarantee can be placed on the produced join histogram. It is possible to construct adversarial input histograms that produce arbitrarily large q-error in the resulting join histogram, but across several experiments, this type of input does not occur in either randomly generated data or real-world data. Our construction algorithm runs in linear time with respect to the size of the input histograms, and produces a join histogram that is at most as large as the sum of the sizes of the input histograms. These join histograms can be used to efficiently and accurately estimate the cardinality of join queries.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87674076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
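The identity underlying join histograms is easiest to see in the exact, value-level special case: for an equi-join, the frequency of each join value in the output is the product of its frequencies in the two inputs. The paper's contribution is generalizing this to bucketized histograms while bounding the q-error; the sketch below covers only the exact special case, with made-up frequencies.

```python
# Value-level special case of join-histogram construction: when each input
# "histogram" stores an exact frequency per value, the histogram of the
# equi-join result is the per-value product of input frequencies. No base
# data is accessed and the join itself is never computed.

def join_histogram(h1, h2):
    """h1, h2: dict mapping join value -> frequency. Returns the histogram
    of the equi-join on that attribute."""
    return {v: h1[v] * h2[v] for v in h1.keys() & h2.keys()}

def cardinality(h):
    """Estimated number of join result rows."""
    return sum(h.values())

hR = {1: 3, 2: 5, 4: 2}        # frequencies of R.a
hS = {2: 4, 3: 7, 4: 1}        # frequencies of S.a
hJ = join_histogram(hR, hS)    # {2: 20, 4: 2}
est = cardinality(hJ)          # 22
```

With bucketized inputs, frequencies within a bucket are known only up to a q-error factor, which is what makes bounding the error of the product non-trivial.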
Donatello Santoro, Patricia C. Arocena, Boris Glavic, G. Mecca, Renée J. Miller, Paolo Papotti
Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.
{"title":"BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems","authors":"Donatello Santoro, Patricia C. Arocena, Boris Glavic, G. Mecca, Renée J. Miller, Paolo Papotti","doi":"10.1145/2882903.2899397","DOIUrl":"https://doi.org/10.1145/2882903.2899397","url":null,"abstract":"Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87081767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, products, and foods) in a scalable way. Since these methods do not rely on annotated data, predefined typing schemas, or hand-crafted features, they can be quickly adapted to a new domain, genre, and language. We demonstrate on real datasets spanning various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. bio-medical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba) how these typed entities aid in knowledge discovery and management.
{"title":"Automatic Entity Recognition and Typing in Massive Text Data","authors":"Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han","doi":"10.1145/2882903.2912567","DOIUrl":"https://doi.org/10.1145/2882903.2912567","url":null,"abstract":"In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, product and food) in a scalable way. Since these methods do not rely on annotated data, predefined typing schema or hand-crafted features, they can be quickly adapted to a new domain, genre and language. We demonstrate on real datasets including various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. bio-medical domains) and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba) how these typed entities aid in knowledge discovery and management.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82945693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kukjin Lee, A. König, Vivek R. Narasayya, Bolin Ding, S. Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma V. Nehme, Jiexing Li, J. Naughton
We describe the design and implementation of the new Live Query Statistics (LQS) feature in Microsoft SQL Server 2016. The functionality includes the display of overall query progress as well as progress of individual operators in the query execution plan. We describe the overall functionality of LQS, give usage examples and detail all areas where we had to extend the current state-of-the-art to build the complete LQS feature. Finally, we evaluate the effect these extensions have on progress estimation accuracy with a series of experiments using a large set of synthetic and real workloads.
{"title":"Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics","authors":"Kukjin Lee, A. König, Vivek R. Narasayya, Bolin Ding, S. Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma V. Nehme, Jiexing Li, J. Naughton","doi":"10.1145/2882903.2903728","DOIUrl":"https://doi.org/10.1145/2882903.2903728","url":null,"abstract":"We describe the design and implementation of the new Live Query Statistics (LQS) feature in Microsoft SQL Server 2016. The functionality includes the display of overall query progress as well as progress of individual operators in the query execution plan. We describe the overall functionality of LQS, give usage examples and detail all areas where we had to extend the current state-of-the-art to build the complete LQS feature. Finally, we evaluate the effect these extensions have on progress estimation accuracy with a series of experiments using a large set of synthetic and real workloads.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88866216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
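At its simplest, operator-level progress estimation follows the classic GetNext model from the progress-estimation literature: compare the tuples each operator has produced so far against the optimizer's cardinality estimates. The sketch below shows only that baseline idea, not SQL Server's actual LQS implementation, which the paper extends considerably; the plan and row counts are invented.

```python
# Baseline GetNext-model progress estimator: total work is the sum of
# estimated output rows over all plan operators, and progress is the
# fraction of that work already observed at runtime.

def progress(operators):
    """operators: list of (rows_produced_so_far, estimated_total_rows)
    pairs, one per operator in the execution plan."""
    done = sum(rows for rows, _ in operators)
    total = sum(est for _, est in operators)
    return done / total if total else 1.0

# Hypothetical two-operator plan: a scan halfway done, a join just starting.
plan = [(500, 1000), (100, 1000)]
pct = progress(plan)   # 0.3
```

A key difficulty the paper addresses is that the optimizer's estimates can be badly wrong, so a practical estimator must also refine the denominators as execution proceeds.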
K. V. Emani, Karthik Ramachandra, S. Bhattacharya, S. Sudarshan
Optimizing the performance of database applications is an area of practical importance, and has received significant attention in recent years. In this paper we present an approach to this problem based on extracting a concise algebraic representation of (parts of) an application, which may include imperative code as well as SQL queries. The algebraic representation can then be translated into SQL to improve application performance, by reducing the volume of data transferred and by minimizing the number of network round trips to reduce latency. Our techniques can perform optimizations of database applications that previously proposed techniques cannot. The algebraic representations can also be used for other purposes, such as extracting equivalent queries for keyword search on form results. Our experiments indicate that the techniques we present are widely applicable to real-world database applications, both in successfully extracting algebraic representations of application behavior and in providing performance benefits when used for optimization.
{"title":"Extracting Equivalent SQL from Imperative Code in Database Applications","authors":"K. V. Emani, Karthik Ramachandra, S. Bhattacharya, S. Sudarshan","doi":"10.1145/2882903.2882926","DOIUrl":"https://doi.org/10.1145/2882903.2882926","url":null,"abstract":"Optimizing the performance of database applications is an area of practical importance, and has received significant attention in recent years. In this paper we present an approach to this problem which is based on extracting a concise algebraic representation of (parts of) an application, which may include imperative code as well as SQL queries. The algebraic representation can then be translated into SQL to improve application performance, by reducing the volume of data transferred, as well as reducing latency by minimizing the number of network round trips. Our techniques can be used for performing optimizations of database applications that techniques proposed earlier cannot perform. The algebraic representations can also be used for other purposes such as extracting equivalent queries for keyword search on form results. Our experiments indicate that the techniques we present are widely applicable to real world database applications, in terms of successfully extracting algebraic representations of application behavior, as well as in terms of providing performance benefits when used for optimization.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89673657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
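The kind of rewrite targeted here can be demonstrated end-to-end with SQLite: an imperative loop issuing one query per row (the classic N+1 round-trip pattern) versus a single equivalent join. This only illustrates the effect of such a rewrite, not the paper's extraction algorithm; the schema and data are made up.

```python
import sqlite3

# Imperative-vs-extracted-SQL illustration: both computations below return
# the same per-department salary totals, but the second needs one round trip.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp(id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'Eng'), (2, 'Sales');
    INSERT INTO emp VALUES (1, 1, 100), (2, 1, 120), (3, 2, 90);
""")

# Imperative version: one query per department (N+1 round trips).
totals_loop = {}
for dept_id, name in conn.execute("SELECT id, name FROM dept"):
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(salary), 0) FROM emp WHERE dept_id = ?",
        (dept_id,)).fetchone()
    totals_loop[name] = total

# Extracted-SQL version: a single equivalent query.
totals_sql = dict(conn.execute("""
    SELECT d.name, COALESCE(SUM(e.salary), 0)
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.id
    GROUP BY d.name
"""))
```

Both dictionaries come out identical; on a remote DBMS, the single-query form additionally moves the aggregation to the server instead of shipping rows to the client.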
Xiang Ying, Chaokun Wang, M. Wang, J. Yu, Jun Zhang
Community detection has attracted great interest in graph analysis and mining during the past decade, and a great number of approaches have been developed to address this problem. However, the lack of a uniform framework and a reasonable evaluation method makes it a puzzle to analyze, compare, and evaluate this extensive body of work, let alone pick out the best approach when necessary. In this paper, we design a tool called CoDAR, which reveals the generalized procedure of community detection and monitors the real-time structural changes of the network during the detection process. Moreover, CoDAR adopts 12 recognized metrics and builds a rating model for performance evaluation of communities to recommend the best-performing algorithm. Finally, the tool also provides nice interactive windows for display.
{"title":"CoDAR: Revealing the Generalized Procedure & Recommending Algorithms of Community Detection","authors":"Xiang Ying, Chaokun Wang, M. Wang, J. Yu, Jun Zhang","doi":"10.1145/2882903.2899386","DOIUrl":"https://doi.org/10.1145/2882903.2899386","url":null,"abstract":"Community detection has attracted great interest in graph analysis and mining during the past decade, and a great number of approaches have been developed to address this problem. However, the lack of a uniform framework and a reasonable evaluation method makes it a puzzle to analyze, compare and evaluate the extensive work, let alone picking out a best one when necessary. In this paper, we design a tool called CoDAR, which reveals the generalized procedure of community detection and monitors the real-time structural changes of network during the detection process. Moreover, CoDAR adopts 12 recognized metrics and builds a rating model for performance evaluation of communities to recommend the best-performing algorithm. Finally, the tool also provides nice interactive windows for display.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78589538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
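A one-metric version of CoDAR's rating idea can be sketched by scoring candidate partitions with Newman modularity and recommending the highest scorer. CoDAR combines 12 metrics in its actual rating model, so this is a deliberately minimal stand-in on a toy graph of two triangles joined by a bridge edge.

```python
# Score candidate community partitions with Newman modularity
# Q = sum_c (l_c/m - (d_c/(2m))^2), where l_c is the number of edges inside
# community c, d_c the total degree of c, and m the number of edges; then
# recommend the highest-scoring partition.

def modularity(edges, communities):
    """edges: list of undirected (u, v) pairs; communities: list of node sets."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for c in communities:
        internal = sum(1 for u, v in edges if u in c and v in c)
        degree_sum = sum(deg.get(n, 0) for n in c)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Two triangles (0,1,2) and (3,4,5) connected by the bridge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [{0, 1, 2}, {3, 4, 5}]   # respects the triangle structure
bad = [{0, 1, 3}, {2, 4, 5}]    # cuts across both triangles
best = max([good, bad], key=lambda p: modularity(edges, p))
```

The partition following the two triangles wins, matching the intuition that a rating model should prefer partitions with dense internal and sparse external connectivity.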