After a brief survey of well-established methods for image classification, we focus on a recently proposed Multiple Instance Learning (MIL) method suitable for applications in image processing. In particular, the method is based on a mixed-integer nonlinear formulation of the optimization problem to be solved for MIL purposes. The algorithm is applied to a set of color (Red, Green, Blue, RGB) images with the objective of classifying the images containing some specific pattern. The results of our experimentation are reported.
{"title":"A Multiple Instance Learning Algorithm for Color Images Classification","authors":"A. Astorino, A. Fuduli, M. Gaudioso, E. Vocaturo","doi":"10.1145/3216122.3216144","DOIUrl":"https://doi.org/10.1145/3216122.3216144","url":null,"abstract":"After a brief survey on well established methods for image classification, we focus on a recently proposed Multiple Instance Learning (MIL) method which is suitable for applications in image processing. In particular the method is based on a mixed integer nonlinear formulation of the optimization problem to be solved for MIL purposes. The algorithm is applied to a set of color images (Red, Green, Blue, RGB) with the objective of classifying the images containing some specific pattern. The results of our experimentation are reported.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131004196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
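The mixed-integer formulation itself is not given in the abstract; as a hedged illustration only, the standard MIL assumption it builds on (a bag, here an image, is positive iff at least one of its instances, e.g. image segments, is positive) can be sketched as:

```python
def classify_bag(instances, weights, bias):
    """Standard MIL assumption: a bag (e.g. an image, given as a list of
    per-segment feature vectors) is labelled positive when at least one
    instance scores positive under a linear instance-level classifier.
    The weights/bias here are hypothetical, not the paper's model."""
    scores = [sum(w * x for w, x in zip(weights, inst)) + bias
              for inst in instances]
    return 1 if max(scores) > 0 else -1
```

A bag with one positive-scoring segment is classified positive even if every other segment scores negative, which is what makes MIL fit "does this image contain the pattern somewhere" questions.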
The recent surge of interest in ancient scripts has resulted in huge image libraries of ancient texts. Data mining of the collected images enables the study of the evolution of these ancient scripts. In particular, the origin of the Indus Valley script is highly debated. We use convolutional neural networks to test which Phoenician alphabet letters and which Brahmi symbols are closest to the Indus Valley script symbols. Surprisingly, our analysis shows that, overall, the Phoenician alphabet is much closer than the Brahmi script to the Indus Valley script symbols.
{"title":"Data Mining Ancient Script Image Data Using Convolutional Neural Networks","authors":"Shruti Daggumati, P. Revesz","doi":"10.1145/3216122.3216163","DOIUrl":"https://doi.org/10.1145/3216122.3216163","url":null,"abstract":"The recent surge in ancient scripts has resulted in huge image libraries of ancient texts. Data mining of the collected images enables the study of the evolution of these ancient scripts. In particular, the origin of the Indus Valley script is highly debated. We use convolutional neural networks to test which Phoenician alphabet letters and Brahmi symbols are closest to the Indus Valley script symbols. Surprisingly, our analysis shows that overall the Phoenician alphabet is much closer than the Brahmi script to the Indus Valley script symbols.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116734765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
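The abstract does not detail the network architecture; as a minimal, hedged illustration of the core operation a CNN layer applies to glyph bitmaps, a valid-mode 2D convolution (cross-correlation) in pure Python might look like:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation -- the feature-extraction
    primitive of a convolutional layer, applied to a glyph bitmap.
    Pure-Python sketch; a real model would use a deep-learning
    framework and learn the kernels from the script images."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

Stacking such filtered maps with nonlinearities is what lets the classifier compare stroke-level features across the Phoenician, Brahmi, and Indus Valley symbol sets.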
A count constraint is a data dependency requiring the results of given count operations on a relation to be within a certain range. By means of count constraints, a new decision problem, called Inverse OLAP, has recently been introduced: given a flat fact table schema and a set of count constraints, does there exist an instance satisfying the constraints? This paper focuses on a special case of Inverse OLAP, called Inverse Tree-OLAP, for which the flat fact table key is modeled by a Dimensional Fact Model (DFM) with a tree structure. The count constraints define aggregation patterns to be respected both by the many-to-many relationship among the basic dimensions and by the one-to-many relationships within dimension hierarchies. A count constraint is required to have a particular structure so that the problem of handling fact-table projections with duplicates is avoided. The simplified structure enables an effective solution method consisting of three main steps: (1) use some of the count constraints to extract a subproblem formulated as a known data mining problem (inverse frequent itemset mining); (2) solve the subproblem using a recent method shown to be effective in practice, even on large instances; and (3) enforce the remaining count constraints on the solution returned by step 2 using a system of linear equations. The overall approach can be effectively used to generate OLAP cubes for benchmarking that reflect the patterns of real datasets.
{"title":"The Inverse Tree-OLAP Problem: Definitions, Models, Complexity Analysis, and a Possible Solution","authors":"D. Saccá, Edoardo Serra, A. Cuzzocrea","doi":"10.1145/3216122.3216129","DOIUrl":"https://doi.org/10.1145/3216122.3216129","url":null,"abstract":"Count constraint is a data dependency that requires the results of given count operations on a relation to be within a certain range. By means of count constraints a new decisional problem, called the Inverse OLAP, has been recently introduced: given a flat fact table, does there exist an instance satisfying a set of given count constraints? This paper focuses on a special case of Inverse OLAP, called Inverse Tree-OLAP, for which the flat fact table key is modeled by a Dimensional Fact Model (DFM) with a tree structure. The count constraints define aggregation patterns to be respected by both the many-to-many relationship among the basic dimensions and the one-to-many relationships within dimension hierarchies. A count constraint is required to have a particular structure so that the problem of handling fact table projections with duplicates is avoided. The simplified structure enables the invention of an effective method for its solution that consists of three main steps: (1) using some of the count constraints to extract a subproblem that is formulated as a known data mining problem (inverse frequent itemset mining), (2) solving the subproblem using a recent method that has been shown to be effective in practical situations also for large size instances and (3) enforcing the remaining count constraints on the solution returned by step 2 using a system of linear equations. The overall proposed approach can be effectively used to generate OLAP cubes for benchmarking that reflect patterns of real datasets.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123830775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
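As a hedged sketch of what a simple, per-key count constraint means on a flat fact table (the paper's constraints over DFM tree hierarchies are considerably richer), a range check on group counts could be written as:

```python
from collections import Counter

def satisfies_count_constraint(fact_table, key, lo, hi):
    """Check that, for every distinct value of `key`, the number of
    fact-table rows carrying that value lies in [lo, hi]. This is a
    deliberately simplified stand-in for the count constraints
    studied in the paper, shown only to fix intuition."""
    counts = Counter(row[key] for row in fact_table)
    return all(lo <= c <= hi for c in counts.values())
```

Inverse Tree-OLAP asks the reverse question: given such constraints, construct a fact-table instance on which every check passes.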
CalcuList (Calculator with List manipulation) is an educational language for teaching functional programming, extended with some imperative and side-effect features that are enabled on explicit request by the programmer. In addition to strings and lists, the language natively supports JSON objects and may be effectively used to implement generic recursive MapReduce procedures that manipulate JSON lists. MapReduce is a popular model in distributed computing that underpins many NoSQL systems, and a JSON list can be thought of as a dataset in a document NoSQL datastore. It turns out that CalcuList can be used as a tool for teaching advanced query algorithms for document datastores such as MongoDB and CouchDB.
{"title":"Using CalcuList To MapReduce Jsons","authors":"D. Saccá, A. Furfaro","doi":"10.1145/3216122.3216164","DOIUrl":"https://doi.org/10.1145/3216122.3216164","url":null,"abstract":"CalcuList (Calculator with List manipulation), is an educational language for teaching functional programming extended with some imperative and side-effect features, which are enabled under explicit request by the programmer. In addition to strings and lists, the language natively supports json objects and may be effectively used to implement generic MapReduce recursive procedures to manipulate json lists. MapReduce is a popular model in distributed computing that underpins many NoSQL systems and a json list can be thought of as a dataset of a document NoSQL datastore. It turns out that CalcuList can be used as a tool for teaching advanced query algorithms for document datastores such as MongoDB and CouchDB.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131509690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
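CalcuList's own syntax is not shown in the abstract; a hedged Python analogue of a generic MapReduce over a JSON list (here totalling order amounts per customer — the example data and field names are hypothetical) is:

```python
from itertools import groupby

def map_reduce(json_list, mapper, reducer):
    """Generic MapReduce over a list of JSON-like dicts: map each
    record to (key, value) pairs, group the pairs by key (groupby
    requires sorted input), then reduce each group."""
    pairs = sorted((kv for rec in json_list for kv in mapper(rec)),
                   key=lambda kv: kv[0])
    return {k: reducer([v for _, v in grp])
            for k, grp in groupby(pairs, key=lambda kv: kv[0])}

# Hypothetical document dataset: one JSON object per order.
orders = [{"cust": "ann", "amount": 10}, {"cust": "bob", "amount": 5},
          {"cust": "ann", "amount": 7}]
totals = map_reduce(orders, lambda r: [(r["cust"], r["amount"])], sum)
```

The same map/group/reduce shape underlies aggregation queries in document stores such as MongoDB, which is why it works as a teaching vehicle.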
Real-world networks typically evolve through time, which means various events occur, such as edge additions or attribute changes. To understand these events, one must be able to discriminate between them. Existing approaches typically discriminate between whole graphs, which are, moreover, mostly static. We propose WalDis, a new algorithm for mining discriminative patterns of events in dynamic graphs. The algorithm uses random-walk sampling and greedy heuristics to keep performance high. Furthermore, it does not require time to be discretized, as other algorithms commonly do. We have evaluated the algorithm on three real-world graph datasets.
{"title":"WalDis: Mining Discriminative Patterns within Dynamic Graphs","authors":"Karel Vaculík, L. Popelínský","doi":"10.1145/3216122.3216172","DOIUrl":"https://doi.org/10.1145/3216122.3216172","url":null,"abstract":"Real-world networks typically evolve through time, which means there are various events occurring, such as edge additions or attribute changes. In order to understand the events, one must be able to discriminate between different events. Existing approaches typically discriminate whole graphs, which are, in addition, mostly static. We propose a new algorithm WalDis for mining discriminate patterns of events in dynamic graphs. This algorithm uses sampling by random walks and greedy approaches in order to keep the performance high. Furthermore, it does not require the time to be discretized as other algorithms commonly do. We have evaluated the algorithm on three real-world graph datasets.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121735364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
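The random-walk sampling that keeps such miners tractable can be sketched as follows; this is a generic, hedged illustration that ignores the edge timestamps WalDis itself exploits:

```python
import random

def random_walk(adj, start, length, rng=None):
    """Sample a walk by repeatedly stepping to a uniformly random
    neighbour. Pattern miners use such walks to sample a node's
    neighbourhood instead of enumerating the whole graph; real
    dynamic-graph variants would also weight steps by event time."""
    rng = rng or random.Random(0)
    walk = [start]
    for _ in range(length):
        neighbours = adj.get(walk[-1])
        if not neighbours:
            break  # dead end: no outgoing edges
        walk.append(rng.choice(neighbours))
    return walk
```

Because each walk touches only `length` nodes, many walks can be drawn from large graphs at constant per-walk cost.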
The Two Phase Locking with High Priority (2PL-HP) concurrency control protocol addresses the transaction scheduling issue in a distributed real-time database system (DRTDBS). Although 2PL-HP is free from priority inversion, it may suffer from problems such as deadlock, cyclic restart, and starvation of lengthy transactions. In this paper, the Controlled Avoidance of deadlock and starvation causing Resourceful Conflict resolution between Transactions (CART) concurrency control protocol is proposed to minimize the transaction miss percentage by reducing the waste of system resources: deadlock is avoided through controlled locking, and starvation is mitigated to some extent by ensuring fairness in the allocation of resources. In simulations of a DRTDBS, CART outperforms previous protocols.
{"title":"CART: A Real-Time Concurrency Control Protocol","authors":"Sarvesh Pandey, Udai Shanker","doi":"10.1145/3216122.3216161","DOIUrl":"https://doi.org/10.1145/3216122.3216161","url":null,"abstract":"The Two Phase Locking with High Priority (2PL-HP) concurrency control protocol addresses the transaction scheduling issue in a distributed real-time database system (DRTDBS). Although the 2PL-HP protocol is free from priority inversion, it may suffer from the problems such as deadlock, cyclic restart, and starvation of lengthy transactions. In this paper, a Controlled Avoidance of deadlock and starvation causing Resourceful Conflict resolution between Transactions (CART) concurrency control protocol has been proposed to minimize the transactions miss percentage by reducing the wastage of system resources through avoiding the deadlock due to controlled locking and starvation to some extent by ensuring a fairness in the allocation of resources for their completion. DRTDBS is simulated and CART outperforms as compared with previous other protocols.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127581560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
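The baseline 2PL-HP conflict rule that CART refines can be sketched as follows; this is a simplified, hypothetical illustration, and CART's controlled locking adds conditions not shown here:

```python
def resolve_conflict(holder_priority, requester_priority):
    """2PL-HP rule: on a lock conflict the higher-priority
    transaction wins -- a higher-priority requester aborts
    (restarts) the current lock holder, otherwise the requester
    blocks. The unconditional abort is what can trigger the
    cyclic restarts and starvation that CART aims to curb."""
    if requester_priority > holder_priority:
        return "abort_holder"
    return "requester_waits"
```

Since the loser is always the lower-priority transaction, priority inversion cannot occur, but a long transaction that keeps losing conflicts may starve.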
The combination of data and technology is having a high impact on the way we live. The world is getting smarter thanks to the quantity of data collected and analyzed. However, the amount of data is continuously increasing, and novel requirements related to variety, volume, velocity, and veracity must be addressed. In this paper we focus on veracity, which relates to the presence of uncertain or imprecise data: errors and missing or invalid values can compromise the usefulness of the collected data. In such a scenario, new methods and techniques able to evaluate the quality of the available data are needed. The literature provides many data-quality assessment and improvement techniques, especially for structured data, but in the Big Data era new algorithms have to be designed. We provide an overview of the issues and challenges related to data-quality assessment in the Big Data scenario. We also propose a possible solution developed around a smart-city case study, and we describe the lessons learned in the design and implementation phases.
{"title":"Quality awareness for a Successful Big Data Exploitation","authors":"C. Cappiello, Walter Samá, Monica Vitali","doi":"10.1145/3216122.3216124","DOIUrl":"https://doi.org/10.1145/3216122.3216124","url":null,"abstract":"The combination of data and technology is having a high impact on the way we live. The world is getting smarter thanks to the quantity of collected and analyzed data. However, it is necessary to consider that such amount of data is continuously increasing and it is necessary to deal with novel requirements related to variety, volume, velocity, and veracity issues. In this paper we focus on veracity that is related to the presence of uncertain or imprecise data: errors, missing or invalid data can compromise the usefulness of the collected values. In such a scenario, new methods and techniques able to evaluate the quality of the available data are needed. In fact, the literature provides many data quality assessment and improvement techniques, especially for structured data, but in the Big Data era new algorithms have to be designed. We aim to provide an overview of the issues and challenges related to Data Quality assessment in the Big Data scenario. We also propose a possible solution developed by considering a smart city case study and we describe the lessons learned in the design and implementation phases.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133592502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
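As one concrete, deliberately simple example of a veracity-oriented quality metric of the kind discussed (the metric and field names are illustrative, not the paper's), completeness over required fields can be computed as:

```python
def completeness(records, required_fields):
    """Fraction of required-field slots that hold a usable value;
    missing (None) or empty values lower the score. One of many
    possible data-quality metrics, shown only as an illustration."""
    total = len(records) * len(required_fields)
    present = sum(1 for r in records for f in required_fields
                  if r.get(f) not in (None, ""))
    return present / total if total else 1.0
```

In a smart-city stream, a completeness score falling below a threshold could flag a sensor feed whose readings should not be trusted downstream.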
The reverse k-nearest neighbours search is a fundamental primitive in multi-dimensional (i.e., multi-attribute) databases, with applications in location-based services, online recommendations, statistical classification, pattern recognition, graph algorithms, computer games development, and so on. Despite the relevance and popularity of the query, no solution has yet been put forward that supports it in encrypted databases while protecting the privacy of both the data and the queries. With the outsourcing of massive datasets to the cloud, it has become urgent to find ways of ensuring fast and secure processing of this query in untrusted cloud environments. This paper presents searchable encryption schemes that efficiently and securely enable the processing of the reverse k-nearest neighbours query over encrypted multi-dimensional data, including index-based search schemes that deliver fast query responses while preserving data confidentiality and query privacy. The proposed schemes resist practical attacks based on powerful background knowledge, and their efficiency is confirmed by a theoretical analysis and extensive simulation experiments.
{"title":"Secure Reverse k-Nearest Neighbours Search over Encrypted Multi-dimensional Databases","authors":"T. Tzouramanis, Y. Manolopoulos","doi":"10.1145/3216122.3216170","DOIUrl":"https://doi.org/10.1145/3216122.3216170","url":null,"abstract":"The reverse k-nearest neighbours search is a fundamental primitive in multi-dimensional (i.e. multi-attribute) databases with applications in location-based services, online recommendations, statistical classification, pattern recognition, graph algorithms, computer games development, and so on. Despite the relevance and popularity of the query, no solution has yet been put forward that supports it in encrypted databases while protecting at the same time the privacy of both the data and the queries. With the outsourcing of massive datasets in the cloud, it has become urgent to find ways of ensuring the fast and secure processing of this query in untrustworthy cloud computing. This paper presents searchable encryption schemes which can efficiently and securely enable the processing of the reverse k-nearest neighbours query over encrypted multi-dimensional data, including index-based search schemes which can carry out fast query response that preserves data confidentiality and query privacy. The proposed schemes resist practical attacks operating on the basis of powerful background knowledge and their efficiency is confirmed by a theoretical analysis and extensive simulation experiments.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133918862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
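For reference, the plaintext query the schemes secure is the reverse k-NN; a brute-force sketch (the paper's encrypted, index-based versions are of course far more involved) is:

```python
def reverse_knn(points, query, k):
    """Return the points that have `query` among their k nearest
    neighbours, using squared Euclidean distance and brute force.
    O(n^2); real systems use spatial indexes instead."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    result = []
    for i, p in enumerate(points):
        d_to_query = sq_dist(p, query)
        # p is in the answer iff fewer than k data points
        # are strictly closer to p than the query is.
        closer = sum(1 for j, q in enumerate(points)
                     if j != i and sq_dist(p, q) < d_to_query)
        if closer < k:
            result.append(p)
    return result
```

Note the asymmetry with the ordinary k-NN query: the answer set here can be empty or arbitrarily large, which is part of what makes secure index design for it difficult.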
Distributed systems provide users with powerful capabilities to store and process their data on third-party machines. However, the privacy of the outsourced data is not guaranteed. One solution for protecting user data against privacy attacks is to encrypt the sensitive data before sending it to the nodes of the distributed system. The main problem is then to evaluate user queries over the encrypted data. In this paper, we propose a complete solution for processing top-k queries over encrypted databases stored across the nodes of a distributed system. The problem of distributed top-k query processing has been well addressed over plaintext (non-encrypted) data; however, the proposed approaches cannot be used in the case of encrypted data.
{"title":"Top-k Query Processing over Distributed Sensitive Data","authors":"S. Mahboubi, Reza Akbarinia, P. Valduriez","doi":"10.1145/3216122.3216153","DOIUrl":"https://doi.org/10.1145/3216122.3216153","url":null,"abstract":"Distributed systems provide users with powerful capabilities to store and process their data in third-party machines. However, the privacy of the outsourced data is not guaranteed. One solution for protecting the user data against privacy attacks is to encrypt the sensitive data before sending to the nodes of the distributed system. Then, the main problem is to evaluate user queries over the encrypted data. In this paper, we propose a complete solution for processing top-k queries over encrypted databases stored across the nodes of a distributed system. The problem of distributed top-k query processing has been well addressed over plaintext (non encrypted) data. However, the proposed approaches cannot be used in the case of encrypted data.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131983197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
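The plaintext baseline being secured — collect candidate (score, item) pairs from each node and take the global top-k by score — can be sketched as follows (the scores and items are hypothetical):

```python
import heapq

def distributed_top_k(node_lists, k):
    """Merge (score, item) pairs gathered from several nodes and
    return the global top-k by score, highest first. Over encrypted
    data the score comparisons themselves must be performed without
    revealing the values, which is the problem the paper addresses."""
    candidates = (pair for node in node_lists for pair in node)
    return heapq.nlargest(k, candidates)
```

In a real protocol each node would prune locally before shipping candidates, so that far fewer than all pairs cross the network.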
In many application contexts, the executions of a business process are subject to performance constraints expressed in an aggregated form, usually over predefined time windows, and detecting a likely violation of such a constraint in advance could help undertake corrective measures to prevent it. This paper illustrates a prediction-aware event processing framework that addresses the problem of estimating whether the process instances of a given (unfinished) window w will violate an aggregate performance constraint, based on the continuous learning and application of an ensemble of models, each capable of making and integrating two kinds of predictions: single-instance predictions concerning the ongoing process instances of w, and time-series predictions concerning the "future" process instances of w (i.e. those that have not started yet but will start by the end of w). Notably, the framework can continuously update the ensemble, fully exploiting the raw event data produced by the process under monitoring, suitably lifted to an adequate level of abstraction. The framework has been validated against historical event data from real-life business processes, showing promising results in terms of both accuracy and efficiency.
{"title":"A Predictive Learning Framework for Monitoring Aggregated Performance Indicators over Business Process Events","authors":"A. Cuzzocrea, Francesco Folino, M. Guarascio, L. Pontieri","doi":"10.1145/3216122.3216143","DOIUrl":"https://doi.org/10.1145/3216122.3216143","url":null,"abstract":"In many application contexts, a business process' executions are subject to performance constraints expressed in an aggregated form, usually over predefined time windows, and detecting a likely violation to such a constraint in advance could help undertake corrective measures for preventing it. This paper illustrates a prediction-aware event processing framework that addresses the problem of estimating whether the process instances of a given (unfinished) window w will violate an aggregate performance constraint, based on the continuous learning and application of an ensemble of models, capable each of making and integrating two kinds of predictions: single-instance predictions concerning the ongoing process instances of w, and time-series predictions concerning the \"future\" process instances of w (i.e. those that have not started yet, but will start by the end of w). Notably, the framework can continuously update the ensemble, fully exploiting the raw event data produced by the process under monitoring, suitably lifted to an adequate level of abstraction. The framework has been validated against historical event data coming from real-life business processes, showing promising results in terms of both accuracy and efficiency.","PeriodicalId":422509,"journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132833601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
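The final combination step — aggregating per-instance predictions for ongoing cases with a time-series forecast for cases yet to start, then testing the constraint — might be sketched as follows (the function and its inputs are hypothetical simplifications of the framework's ensemble):

```python
def window_violates(ongoing_predictions, forecast_total, threshold):
    """Flag an unfinished window if the predicted aggregate --
    predicted contributions of the ongoing process instances plus a
    time-series forecast of the contribution of instances that will
    start before the window closes -- exceeds the allowed threshold."""
    return sum(ongoing_predictions) + forecast_total > threshold
```

Raising the flag before the window closes is what leaves time for the corrective measures the abstract mentions.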