After a brief survey of well-established methods for image classification, we focus on a recently proposed Multiple Instance Learning (MIL) method that is suitable for applications in image processing. In particular, the method is based on a mixed-integer nonlinear formulation of the optimization problem to be solved for MIL purposes. The algorithm is applied to a set of color (Red, Green, Blue; RGB) images with the objective of classifying the images containing some specific pattern. The results of our experiments are reported.
{"title":"A Multiple Instance Learning Algorithm for Color Images Classification","authors":"A. Astorino, A. Fuduli, M. Gaudioso, E. Vocaturo","doi":"10.1145/3216122.3216144","DOIUrl":"https://doi.org/10.1145/3216122.3216144","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"24 1","pages":"0"},"publicationDate":"2018-06-18"}
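As an illustration of the MIL setting only (not the mixed-integer nonlinear formulation of the paper), the sketch below applies the standard MIL assumption: a bag, here an image, is labelled positive when at least one of its instances, here regions described by mean RGB values, falls on the positive side of a separating hyperplane. The weights `w`, the offset `b`, and the choice of features are hypothetical.

```python
from typing import Sequence

Instance = Sequence[float]  # assumed feature vector: mean (R, G, B) of one image region


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Signed score of one instance with respect to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_bag(w: Sequence[float], b: float, bag: Sequence[Instance]) -> int:
    """Return +1 if at least one instance scores above zero, otherwise -1."""
    return 1 if any(instance_score(w, b, x) > 0 for x in bag) else -1


# Hypothetical hyperplane under which strongly red regions score above zero.
w, b = (1.0, -0.5, -0.5), -0.2
reddish_image = [(0.2, 0.2, 0.2), (0.9, 0.1, 0.1)]  # one red region is enough
grey_image = [(0.3, 0.3, 0.3), (0.5, 0.5, 0.5)]     # no region scores above zero

reddish_label = classify_bag(w, b, reddish_image)
grey_label = classify_bag(w, b, grey_image)
```

Learning `w` and `b` from labelled bags is the optimization problem the abstract refers to; this sketch covers only the prediction step.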
The recent surge of interest in ancient scripts has resulted in huge image libraries of ancient texts. Data mining of the collected images enables the study of the evolution of these ancient scripts. In particular, the origin of the Indus Valley script is highly debated. We use convolutional neural networks to test which Phoenician alphabet letters and Brahmi symbols are closest to the Indus Valley script symbols. Surprisingly, our analysis shows that overall the Phoenician alphabet is much closer than the Brahmi script to the Indus Valley script symbols.
{"title":"Data Mining Ancient Script Image Data Using Convolutional Neural Networks","authors":"Shruti Daggumati, P. Revesz","doi":"10.1145/3216122.3216163","DOIUrl":"https://doi.org/10.1145/3216122.3216163","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"25 1","pages":"0"},"publicationDate":"2018-06-18"}
A count constraint is a data dependency that requires the results of given count operations on a relation to be within a certain range. By means of count constraints, a new decision problem, called Inverse OLAP, has recently been introduced: given a flat fact table, does there exist an instance satisfying a set of given count constraints? This paper focuses on a special case of Inverse OLAP, called Inverse Tree-OLAP, for which the flat fact table key is modeled by a Dimensional Fact Model (DFM) with a tree structure. The count constraints define aggregation patterns to be respected by both the many-to-many relationship among the basic dimensions and the one-to-many relationships within dimension hierarchies. A count constraint is required to have a particular structure so that the problem of handling fact table projections with duplicates is avoided. The simplified structure enables an effective solution method that consists of three main steps: (1) using some of the count constraints to extract a subproblem that is formulated as a known data mining problem (inverse frequent itemset mining), (2) solving the subproblem using a recent method that has been shown to be effective in practice even for large instances, and (3) enforcing the remaining count constraints on the solution returned by step 2 using a system of linear equations. The overall proposed approach can be effectively used to generate OLAP cubes for benchmarking that reflect patterns of real datasets.
{"title":"The Inverse Tree-OLAP Problem: Definitions, Models, Complexity Analysis, and a Possible Solution","authors":"D. Saccá, Edoardo Serra, A. Cuzzocrea","doi":"10.1145/3216122.3216129","DOIUrl":"https://doi.org/10.1145/3216122.3216129","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"21 1","pages":"0"},"publicationDate":"2018-06-18"}
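To make the notion of a count constraint concrete, the sketch below checks whether a given fact table satisfies a set of such constraints. It covers only the forward (verification) direction, not the inverse generation problem the paper solves, and it ignores the tree-structured DFM. The `CountConstraint` class, its field names, and the sample table are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CountConstraint:
    """The number of rows matching `pattern` must lie in [low, high]."""
    pattern: dict  # attribute name -> required value
    low: int
    high: int


def count_matching(rows, pattern):
    """Count the rows whose attributes agree with every entry of the pattern."""
    return sum(all(row.get(a) == v for a, v in pattern.items()) for row in rows)


def satisfies(rows, constraints):
    """True if every count constraint holds on the given fact table."""
    return all(c.low <= count_matching(rows, c.pattern) <= c.high for c in constraints)


fact_table = [
    {"city": "Rome", "product": "pasta"},
    {"city": "Rome", "product": "wine"},
    {"city": "Milan", "product": "pasta"},
]
constraints = [
    CountConstraint({"city": "Rome"}, low=2, high=3),
    CountConstraint({"city": "Rome", "product": "pasta"}, low=1, high=1),
]
table_is_valid = satisfies(fact_table, constraints)
```

The inverse problem asks the opposite question: given only the constraints, construct a table for which a check like `satisfies` succeeds.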
CalcuList (Calculator with List manipulation) is an educational language for teaching functional programming, extended with some imperative and side-effect features that are enabled at the explicit request of the programmer. In addition to strings and lists, the language natively supports JSON objects and may be effectively used to implement generic recursive MapReduce procedures for manipulating JSON lists. MapReduce is a popular model in distributed computing that underpins many NoSQL systems, and a JSON list can be thought of as a dataset of a document NoSQL datastore. It turns out that CalcuList can be used as a tool for teaching advanced query algorithms for document datastores such as MongoDB and CouchDB.
{"title":"Using CalcuList To MapReduce Jsons","authors":"D. Saccá, A. Furfaro","doi":"10.1145/3216122.3216164","DOIUrl":"https://doi.org/10.1145/3216122.3216164","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"35 1","pages":"0"},"publicationDate":"2018-06-18"}
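The idea of recursive MapReduce procedures over a JSON list can be sketched as follows. The sketch is written in Python rather than CalcuList, whose syntax is not shown in the abstract, and the function names and sample documents are hypothetical.

```python
def map_list(f, items):
    """Recursively apply f to every element of a list."""
    if not items:
        return []
    return [f(items[0])] + map_list(f, items[1:])


def reduce_list(g, acc, items):
    """Recursively fold a list into an accumulator with g(acc, item)."""
    if not items:
        return acc
    return reduce_list(g, g(acc, items[0]), items[1:])


# A JSON list viewed as a small collection of documents.
orders = [
    {"city": "Rome", "qty": 2},
    {"city": "Milan", "qty": 1},
    {"city": "Rome", "qty": 3},
]

# Map step: emit one (key, value) pair per document.
pairs = map_list(lambda doc: (doc["city"], doc["qty"]), orders)

# Reduce step: sum the values that share a key.
totals = reduce_list(
    lambda acc, kv: {**acc, kv[0]: acc.get(kv[0], 0) + kv[1]},
    {},
    pairs,
)
```

Here `totals` holds the quantity per city, which corresponds to a grouped aggregation query in a document datastore.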
Real-world networks typically evolve over time, which means that various events occur, such as edge additions or attribute changes. In order to understand these events, one must be able to discriminate between different events. Existing approaches typically discriminate whole graphs, which are, in addition, mostly static. We propose a new algorithm, WalDis, for mining discriminative patterns of events in dynamic graphs. The algorithm uses sampling by random walks and greedy approaches in order to keep performance high. Furthermore, it does not require time to be discretized, as other algorithms commonly do. We have evaluated the algorithm on three real-world graph datasets.
{"title":"WalDis: Mining Discriminative Patterns within Dynamic Graphs","authors":"Karel Vaculík, L. Popelínský","doi":"10.1145/3216122.3216172","DOIUrl":"https://doi.org/10.1145/3216122.3216172","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"192 1","pages":"0"},"publicationDate":"2018-06-18"}
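The sampling ingredient mentioned in the abstract can be illustrated with a plain random walk over timestamped edge events. This is a generic sketch and not the WalDis algorithm: the event representation `(u, v, t)`, the undirected treatment of edges, and the function name are assumptions. Timestamps are kept as raw floats, so no discretization of time is needed.

```python
import random


def random_walk(events, start, steps, rng):
    """Sample a walk over timestamped edge events (u, v, t), treated as undirected."""
    adjacency = {}
    for u, v, t in events:
        adjacency.setdefault(u, []).append((v, t))
        adjacency.setdefault(v, []).append((u, t))

    walk, node = [], start
    for _ in range(steps):
        neighbours = adjacency.get(node)
        if not neighbours:
            break  # isolated node: the walk ends early
        nxt, t = rng.choice(neighbours)
        walk.append((node, nxt, t))  # keep the continuous timestamp of the event
        node = nxt
    return walk


events = [("a", "b", 0.5), ("b", "c", 1.25), ("c", "a", 2.0)]
walk = random_walk(events, "a", steps=3, rng=random.Random(0))
```

Each element of `walk` is one sampled event. A pattern miner could collect many such walks around events of different classes and compare them.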
The Two Phase Locking with High Priority (2PL-HP) concurrency control protocol addresses the transaction scheduling issue in a distributed real-time database system (DRTDBS). Although the 2PL-HP protocol is free from priority inversion, it may suffer from problems such as deadlock, cyclic restart, and starvation of lengthy transactions. In this paper, we propose the Controlled Avoidance of deadlock and starvation causing Resourceful Conflict resolution between Transactions (CART) concurrency control protocol. CART aims to minimize the transaction miss percentage by reducing the waste of system resources: it avoids deadlock through controlled locking and, to some extent, avoids starvation by ensuring fairness in the allocation of the resources that transactions need to complete. A DRTDBS is simulated, and CART outperforms earlier protocols.
{"title":"CART: A Real-Time Concurrency Control Protocol","authors":"Sarvesh Pandey, Udai Shanker","doi":"10.1145/3216122.3216161","DOIUrl":"https://doi.org/10.1145/3216122.3216161","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"25 1","pages":"0"},"publicationDate":"2018-06-18"}
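For readers unfamiliar with the baseline, the conflict rule of 2PL-HP can be sketched as below: when a requester conflicts with a lock holder, the higher-priority transaction wins, so a lower-priority holder is aborted and a lower-priority requester waits. The sketch covers only this baseline rule, not CART, and makes simplifying assumptions: exclusive locks only, a single site, and larger numbers meaning higher priority. The `LockTable` class is hypothetical.

```python
class LockTable:
    """Exclusive locks only; a larger number means a higher priority (assumption)."""

    def __init__(self):
        self.holders = {}  # data item -> (transaction id, priority)

    def request(self, txn, priority, item):
        """Return (outcome, aborted_txn) for a lock request under the 2PL-HP rule."""
        held = self.holders.get(item)
        if held is None or held[0] == txn:
            self.holders[item] = (txn, priority)
            return ("granted", None)
        holder, holder_priority = held
        if priority > holder_priority:
            # High-priority rule: the lower-priority holder is aborted and restarted.
            self.holders[item] = (txn, priority)
            return ("granted", holder)
        # Otherwise the requester waits for the higher-priority holder.
        return ("blocked", None)


table = LockTable()
first = table.request("T1", priority=1, item="x")   # T1 takes the free lock
second = table.request("T2", priority=5, item="x")  # T2 outranks T1, so T1 is aborted
third = table.request("T3", priority=3, item="x")   # T3 is outranked by T2 and waits
```

Because a waiting transaction never holds up one of higher priority, the rule avoids priority inversion. The repeated aborts of low-priority holders are one source of the restart and starvation problems that the abstract mentions.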
The combination of data and technology is having a high impact on the way we live. The world is getting smarter thanks to the quantity of data collected and analyzed. However, this amount of data is continuously increasing, and it is necessary to deal with novel requirements related to variety, volume, velocity, and veracity. In this paper we focus on veracity, which relates to the presence of uncertain or imprecise data: errors and missing or invalid data can compromise the usefulness of the collected values. In such a scenario, new methods and techniques able to evaluate the quality of the available data are needed. The literature provides many data quality assessment and improvement techniques, especially for structured data, but in the Big Data era new algorithms have to be designed. We aim to provide an overview of the issues and challenges related to data quality assessment in the Big Data scenario. We also propose a possible solution developed by considering a smart city case study, and we describe the lessons learned in the design and implementation phases.
{"title":"Quality awareness for a Successful Big Data Exploitation","authors":"C. Cappiello, Walter Samá, Monica Vitali","doi":"10.1145/3216122.3216124","DOIUrl":"https://doi.org/10.1145/3216122.3216124","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"241 1","pages":"0"},"publicationDate":"2018-06-18"}
The reverse k-nearest neighbours search is a fundamental primitive in multi-dimensional (i.e. multi-attribute) databases, with applications in location-based services, online recommendations, statistical classification, pattern recognition, graph algorithms, computer games development, and so on. Despite the relevance and popularity of the query, no solution has yet been put forward that supports it in encrypted databases while at the same time protecting the privacy of both the data and the queries. With the outsourcing of massive datasets to the cloud, it has become urgent to find ways of ensuring the fast and secure processing of this query in untrustworthy cloud computing environments. This paper presents searchable encryption schemes that efficiently and securely enable the processing of the reverse k-nearest neighbours query over encrypted multi-dimensional data, including index-based search schemes that deliver fast query responses while preserving data confidentiality and query privacy. The proposed schemes resist practical attacks operating on the basis of powerful background knowledge, and their efficiency is confirmed by a theoretical analysis and extensive simulation experiments.
{"title":"Secure Reverse k-Nearest Neighbours Search over Encrypted Multi-dimensional Databases","authors":"T. Tzouramanis, Y. Manolopoulos","doi":"10.1145/3216122.3216170","DOIUrl":"https://doi.org/10.1145/3216122.3216170","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"24 1","pages":"0"},"publicationDate":"2018-06-18"}
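The query itself can be stated precisely in a few lines: the reverse k-nearest neighbours of a query point q are the data points that count q among their own k nearest neighbours. The sketch below is a brute-force plaintext version with Euclidean distance and no encryption or index. It is meant only to define the query, not to reflect the proposed schemes, and the function names and sample points are hypothetical.

```python
import math


def k_nearest(p, candidates, k):
    """Return the k candidates closest to p under Euclidean distance."""
    return sorted(candidates, key=lambda c: math.dist(p, c))[:k]


def reverse_knn(q, points, k):
    """Return the points that would have q among their k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        # Candidate neighbours of p: every other data point, plus the query point.
        others = [o for j, o in enumerate(points) if j != i] + [q]
        if q in k_nearest(p, others, k):
            result.append(p)
    return result


points = [(0.0, 0.0), (1.5, 0.0), (5.0, 0.0), (6.0, 0.0)]
query = (2.0, 0.0)
rnn_k1 = reverse_knn(query, points, k=1)
rnn_k2 = reverse_knn(query, points, k=2)
```

With `k=1` only `(1.5, 0.0)` has the query as its nearest neighbour, while with `k=2` every point does. This brute-force scan costs a quadratic number of distance computations, which is why index-based schemes matter in practice.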
L. Vaira, Mario Alessandro Bochicchio, Matteo Conte, Francesco Margiotta Casaluci, A. Melpignano
Artificial intelligence is transforming healthcare with a profound paradigm shift impacting diagnostic techniques, drug discovery, health analytics, interventions and much more. In this paper we focus on exploiting AI-based chatbot systems, mainly based on machine learning algorithms and Natural Language Processing, to understand and respond to the needs of patients and their families. In particular, we describe an application scenario for an AI chatbot that supports pregnant women, mothers, and families with young children by giving them help and instructions in relevant situations.
{"title":"MamaBot: a System based on ML and NLP for supporting Women and Families during Pregnancy","authors":"L. Vaira, Mario Alessandro Bochicchio, Matteo Conte, Francesco Margiotta Casaluci, A. Melpignano","doi":"10.1145/3216122.3216173","DOIUrl":"https://doi.org/10.1145/3216122.3216173","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"58 1","pages":"0"},"publicationDate":"2018-06-18"}
This paper presents a context-driven query system for urban computing in which users define their own restrictions, over which datalog-like queries are built. Instead of imposing constraints on databases, our goal is to filter consistent data during the query process. Our query language is able to express aggregates in recursive rules, allowing it to capture network properties typical of graph analysis. We describe the query system and analyze its capabilities through use cases in urban computing.
{"title":"A Context-driven Querying System for Urban Graph Analysis","authors":"Jacques Chabin, L. Gomes, Mirian Halfeld-Ferrari","doi":"10.1145/3216122.3216148","DOIUrl":"https://doi.org/10.1145/3216122.3216148","journal":{"name":"Proceedings of the 22nd International Database Engineering & Applications Symposium","volume":"33 1","pages":"0"},"publicationDate":"2018-06-18"}
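An aggregate inside a recursive rule can be illustrated with shortest distances in a street graph, which in datalog-like notation might read `dist(X, Z, min(D1 + D2)) :- dist(X, Y, D1), edge(Y, Z, D2)`. The sketch below evaluates such a rule by naive fixpoint iteration in Python. It is not the paper's query language or engine: the rule syntax, the function name, and the sample graph are hypothetical, and edge weights are assumed to be positive.

```python
import math


def shortest_distances(edges):
    """Naive fixpoint for: dist(X,Z,min(D1+D2)) :- dist(X,Y,D1), edge(Y,Z,D2)."""
    dist = dict(edges)  # base rule: dist(X, Y, D) :- edge(X, Y, D)
    changed = True
    while changed:  # repeat the recursive rule until no fact improves
        changed = False
        for (x, y), d1 in list(dist.items()):
            for (y2, z), d2 in edges.items():
                if y2 == y and d1 + d2 < dist.get((x, z), math.inf):
                    dist[(x, z)] = d1 + d2  # the min aggregate keeps the smaller value
                    changed = True
    return dist


# Directed street segments with positive lengths (assumed sample data).
edges = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 10, ("c", "d"): 1}
distances = shortest_distances(edges)
```

The direct segment from `a` to `c` has length 10, but the recursive rule finds the shorter route through `b`, so `distances[("a", "c")]` ends up as 5.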