This paper presents a context-driven query system for urban computing in which users define their own restrictions, over which Datalog-like queries are built. Instead of imposing constraints on databases, our goal is to filter consistent data during the query process. Our query language can express aggregates in recursive rules, allowing it to capture network properties typical of graph analysis. We present the query system and analyze its capabilities through use cases in urban computing.
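The combination of recursion and aggregation mentioned in the abstract can be illustrated with a shortest-distance query, a classic graph property that needs a min-aggregate inside a recursive rule. The sketch below is only illustrative, not the paper's actual language: it evaluates the Datalog-style rules by naive fixpoint iteration in Python.

```python
# Illustrative fixpoint evaluation of a recursive rule with a min-aggregate,
# in the spirit of Datalog-with-aggregates for graph queries:
#   dist(X, Y, min(D)) :- edge(X, Y, D).
#   dist(X, Y, min(D)) :- dist(X, Z, D1), edge(Z, Y, D2), D = D1 + D2.
# (Hypothetical example data; not the paper's query language.)

def shortest_paths(edges):
    """All-pairs shortest distances, by iterating the rules to a fixpoint."""
    dist = {}
    for x, y, d in edges:                       # base rule, with min-aggregate
        dist[(x, y)] = min(d, dist.get((x, y), d))
    changed = True
    while changed:                              # recursive rule, to fixpoint
        changed = False
        for (x, z), d1 in list(dist.items()):
            for (z2, y), d2 in list(dist.items()):
                if z2 == z and dist.get((x, y), float("inf")) > d1 + d2:
                    dist[(x, y)] = d1 + d2
                    changed = True
    return dist

edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 9)]
print(shortest_paths(edges)[("a", "c")])  # 3, via a -> b -> c
```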
Jacques Chabin, L. Gomes, Mirian Halfeld-Ferrari. "A Context-driven Querying System for Urban Graph Analysis." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216148.
With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than in data warehouses, which proved ill-adapted. Data lakes answer these needs from a storage point of view, but require adequate metadata management to guarantee efficient access to the data. Starting from a multidimensional metadata model designed for an industrial heritage data lake, whose schema lacks evolvability, we propose to use ensemble modeling, and more precisely a data vault, to address this issue. To illustrate the feasibility of the approach, we instantiate our conceptual metadata model into relational and document-oriented logical and physical models, respectively. We also compare the physical models in terms of metadata storage and query response time.
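A data vault separates stable business keys (hubs), relationships (links), and descriptive, versioned attributes (satellites), which is what makes the metadata schema evolvable. The sketch below uses hypothetical entity names, not the paper's actual model, to show why adding new metadata needs no schema change.

```python
# Minimal sketch of data vault ensemble modeling applied to data lake
# metadata. Entity and attribute names are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Hub:                  # one row per business key (e.g. one dataset)
    business_key: str
    load_date: datetime = field(default_factory=datetime.now)

@dataclass
class Link:                 # associates two or more hub keys
    hub_keys: tuple
    load_date: datetime = field(default_factory=datetime.now)

@dataclass
class Satellite:            # descriptive attributes; append a new row per change
    parent_key: str
    attributes: dict
    load_date: datetime = field(default_factory=datetime.now)

# Schema evolution without restructuring: new metadata simply becomes a new
# satellite row attached to the same hub.
dataset = Hub("sensor_dump_2017")
v1 = Satellite(dataset.business_key, {"format": "csv", "columns": 12})
v2 = Satellite(dataset.business_key, {"format": "csv", "columns": 12, "license": "open"})
```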
I. D. Nogueira, Maram Romdhane, J. Darmont. "Modeling Data Lake Metadata with a Data Vault." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216130.
When exploring data or communicating it to others, data is currently visualized through flat diagrams, tables, graphs, etc. Visualizing data in three dimensions (3D) offers more immersive and intuitive representations and, through the added dimension, allows for more compact ones. Still, when representing large amounts of data in 3D, fine control of the layout becomes a must, and current 3D visualization tools do not allow easy, fine-tuned control of it. SuperSQL is an extension of SQL that lets users declaratively and concisely specify the layout of structured documents, such as web pages, and generate them. In this work we extend SuperSQL to generate 3D data representations in the Unity game engine. With this system, users can represent their data through basic shapes, colors, and animations, or even their own custom 3D assets, by writing simple SQL-like queries.
Tatsuki Fujimoto, Kento Goto, Motomichi Toyama. "3D Visualization of data using SuperSQL and Unity." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216145.
Artificial intelligence is transforming healthcare, with a profound paradigm shift impacting diagnostic techniques, drug discovery, health analytics, interventions, and much more. In this paper we focus on exploiting AI-based chatbot systems, built mainly on machine learning algorithms and natural language processing, to understand and respond to the needs of patients and their families. In particular, we describe an application scenario for an AI chatbot that supports pregnant women, mothers, and families with young children by giving them help and instructions in relevant situations.
L. Vaira, Mario Alessandro Bochicchio, Matteo Conte, Francesco Margiotta Casaluci, A. Melpignano. "MamaBot: a System based on ML and NLP for supporting Women and Families during Pregnancy." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216173.
The Extract-Transform-Load (ETL) process in data warehousing involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex transformations involving sources and targets with different schemas, databases, and technologies, which makes ETL implementations fault-prone. In this paper, we present an approach for validating ETL processes using automated balancing tests that check for various types of discrepancies between the source and target data. We formalize three categories of properties that must be checked during testing, namely completeness, consistency, and syntactic validity. Our approach uses the rules provided in the ETL specifications to generate source-to-target mappings, from which balancing test assertions are generated for each property. We evaluated the approach on a real-world health data warehouse project, where it revealed 11 previously undetected faults. Using mutation analysis, we demonstrated that our auto-generated assertions can detect faults in the data inside the target data warehouse.
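The three property categories can be made concrete with hand-written balancing assertions. The table and column names below are hypothetical; in the paper's approach such assertions are generated automatically from the ETL specification's source-to-target mappings.

```python
# A hedged sketch of balancing assertions for the three property categories.
# A source column stored as strings is loaded into the target as floats.

source = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.5"}]
target = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 4.5}]

# Completeness: every source record reaches the target.
assert len(target) == len(source)
assert {r["id"] for r in target} == {r["id"] for r in source}

# Consistency: aggregates agree after transformation.
assert sum(r["amount"] for r in target) == sum(float(r["amount"]) for r in source)

# Syntactic validity: target values conform to the target schema's types.
assert all(isinstance(r["amount"], float) for r in target)

print("all balancing assertions passed")
```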
Hajar Homayouni, Sudipto Ghosh, I. Ray. "An Approach for Testing the Extract-Transform-Load Process in Data Warehouse Systems." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216149.
The Internet of Things (IoT) is currently considered the new frontier of the Internet. One of the most effective ways to investigate and implement IoT is based on the social network paradigm. In recent years, social network researchers have introduced new models capable of capturing the growing complexity of this scenario. One of the best known is the Social Internetworking System, which models a scenario comprising several related social networks. In this paper, we investigate the possibility of applying the ideas characterizing the Social Internetworking System to IoT, and we propose a new paradigm capable of modelling this scenario and of favoring the cooperation of objects belonging to different IoTs. Furthermore, to give an idea of both the potential and the complexity of this new paradigm, we illustrate in more detail one of its most interesting issues, namely the redefinition of the betweenness centrality measure.
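The measure the paper redefines starts from standard single-network betweenness centrality: the fraction of shortest paths between node pairs that pass through a node. A brute-force baseline (fine only for tiny graphs; the paper's multi-IoT generalization is not shown here) can be sketched as:

```python
# Baseline betweenness centrality on one unweighted graph, by enumerating
# all shortest paths between every ordered pair of nodes.
from collections import deque
from itertools import permutations

def all_shortest_paths(adj, s, t):
    """Enumerate all shortest simple paths from s to t (BFS, level order)."""
    paths, best = [], None
    q = deque([(s, [s])])
    while q:
        node, path = q.popleft()
        if best is not None and len(path) > best:
            continue                      # longer than a known shortest path
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for n in adj[node]:
            if n not in path:             # keep paths simple
                q.append((n, path + [n]))
    return paths

def betweenness(adj, v):
    """Sum over pairs (s, t) of the fraction of shortest paths through v."""
    score = 0.0
    for s, t in permutations(adj, 2):
        if v in (s, t):
            continue
        paths = all_shortest_paths(adj, s, t)
        if paths:
            score += sum(v in p for p in paths) / len(paths)
    return score

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(adj, "b"))  # 2.0: b lies on both a->c and c->a shortest paths
```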
Giorgio Baldassarre, Paolo Lo Giudice, Lorenzo Musarella, D. Ursino. "A paradigm for the cooperation of objects belonging to different IoTs." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216171.
Location-dependent information services (LDIS) are applications that combine a mobile client's location with other data to deliver enhanced services at the right place and time, from anywhere. In this paper, we develop a cache invalidation algorithm, Caching Efficiency with Next Location Prediction Based (CELPB), which uses a newly developed metric, caching efficiency with next location prediction (CELP), for computing the valid scope over a prediction interval. The metric takes into account the client's future movement behavior with the help of sequential pattern mining and clustering. Mobility rules are framed to predict an accurate next location, which can be used to estimate the client's future movement path (edges) should the client enter the valid scope area of a data item. Simulation results show that the proposed policy achieves up to a 10 percent performance improvement over an earlier cache invalidation policy (CEBAB) for LDIS.
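The mobility-rule idea can be illustrated with a deliberately simplified model: mine a client's movement history for the most frequent successor of the current location. The paper combines sequential pattern mining with clustering; this single-step frequency model, with invented location names, is only a sketch of the prediction ingredient.

```python
# Illustrative next-location prediction from movement history; a stand-in
# for the sequential-pattern-mining component, not the paper's CELP metric.
from collections import Counter, defaultdict

def mobility_rules(history):
    """Map each location to a counter of the locations visited next."""
    successors = defaultdict(Counter)
    for path in history:
        for here, nxt in zip(path, path[1:]):
            successors[here][nxt] += 1
    return successors

def predict_next(rules, current):
    """Return the most frequently observed successor, or None if unseen."""
    if current not in rules:
        return None
    return rules[current].most_common(1)[0][0]

history = [["home", "mall", "office"],
           ["home", "mall", "gym"],
           ["home", "mall", "office"]]
rules = mobility_rules(history)
print(predict_next(rules, "mall"))  # office (2 of the 3 observed transitions)
```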
Ajay K. Gupta, Udai Shanker. "CELPB: A Cache Invalidation Policy for Location Dependent Data in Mobile Environment." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216147.
Our work focuses on inductive transfer learning, a setting in which one assumes that the source and target tasks share the same feature and label spaces. We demonstrate that transfer learning can be successfully used for feature reduction and hence for more efficient classification. Further, our experiments show that this approach also increases the precision of the classification task.
Maha Asiri, Hamid R. Nemati, F. Sadri. "Feature Reduction Improves Classification Accuracy in Healthcare." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216165.
Recently there has been an effort to solve the problems caused by the infamous NULL in relational databases by systematically applying Kleene's three-valued logic to SQL; the third truth-value is unknown. In this paper we show that by using a fourth truth-value, inconsistent, all the advantages of the three-valued approach can be retained, and that negation can be given a constructive, intuitionistic meaning that allows negative knowledge to be specified explicitly in the logic, without having to resort to extra-logical notions of stratification or to non-monotonic reasoning. The four-valued approach also allows for a computationally efficient treatment of query answering in the presence of inconsistencies, in contrast to the computationally intractable repair approach to inconsistency management. From a practical viewpoint, we show that the Cylindric Star Algebra, developed by the authors, is particularly well suited for evaluating first-order queries on four-valued databases, and that the framework of data exchange can be smoothly adapted to the four truth-values.
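A fourth truth-value can be made concrete with Belnap's classic encoding, where each value is the set of classical values there is evidence for. This is a standard reference construction; the paper's exact connectives may differ.

```python
# Belnap-style four-valued logic: a value is a subset of {"t", "f"}.
TRUE = frozenset({"t"})
FALSE = frozenset({"f"})
UNKNOWN = frozenset()                   # no evidence either way (SQL-style NULL)
INCONSISTENT = frozenset({"t", "f"})    # conflicting evidence

def neg(a):
    """Negation swaps evidence for truth and for falsity."""
    return frozenset({"t": "f", "f": "t"}[x] for x in a)

def conj(a, b):
    """Conjunction is true only if both are; false if either is."""
    r = set()
    if "t" in a and "t" in b:
        r.add("t")
    if "f" in a or "f" in b:
        r.add("f")
    return frozenset(r)

def disj(a, b):
    """Disjunction is dual to conjunction."""
    r = set()
    if "t" in a or "t" in b:
        r.add("t")
    if "f" in a and "f" in b:
        r.add("f")
    return frozenset(r)

# unknown AND inconsistent collapses to false; unknown OR inconsistent to true.
print(conj(UNKNOWN, INCONSISTENT) == FALSE, disj(UNKNOWN, INCONSISTENT) == TRUE)
```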
G. Grahne, A. Moallemi. "A useful four-valued database logic." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216157.
Influenza surveillance through social media data is becoming an important research topic because it could enhance the capabilities of official surveillance systems in monitoring seasonal flu outbreaks, providing healthcare organizations with improved situational awareness. In this paper, the two influenza seasons 2016-2017 and 2017-2018, restricted to Italy, are investigated by analyzing tweets posted by users about influenza-like illness. Two types of analysis are performed. The first studies the correlation between tweets containing the most frequent flu-related words and the data provided by the Italian InfluNet surveillance system. The second examines people's sentiment about the medicines used to treat flu. We show that there is a strong correlation between the reports published on the InfluNet system and the contents posted by Twitter users about their symptoms and health state. Moreover, we found that the sentiment people express about the medicines taken to treat flu is rather negative.
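The first analysis reduces to a Pearson correlation between two aligned weekly series: flu-related tweet counts and surveillance incidence. The numbers below are invented for illustration; the paper uses actual InfluNet reports.

```python
# Pearson correlation between a weekly tweet-count series and a weekly
# influenza-like-illness (ILI) incidence series. Data is made up.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

weekly_tweets = [120, 180, 260, 400, 380, 250, 140]   # flu-related tweets/week
ili_rate = [1.2, 1.9, 2.8, 4.1, 3.9, 2.6, 1.5]        # ILI cases per 1,000

print(round(pearson(weekly_tweets, ili_rate), 3))      # strongly positive here
```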
C. Comito, Agostino Forestiero, C. Pizzuti. "Twitter-based Influenza Surveillance: An Analysis of the 2016-2017 and 2017-2018 Seasons in Italy." Proceedings of the 22nd International Database Engineering & Applications Symposium, June 18, 2018. DOI: 10.1145/3216122.3216128.