Privacy-Preserving OLAP via Modeling and Analysis of Query Workloads: Innovative Theories and Theorems. A. Cuzzocrea. DOI: 10.1145/3603719.3603735
This paper proposes innovative theories and theorems building on a state-of-the-art approach that computes privacy-preserving OLAP cubes via modeling and analysis of query workloads. The work contributes to the existing literature by devising a solid theoretical framework that opens up future optimization opportunities.
{"title":"Privacy-Preserving OLAP via Modeling and Analysis of Query Workloads: Innovative Theories and Theorems","authors":"A. Cuzzocrea","doi":"10.1145/3603719.3603735","DOIUrl":"https://doi.org/10.1145/3603719.3603735","url":null,"abstract":"This paper proposes innovative theories and theorems in the context of a state-of-the-art paper that computes privacy-preserving OLAP cubes via modeling and analyzing query workloads. The work contributes to actual literature by devising a solid theoretical framework that can be used for future optimization opportunities.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127515627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TGSLN: Time-aware Graph Structure Learning Network for Multi-variates Stock Sector Ranking Recommendation. Quan Wan, Shuo Yin, Xiangyue Liu, Jianliang Gao, Yuhui Zhong. DOI: 10.1145/3603719.3603741
In the field of financial prediction, most studies focus on individual stocks or stock indices. Stock sectors are collections of stocks with similar characteristics, and sector indices have more stable trends and better predictability than individual stocks. Additionally, stock sectors are subsets of stock indices, which implies that investment portfolios based on stock sectors have greater potential to achieve excess returns. In this paper, we propose a new method, the Time-aware Graph Structure Learning Network (TGSLN), to address the problem of stock sector ranking recommendation. In this model, we use an indicator called Relative Price Strength (RPS) to describe the ranking change trend of the sectors. To capture the inherent connections between sectors, we construct a multi-variable time series that consists of multi-scale RPS sequences and effective indicators filtered through a factor selector. We also build a stock sector relation graph based on authoritative stock sector classifications. Specifically, we design a time-aware graph structure learner, which can mine sector relations from the time series and enhance the initial graph through graph fusion. Our model outperforms state-of-the-art baselines in both the A-share and NASDAQ markets.
{"title":"TGSLN : Time-aware Graph Structure Learning Network for Multi-variates Stock Sector Ranking Recommendation","authors":"Quan Wan, Shuo Yin, Xiangyue Liu, Jianliang Gao, Yuhui Zhong","doi":"10.1145/3603719.3603741","DOIUrl":"https://doi.org/10.1145/3603719.3603741","url":null,"abstract":"In the field of financial prediction, most studies focus on individual stocks or stock indices. Stock sectors are collections of stocks with similar characteristics and the indices of sectors have more stable trends and predictability compared to individual stocks. Additionally, stock sectors are subsets of stock indices, which implies that investment portfolios based on stock sectors have a greater potential to achieve excess returns. In this paper, we propose a new method, Time-aware Graph Structure Learning Network (TGSLN), to address the problem of stock sector ranking recommendation. In this model, we use an indicator called Relative Price Strength (RPS) to describe the ranking change trend of the sectors. To construct the inherent connection between sectors, we construct a multi-variable time series that consists of multi-scale RPS sequences and effective indicators filtered through the factor selector. We also build a stock sector relation graph based on authoritative stock sector classifications. Specially, we design a time-aware graph structure learner, which can mine the sector relations from time series, and enhance the initial graph through graph fusion. Our model outperforms state-of-the-art baselines in both A-share and NASDAQ markets.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127245073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Autoencoders for Dimensionality Reduction of MRI-derived Radiomics and Classification of Malignant Brain Tumors. Mikayla Biggs, Yaohua Wang, N. Soni, S. Priya, G. Bathla, G. Canahuate. DOI: 10.1145/3603719.3603737
Malignant brain tumors, including parenchymal metastatic (MET) lesions, glioblastomas (GBM), and lymphomas (LYM), account for 29.7% of brain cancers. However, characterizing these tumors from MRI is difficult due to the similarity of their radiologically observed image features. Radiomics is the extraction of quantitative imaging features to characterize tumor intensity, shape, and texture. Applying machine learning over radiomic features could aid diagnostics by improving the classification of these common brain tumors. However, since the number of radiomic features is typically larger than the number of patients in a study, dimensionality reduction is needed to balance feature dimensionality and model complexity. Autoencoders are a form of unsupervised representation learning that can be used for dimensionality reduction; they are similar to PCA but learn a compact latent space through a more complex, non-linear model. In this work, we examine the effectiveness of autoencoders for dimensionality reduction on the radiomic feature space of multiparametric MRI images and for the classification of malignant brain tumors: GBM, LYM, and MET. We further address the class imbalance imposed by the rarity of lymphomas by examining different approaches to increase overall predictive performance through multiclass decomposition strategies.
{"title":"Evaluating Autoencoders for Dimensionality Reduction of MRI-derived Radiomics and Classification of Malignant Brain Tumors","authors":"Mikayla Biggs, Yaohua Wang, N. Soni, S. Priya, G. Bathla, G. Canahuate","doi":"10.1145/3603719.3603737","DOIUrl":"https://doi.org/10.1145/3603719.3603737","url":null,"abstract":"Malignant brain tumors including parenchymal metastatic (MET) lesions, glioblastomas (GBM), and lymphomas (LYM) account for 29.7% of brain cancers. However, the characterization of these tumors from MRI imaging is difficult due to the similarity of their radiologically observed image features. Radiomics is the extraction of quantitative imaging features to characterize tumor intensity, shape, and texture. Applying machine learning over radiomic features could aid diagnostics by improving the classification of these common brain tumors. However, since the number of radiomic features is typically larger than the number of patients in the study, dimensionality reduction is needed to balance feature dimensionality and model complexity. Autoencoders are a form of unsupervised representation learning that can be used for dimensionality reduction. It is similar to PCA but uses a more complex and non-linear model to learn a compact latent space. In this work, we examine the effectiveness of autoencoders for dimensionality reduction on the radiomic feature space of multiparametric MRI images and the classification of malignant brain tumors: GBM, LYM, and MET. We further aim to address the class imbalances imposed by the rarity of lymphomas by examining different approaches to increase overall predictive performance through multiclass decomposition strategies.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130806919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Long-term Time Series Forecasting method with Multiple Decomposition. Y. Wang, Xu Chen, Y. Wang, Jun Yong Jing. DOI: 10.1145/3603719.3603738
In various real-world applications such as weather forecasting, energy consumption planning, and traffic flow prediction, time serves as a critical variable. These applications can be collectively referred to as time-series prediction problems. Despite recent advancements with Transformer-based solutions yielding improved results, these solutions often struggle to capture the semantic dependencies in time-series data, capturing predominantly temporal dependencies instead. This shortfall hinders their ability to effectively capture long-term series patterns. In this research, we apply time-series decomposition to the problem of long-term series forecasting. Our method implements a forecasting approach with deep series decomposition, which further decomposes the long-term trend components generated by the initial decomposition. This technique significantly enhances the forecasting accuracy of the model. For long-term time-series forecasting (LTSF), our proposed method exhibits commendable prediction accuracy compared to prevailing methods on four publicly available datasets: Weather, Electricity, Traffic, and ILI. The code for our method is available at https://github.com/wangyang970508/LSTF_MD.
{"title":"A Long-term Time Series Forecasting method with Multiple Decomposition","authors":"Y. Wang, Xu Chen, Y. Wang, Jun Yong Jing","doi":"10.1145/3603719.3603738","DOIUrl":"https://doi.org/10.1145/3603719.3603738","url":null,"abstract":"In various real-world applications such as weather forecasting, energy consumption planning, and traffic flow prediction, time serves as a critical variable. These applications can be collectively referred to as time-series prediction problems. Despite recent advancements with Transformer-based solutions yielding improved results, these solutions often struggle to capture the semantic dependencies in time-series data, resulting predominantly in temporal dependencies. This shortfall often hinders their ability to effectively capture long-term series patterns. In this research, we apply time-series decomposition to address this issue of long-term series forecasting. Our method involves implementing a time-series forecasting approach with deep series decomposition, which further decomposes the long-term trend components generated after the initial decomposition. This technique significantly enhances the forecasting accuracy of the model. For long-term time-series forecasting (LTSF), our proposed method exhibits commendable prediction accuracy on four publicly available datasets—Weather, Electricity, Traffic, ILI—when compared to prevailing methods. The code for our method is accessible at https://github.com/wangyang970508/LSTF_MD.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114247446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization. Ivan Carvalho, Ramon Lawrence. DOI: 10.1145/3603719.3603731
This work analyzes and parallelizes LearnedSort, a novel algorithm that sorts using machine learning models of the input's cumulative distribution function (CDF). LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed by combining LearnedSort with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on synthetic and real-world datasets demonstrate improved parallel performance for parallel LearnedSort compared to IPS4o and other sorting algorithms.
{"title":"LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization","authors":"Ivan Carvalho, Ramon Lawrence","doi":"10.1145/3603719.3603731","DOIUrl":"https://doi.org/10.1145/3603719.3603731","url":null,"abstract":"This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on synthetic and real-world datasets demonstrate improved parallel performance for parallel LearnedSort compared to IPS4o and other sorting algorithms.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133428236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early ICU Mortality Prediction with Deep Federated Learning: A Real-World Scenario. Athanasios Georgoutsos, Paraskevas Kerasiotis, Verena Kantere. DOI: 10.1145/3603719.3603723
The generation of large amounts of healthcare data has motivated the use of Machine Learning (ML) to train robust models for clinical tasks. However, limitations of local datasets and restrictions on sharing patient data impede the use of traditional ML workflows. Consequently, Federated Learning (FL) has emerged as a potential solution for training ML models across multiple healthcare centers. In this study, we focus on the binary classification task of early ICU mortality prediction using multivariate time series data and a deep neural network architecture. We evaluate the performance of two FL algorithms (FedAvg and FedProx) on this task, utilizing a real-world multi-center benchmark database. Our results show that FL models outperform local ML models in a realistic scenario with non-identically distributed data, indicating that FL is a promising solution for analogous problems within the healthcare domain. Nevertheless, in this experimental scenario, they do not approach the ideal performance of a centralized ML model.
{"title":"Early ICU Mortality Prediction with Deep Federated Learning: A Real-World Scenario","authors":"Athanasios Georgoutsos, Paraskevas Kerasiotis, Verena Kantere","doi":"10.1145/3603719.3603723","DOIUrl":"https://doi.org/10.1145/3603719.3603723","url":null,"abstract":"The generation of large amounts of healthcare data has motivated the use of Machine Learning (ML) to train robust models for clinical tasks. However, limitations of local datasets and restrictions on sharing patient data impede the use of traditional ML workflows. Consequently, Federated Learning (FL) has emerged as a potential solution for training ML models among multiple healthcare centers. In this study, we focus on the binary classification task of early ICU mortality prediction using Multivariate Time Series data and a deep neural network architecture. We evaluate the performance of two FL algorithms (FedAvg and FedProx) on this task, utilizing a real world multi-center benchmark database. Our results show that FL models outperform local ML models in a realistic scenario with non-identically distributed data, thus indicating that FL is a promising solution for analogous problems within the healthcare domain. Nevertheless, in this experimental scenario, they do not approximate the ideal performance of a centralized ML model.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114296982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth. Jianliang Gao, Changlong He, Jiamin Chen, Qiutong Li, Yili Wang. DOI: 10.1145/3603719.3603729
To alleviate the over-smoothing problem caused by deep graph neural networks, decoupled graph neural networks (DGNNs) have been proposed. DGNNs decouple the graph neural network into two atomic operations: the propagation (P) operation and the transformation (T) operation. Since manually designing the architecture of DGNNs is a time-consuming and expert-dependent process, the DF-GNAS method was designed to automatically construct DGNN architectures with a fixed propagation operation and deep layers. The propagation operation is the key process by which DGNNs aggregate graph structure information. However, because DF-GNAS designs DGNN architectures with a fixed propagation operation for different graph structures, it incurs performance loss. Meanwhile, DF-GNAS designs deep DGNNs even for graphs with simple distributions, which may lead to overfitting. To address these challenges, we propose the Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth (DGNAS-PD) method. In DGNAS-PD, we design a DGNN operation space with variable, efficient propagation operations to better aggregate information on different graph structures. We build an effective genetic search strategy that adaptively chooses appropriate DGNN depths, rather than deep DGNNs, for graphs with simple distributions. Experiments on five real-world graphs show that DGNAS-PD outperforms state-of-the-art baseline methods.
{"title":"Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth","authors":"Jianliang Gao, Changlong He, Jiamin Chen, Qiutong Li, Yili Wang","doi":"10.1145/3603719.3603729","DOIUrl":"https://doi.org/10.1145/3603719.3603729","url":null,"abstract":"To alleviate the over-smoothing problem caused by deep graph neural networks, decoupled graph neural networks (DGNNs) are proposed. DGNNs decouple the graph neural network into two atomic operations, the propagation (P) operation and the transformation (T) operation. Since manually designing the architecture of DGNNs is a time-consuming and expert-dependent process, the DF-GNAS method is designed, which can automatically construct the architecture of DGNNs with fixed propagation operation and deep layers. The propagation operation is a key process for DGNNs to aggregate graph structure information. However, DF-GNAS automatically designs DGNN architecture using fixed propagation operation for different graph structures will cause performance loss. Meanwhile, DF-GNAS designs deep DGNNs for graphs with simple distributions, which may lead to overfitting problems. To solve the above challenges, we propose the Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth (DGNAS-PD) method. In DGNAS-PD, we design a DGNN operation space with variable efficient propagation operations in order to better aggregate information on different graph structures. We build an effective genetic search strategy to adaptively design appropriate DGNN depths instead of deep DGNNs for the graph with simple distributions in DGNAS-PD. The experiments on five real-world graphs show that DGNAS-PD outperforms state-of-art baseline methods.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124135850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis. Lixi Zhou, Lei Yu, Jia Zou, Hong Min. DOI: 10.1145/3603719.3603734
Protecting sensitive information in diagnostic data such as logs is a critical concern in the industrial software diagnosis and debugging process. While many tools have been developed to automatically identify and remove sensitive information from logs, they have severe limitations that can cause over-redaction and loss of critical diagnostic information (false positives), disclosure of sensitive information (false negatives), or both. To address this problem, we argue for a source code analysis approach to log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code via logger code augmentation and checks whether the log statement outputs data from sensitive sources by using a data flow graph built from the source code. Appropriate redaction rules are then applied, depending on the sensitivity of the data sources, to protect private information in the logs. We conducted an experimental evaluation and comparison with other popular baselines. The results demonstrate that our approach significantly improves the detection precision of sensitive information and reduces both false positives and false negatives.
{"title":"Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis","authors":"Lixi Zhou, Lei Yu, Jia Zou, Hong Min","doi":"10.1145/3603719.3603734","DOIUrl":"https://doi.org/10.1145/3603719.3603734","url":null,"abstract":"Protecting sensitive information in diagnostic data such as logs, is a critical concern in the industrial software diagnosis and debugging process. While there are many tools developed to automatically redact the logs for identifying and removing sensitive information, they have severe limitations which can cause either over redaction and loss of critical diagnostic information (false positives), or disclosure of sensitive information (false negatives), or both. To address the problem, in this paper, we argue for a source code analysis approach for log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code with logger code augmentation, and checks if the log statement outputs data from sensitive sources by using the data flow graph built from the source code. Appropriate redaction rules are further applied depending on the sensitiveness of the data sources to preserve the privacy information in the logs. We conducted experimental evaluation and comparison with other popular baselines. The results demonstrate that our approach can significantly improve the detection precision of the sensitive information and reduce both false positives and negatives.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114323939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection. Bingbing Xie, Xiaoxia Ma, Jia Wu, Jian Yang, Shan Xue, Hao Fan. DOI: 10.1145/3603719.3603736
The proliferation of fake news on social media has been recognized as a severe problem for society, and substantial effort has been devoted to fake news detection to alleviate its detrimental impacts. Knowledge graphs (KGs) comprise rich factual relations among real entities, which can serve as ground-truth databases and enhance fake news detection. However, most existing methods leverage only natural language processing and graph mining techniques to extract features of fake news, and rarely exploit the ground knowledge in knowledge graphs. In this work, we propose a novel Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection (HGNNR4FD). The devised framework has four major components: 1) a heterogeneous graph (HG) built upon news content, with three types of nodes (news, entities, and topics) and their relations; 2) a KG that provides the factual basis for detecting fake news by generating embeddings via relations in the KG; 3) a novel attention-based heterogeneous graph neural network that aggregates information from the HG and the KG; and 4) a fake news detector that identifies fake news based on the news embeddings generated by HGNNR4FD. We validate the performance of our method through comparison with seven state-of-the-art baselines and verify the effectiveness of its components through a thorough ablation analysis. The results empirically demonstrate that our framework achieves superior results and improves over the baselines in accuracy, precision, recall, and F1-score on four real-world datasets.
{"title":"Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection","authors":"Bingbing Xie, Xiaoxia Ma, Jia Wu, Jian Yang, Shan Xue, Hao Fan","doi":"10.1145/3603719.3603736","DOIUrl":"https://doi.org/10.1145/3603719.3603736","url":null,"abstract":"The proliferation of fake news in social media has been recognized as a severe problem for society, and substantial attempts have been devoted to fake news detection to alleviate the detrimental impacts. Knowledge graphs (KGs) comprise rich factual relations among real entities, which could be utilized as ground-truth databases and enhance fake news detection. However, most of the existing methods only leveraged natural language processing and graph mining techniques to extract features of fake news for detection and rarely explored the ground knowledge in knowledge graphs. In this work, we propose a novel Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection (HGNNR4FD). The devised framework has four major components: 1) A heterogeneous graph (HG) built upon news content, including three types of nodes, i.e., news, entities, and topics, and their relations. 2) A KG that provides the factual basis for detecting fake news by generating embeddings via relations in the KG. 3) A novel attention-based heterogeneous graph neural network that can aggregate information from HG and KG, and 4) a fake news detector, which is capable of identifying fake news based on the news embeddings generated by HGNNR4FD. We further validate the performance of our method by comparison with seven state-of-art baselines and verify the effectiveness of the components through a thorough ablation analysis. From the results, we empirically demonstrate that our framework achieves superior results and yields improvement over the baselines regarding evaluation metrics of accuracy, precision, recall, and F1-score on four real-world datasets.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114924355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive Data Mashups for User-Centric Data Analysis. M. Behringer, Pascal Hirmer. DOI: 10.1145/3603719.3603742
Nowadays, the amount of data is growing rapidly. Through data mining and analysis, information and knowledge can be derived from this growing volume of data. Various tools, such as Power BI, KNIME, or RapidMiner, have been introduced to specify data analysis scenarios graphically. However, when specifying complex data analysis scenarios, e.g., in larger companies, domain experts can easily become overwhelmed by the extensive functionality and configuration possibilities of these tools. In addition, the tools vary significantly in power and functionality, which can make it necessary to use different tools for the same scenario. In this demo paper, we introduce our novel user-centric interactive data mashup tool, which supports domain experts in interactively creating their analysis scenarios and introduces essential functionality lacking in similar tools, such as direct feedback on data quality issues and recommendation of suitable data sources not yet considered.
{"title":"Interactive Data Mashups for User-Centric Data Analysis","authors":"M. Behringer, Pascal Hirmer","doi":"10.1145/3603719.3603742","DOIUrl":"https://doi.org/10.1145/3603719.3603742","url":null,"abstract":"Nowadays, the amount of data is growing rapidly. Through data mining and analysis, information and knowledge can be derived based on this growing volume of data. Different tools have been introduced in the past to specify data analysis scenarios in a graphical manner, for instance, PowerBI, Knime, or RapidMiner. However, when it comes to specifying complex data analysis scenarios, e.g., in larger companies, domain experts can easily become overwhelmed by the extensive functionality and configuration possibilities of these tools. In addition, the tools vary significantly regarding their powerfulness and functionality, which could lead to the need to use different tools for the same scenario. In this demo paper, we introduce our novel user-centric interactive data mashup tool that supports domain experts in interactively creating their analysis scenarios and introduces essential functionalities that are lacking in similar tools, such as direct feedback of data quality issues or recommendation of suitable data sources not yet considered.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128556146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}