
Proceedings of the 35th International Conference on Scientific and Statistical Database Management: Latest Publications

Privacy-Preserving OLAP via Modeling and Analysis of Query Workloads: Innovative Theories and Theorems
A. Cuzzocrea
This paper proposes innovative theories and theorems building on a state-of-the-art paper that computes privacy-preserving OLAP cubes by modeling and analyzing query workloads. The work contributes to the current literature by devising a solid theoretical framework that can support future optimizations.
DOI: 10.1145/3603719.3603735 (https://doi.org/10.1145/3603719.3603735) · Published: 2023-07-10
Citations: 0
TGSLN : Time-aware Graph Structure Learning Network for Multi-variates Stock Sector Ranking Recommendation
Quan Wan, Shuo Yin, Xiangyue Liu, Jianliang Gao, Yuhui Zhong
In the field of financial prediction, most studies focus on individual stocks or stock indices. Stock sectors are collections of stocks with similar characteristics, and sector indices have more stable trends and are more predictable than individual stocks. Additionally, stock sectors are subsets of stock indices, which implies that investment portfolios based on stock sectors have greater potential to achieve excess returns. In this paper, we propose a new method, the Time-aware Graph Structure Learning Network (TGSLN), to address the problem of stock sector ranking recommendation. In this model, we use an indicator called Relative Price Strength (RPS) to describe the trend of changes in sector rankings. To capture the inherent connections between sectors, we construct a multivariate time series consisting of multi-scale RPS sequences and effective indicators filtered by a factor selector. We also build a stock sector relation graph based on authoritative stock sector classifications. In particular, we design a time-aware graph structure learner that mines sector relations from the time series and enhances the initial graph through graph fusion. Our model outperforms state-of-the-art baselines in both the A-share and NASDAQ markets.
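The abstract does not define RPS. A common definition, assumed here and not taken from the paper, ranks each sector's N-day return against all other sectors and scales the rank to 0-100; a multi-scale variant stacks several look-back windows. The windows (5, 20, 60) and the function names below are hypothetical.

```python
import numpy as np

def relative_price_strength(prices: np.ndarray, window: int) -> np.ndarray:
    """Percentile rank (0-100) of each sector's `window`-day return at each date.

    prices: array of shape (T, S), index levels for S >= 2 sectors over T dates.
    window: look-back length in days, assumed >= 1.
    Returns an array of shape (T - window, S); row i refers to date i + window.
    """
    returns = prices[window:] / prices[:-window] - 1.0
    # argsort applied twice yields each sector's rank within its row (0 = weakest);
    # ties are broken arbitrarily.
    ranks = returns.argsort(axis=1).argsort(axis=1)
    return 100.0 * ranks / (prices.shape[1] - 1)

def multi_scale_rps(prices: np.ndarray, windows=(5, 20, 60)) -> np.ndarray:
    """Stack RPS series for several look-back windows, aligned on common dates."""
    longest = max(windows)
    series = [relative_price_strength(prices, w)[longest - w:] for w in windows]
    return np.stack(series, axis=-1)  # shape (T - longest, S, len(windows))
```

For example, one-day returns of +1%, -1%, and +10% across three sectors give RPS values of 50, 0, and 100.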
DOI: 10.1145/3603719.3603741 (https://doi.org/10.1145/3603719.3603741) · Published: 2023-07-10
Citations: 0
Evaluating Autoencoders for Dimensionality Reduction of MRI-derived Radiomics and Classification of Malignant Brain Tumors
Mikayla Biggs, Yaohua Wang, N. Soni, S. Priya, G. Bathla, G. Canahuate
Malignant brain tumors, including parenchymal metastatic (MET) lesions, glioblastomas (GBM), and lymphomas (LYM), account for 29.7% of brain cancers. However, characterizing these tumors from MRI is difficult because their radiologically observed image features are similar. Radiomics is the extraction of quantitative imaging features that characterize tumor intensity, shape, and texture. Applying machine learning to radiomic features could aid diagnosis by improving the classification of these common brain tumors. However, since the number of radiomic features is typically larger than the number of patients in a study, dimensionality reduction is needed to balance feature dimensionality and model complexity. Autoencoders are a form of unsupervised representation learning that can be used for dimensionality reduction. They are similar to PCA but use a more complex, non-linear model to learn a compact latent space. In this work, we examine the effectiveness of autoencoders for dimensionality reduction of the radiomic feature space of multiparametric MRI images and for the classification of malignant brain tumors: GBM, LYM, and MET. We further aim to address the class imbalance caused by the rarity of lymphomas by examining multiclass decomposition strategies that increase overall predictive performance.
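As a rough illustration of autoencoder-based dimensionality reduction (not the authors' architecture, which the abstract does not specify), the NumPy sketch below trains a one-hidden-layer autoencoder with a tanh encoder and a linear decoder by full-batch gradient descent. The latent dimension, learning rate, and epoch count are arbitrary assumptions; the encoder output would replace the raw, standardized radiomic features as classifier input.

```python
import numpy as np

def train_autoencoder(X, latent_dim=8, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder) by
    full-batch gradient descent on the mean squared reconstruction error.
    X should be standardized (zero mean, unit variance per feature).
    Returns (encode function, list of per-epoch losses)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(latent_dim), (latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # encoder: non-linear latent code
        X_hat = H @ W2 + b2           # decoder: linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of loss = mean(err**2) over all n*d entries.
        g_out = 2.0 * err / (n * d)
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        gW1 = X.T @ g_H
        gb1 = g_H.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def encode(Z):
        """Map samples to the learned latent space."""
        return np.tanh(Z @ W1 + b1)

    return encode, losses
```

With a linear encoder and decoder, the optimal code spans the same subspace as PCA; the tanh encoder is what makes this mapping non-linear.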
DOI: 10.1145/3603719.3603737 (https://doi.org/10.1145/3603719.3603737) · Published: 2023-07-10
Citations: 0
A Long-term Time Series Forecasting method with Multiple Decomposition
Y. Wang, Xu Chen, Y. Wang, Jun Yong Jing
In various real-world applications, such as weather forecasting, energy consumption planning, and traffic flow prediction, time is a critical variable. These applications can collectively be referred to as time-series prediction problems. Although recent Transformer-based solutions have yielded improved results, they often struggle to capture semantic dependencies in time-series data and instead capture predominantly temporal dependencies. This shortfall often hinders their ability to capture long-term series patterns effectively. In this research, we apply time-series decomposition to address long-term series forecasting. Our method implements a time-series forecasting approach with deep series decomposition, which further decomposes the long-term trend components produced by the initial decomposition. This technique significantly enhances the forecasting accuracy of the model. For long-term time-series forecasting (LTSF), our proposed method achieves commendable prediction accuracy on four publicly available datasets (Weather, Electricity, Traffic, and ILI) compared with prevailing methods. The code for our method is available at https://github.com/wangyang970508/LSTF_MD.
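The abstract does not give the decomposition operator. Assuming the moving-average trend/seasonal split used by Autoformer and DLinear, "further decomposing the trend" can be sketched as applying that split again to the trend with a larger kernel. The kernel sizes (25, 97) are assumptions, and this is not the code from the LSTF_MD repository.

```python
import numpy as np

def moving_average(x: np.ndarray, kernel: int) -> np.ndarray:
    """Centered moving average with edge-replication padding; output has len(x)."""
    pad_front = (kernel - 1) // 2
    pad_back = kernel - 1 - pad_front
    padded = np.concatenate([np.repeat(x[:1], pad_front), x, np.repeat(x[-1:], pad_back)])
    return np.convolve(padded, np.ones(kernel) / kernel, mode="valid")

def decompose(x: np.ndarray, kernel: int):
    """Split x into (seasonal/residual part, trend part)."""
    trend = moving_average(x, kernel)
    return x - trend, trend

def multiple_decomposition(x: np.ndarray, kernels=(25, 97)):
    """Decompose x, then re-decompose each resulting trend with the next kernel.
    Returns the seasonal components followed by the final trend; they sum to x."""
    components = []
    trend = x
    for k in kernels:
        seasonal, trend = decompose(trend, k)
        components.append(seasonal)
    components.append(trend)
    return components
```

Each component could then be forecast by its own model and the forecasts summed, since the components add back up to the input series.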
DOI: 10.1145/3603719.3603738 (https://doi.org/10.1145/3603719.3603738) · Published: 2023-07-10
Citations: 0
LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization
Ivan Carvalho, Ramon Lawrence
This work analyzes and parallelizes LearnedSort, a novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed through the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed by combining LearnedSort with IPS4o, the state-of-the-art SampleSort implementation. Benchmarks on synthetic and real-world datasets show that parallel LearnedSort performs better than IPS4o and other sorting algorithms.
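The SampleSort view of LearnedSort can be sketched as follows: a CDF model fitted to a random sample replaces the splitter search and assigns each key to a bucket, after which the buckets are sorted independently. This is a simplified single-threaded sketch under stated assumptions (a piecewise-linear empirical CDF serves as the "model", and the bucket count is fixed); it is neither the LearnedSort nor the IPS4o implementation.

```python
import numpy as np

def learned_sample_sort(keys, num_buckets=64, sample_size=1000, seed=0):
    """Sort `keys` by bucketing them with a learned (sample-based) CDF model."""
    keys = np.asarray(keys)
    if keys.size <= 1:
        return keys.copy()
    rng = np.random.default_rng(seed)
    # 1. Draw a random sample, as SampleSort does to pick splitters.
    sample = rng.choice(keys, size=min(sample_size, keys.size), replace=False)
    # 2. "Train" a monotone CDF model: linear interpolation of the sample's empirical CDF.
    xp = np.unique(sample)                 # sorted, strictly increasing
    fp = np.linspace(0.0, 1.0, xp.size)
    cdf = np.interp(keys, xp, fp)          # predicted CDF value in [0, 1]
    # 3. The prediction replaces the splitter search: bucket = floor(F(x) * B).
    buckets = np.minimum((cdf * num_buckets).astype(np.int64), num_buckets - 1)
    # 4. F is monotone, so every key in bucket i is smaller than every key in
    #    bucket i + 1; buckets can therefore be sorted independently.
    return np.concatenate([np.sort(keys[buckets == b]) for b in range(num_buckets)])
```

Because buckets never overlap, the per-bucket sorts are independent and could be distributed across threads, which is the opportunity a parallel version exploits.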
DOI: 10.1145/3603719.3603731 (https://doi.org/10.1145/3603719.3603731) · Published: 2023-07-10
Citations: 0
Early ICU Mortality Prediction with Deep Federated Learning: A Real-World Scenario
Athanasios Georgoutsos, Paraskevas Kerasiotis, Verena Kantere
The generation of large amounts of healthcare data has motivated the use of Machine Learning (ML) to train robust models for clinical tasks. However, the limitations of local datasets and restrictions on sharing patient data impede the use of traditional ML workflows. Consequently, Federated Learning (FL) has emerged as a potential solution for training ML models across multiple healthcare centers. In this study, we focus on the binary classification task of early ICU mortality prediction using multivariate time-series data and a deep neural network architecture. We evaluate the performance of two FL algorithms (FedAvg and FedProx) on this task, using a real-world multi-center benchmark database. Our results show that FL models outperform local ML models in a realistic scenario with non-identically distributed data, indicating that FL is a promising solution for analogous problems in the healthcare domain. Nevertheless, in this experimental scenario, the FL models do not reach the ideal performance of a centralized ML model.
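FedAvg and FedProx differ mainly in the local objective: FedProx adds a proximal term (mu/2)·||w - w_global||² that keeps each client close to the current global model, while both aggregate client models by a sample-weighted average. The sketch below shows one communication round with a linear least-squares model standing in for the paper's deep network; mu, the learning rate, and the number of local steps are illustrative assumptions.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Server step: average client parameter vectors weighted by local sample counts."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)             # shape (num_clients, num_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

def fedprox_local_step(w, w_global, grad_loss, lr=0.1, mu=0.01):
    """One gradient step on loss(w) + (mu / 2) * ||w - w_global||^2.
    With mu = 0 this is the plain local step used by FedAvg."""
    return w - lr * (grad_loss(w) + mu * (w - w_global))

def run_round(w_global, clients, local_steps=5, lr=0.1, mu=0.01):
    """One round: each client trains locally from w_global, then the server averages."""
    new_weights, sizes = [], []
    for X, y in clients:
        # Least-squares gradient as an illustrative local loss.
        def grad(w, X=X, y=y):
            return 2.0 * X.T @ (X @ w - y) / len(y)
        w = w_global.copy()
        for _ in range(local_steps):
            w = fedprox_local_step(w, w_global, grad, lr, mu)
        new_weights.append(w)
        sizes.append(len(y))
    return fedavg_aggregate(new_weights, sizes)
```

Patient data never leaves a client in this scheme; only the parameter vectors returned by local training are sent to the server.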
DOI: 10.1145/3603719.3603723 (https://doi.org/10.1145/3603719.3603723) · Published: 2023-07-10
Citations: 0
Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth
Jianliang Gao, Changlong He, Jiamin Chen, Qiutong Li, Yili Wang
To alleviate the over-smoothing problem of deep graph neural networks, decoupled graph neural networks (DGNNs) have been proposed. DGNNs decouple a graph neural network into two atomic operations: the propagation (P) operation and the transformation (T) operation. Because manually designing DGNN architectures is time-consuming and depends on expert knowledge, the DF-GNAS method was designed to automatically construct DGNN architectures with a fixed propagation operation and deep layers. The propagation operation is the key process by which DGNNs aggregate graph structure information. However, DF-GNAS uses the same fixed propagation operation for different graph structures, which causes performance loss. Meanwhile, DF-GNAS designs deep DGNNs even for graphs with simple distributions, which may lead to overfitting. To address these challenges, we propose the Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth (DGNAS-PD) method. In DGNAS-PD, we design a DGNN operation space with variable, efficient propagation operations to better aggregate information on different graph structures. We also build an effective genetic search strategy that adaptively selects an appropriate DGNN depth, rather than always using deep DGNNs, for graphs with simple distributions. Experiments on five real-world graphs show that DGNAS-PD outperforms state-of-the-art baseline methods.
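To make the P/T decoupling concrete, the sketch below treats an architecture as a sequence of operation names, with two interchangeable propagation operators (symmetric and random-walk normalization) standing in for a "variable propagation operation" search space. The operator set and names are assumptions; the genetic search that DGNAS-PD uses to choose the sequence and its depth is not shown.

```python
import numpy as np

def propagation_ops(A: np.ndarray) -> dict:
    """Two alternative parameter-free propagation (P) operators built from adjacency A."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return {
        "P_sym": d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :],  # D^-1/2 (A+I) D^-1/2
        "P_rw": A_hat / deg[:, None],                                  # D^-1 (A+I), row-stochastic
    }

def run_decoupled_gnn(X, A, ops, weights):
    """Apply a sequence of decoupled operations to node features X.

    ops: e.g. ["P_sym", "T", "P_rw", "T"]; each "T" consumes the next matrix in `weights`.
    """
    props = propagation_ops(A)
    weight_iter = iter(weights)
    H = X
    for op in ops:
        if op == "T":
            H = np.maximum(H @ next(weight_iter), 0.0)  # transformation: linear map + ReLU
        elif op in props:
            H = props[op] @ H                            # propagation: neighbor smoothing
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return H
```

Because P operations carry no parameters, an architecture can stack many of them without adding weights; how many to use per graph is exactly the depth choice the search has to make.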
DOI: 10.1145/3603719.3603729 (https://doi.org/10.1145/3603719.3603729) · Published: 2023-07-10
Citations: 0
Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis
Lixi Zhou, Lei Yu, Jia Zou, Hong Min
Protecting sensitive information in diagnostic data, such as logs, is a critical concern in industrial software diagnosis and debugging. Although many tools have been developed to automatically redact logs by identifying and removing sensitive information, they have severe limitations that can cause over-redaction and loss of critical diagnostic information (false positives), disclosure of sensitive information (false negatives), or both. To address this problem, we argue for a source code analysis approach to log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code through logger code augmentation, and uses a data flow graph built from the source code to check whether the log statement outputs data from sensitive sources. Appropriate redaction rules are then applied according to the sensitivity of the data sources to protect private information in the logs. We conducted an experimental evaluation and compared our approach with other popular baselines. The results demonstrate that our approach significantly improves the precision of sensitive-information detection and reduces both false positives and false negatives.
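The data-flow check described above can be sketched as taint propagation: mark the sensitive sources, follow data-flow edges with a breadth-first search, and flag any log statement whose printed arguments are reachable. The graph encoding, node names, and log-statement identifiers are hypothetical; the paper's graph construction and redaction rules are not reproduced here.

```python
from collections import deque

def tainted_nodes(edges, sensitive_sources):
    """Return every node reachable from a sensitive source in a data-flow graph.

    edges: dict mapping a node to the nodes its value flows into.
    """
    seen = set(sensitive_sources)
    queue = deque(sensitive_sources)
    while queue:
        node = queue.popleft()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def log_statements_to_redact(edges, sensitive_sources, log_args):
    """log_args: dict mapping a log-statement id to the variables it prints.

    Returns {log id: sorted tainted arguments}, only for statements that leak data.
    """
    tainted = tainted_nodes(edges, sensitive_sources)
    result = {}
    for log_id, args in log_args.items():
        leaked = sorted(a for a in args if a in tainted)
        if leaked:
            result[log_id] = leaked
    return result
```

For instance, if a hypothetical `request.password` flows into `creds` and then into `auth_msg`, a log statement printing `auth_msg` and `route` is flagged only for `auth_msg`, so a redaction rule can mask that field while keeping `route` for diagnosis.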
DOI: 10.1145/3603719.3603734 (https://doi.org/10.1145/3603719.3603734) · Published: 2023-07-10
Citations: 1
Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection
Bingbing Xie, Xiaoxia Ma, Jia Wu, Jian Yang, Shan Xue, Hao Fan
The proliferation of fake news on social media has been recognized as a severe problem for society, and substantial effort has been devoted to fake news detection to alleviate its detrimental impacts. Knowledge graphs (KGs) comprise rich factual relations among real entities, which could serve as ground-truth databases and enhance fake news detection. However, most existing methods only leverage natural language processing and graph mining techniques to extract features of fake news for detection, and rarely exploit the factual knowledge in knowledge graphs. In this work, we propose a novel Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection (HGNNR4FD). The devised framework has four major components: 1) a heterogeneous graph (HG) built upon news content, including three types of nodes (news, entities, and topics) and their relations; 2) a KG that provides the factual basis for detecting fake news by generating embeddings via relations in the KG; 3) a novel attention-based heterogeneous graph neural network that aggregates information from the HG and the KG; and 4) a fake news detector that identifies fake news based on the news embeddings generated by HGNNR4FD. We further validate the performance of our method by comparison with seven state-of-the-art baselines and verify the effectiveness of each component through a thorough ablation analysis. The results show empirically that our framework outperforms the baselines in accuracy, precision, recall, and F1-score on four real-world datasets.
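The abstract does not detail the attention mechanism. As a generic illustration only, the sketch below lets a news node attend over neighbor embeddings drawn from both the heterogeneous graph and the knowledge graph using scaled dot-product attention, then scores the combined vector with a logistic output. All parameter names and shapes are hypothetical, and this is not HGNNR4FD.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_aggregate(query, neighbors, W_q, W_k, W_v):
    """Scaled dot-product attention of one news node over its neighbor embeddings.

    query: (d,) news embedding; neighbors: (m, d); W_*: (d, k) projections.
    Returns the (k,) aggregated context vector and the (m,) attention weights.
    """
    q = query @ W_q
    K = neighbors @ W_k
    V = neighbors @ W_v
    alpha = softmax(K @ q / np.sqrt(q.shape[0]))  # one weight per neighbor, sums to 1
    return alpha @ V, alpha

def fake_news_score(news, hg_neighbors, kg_entities, params):
    """Attend over HG neighbors (entities/topics) and KG entity embeddings together,
    then map [news projection, context] to a probability-like score in [0, 1]."""
    neighbors = np.vstack([hg_neighbors, kg_entities])
    context, _ = attention_aggregate(news, neighbors, params["W_q"], params["W_k"], params["W_v"])
    features = np.concatenate([news @ params["W_v"], context])
    return 1.0 / (1.0 + np.exp(-features @ params["w_out"]))
```

The attention weights show which entities or topics drove a given score, which is one reason attention-based aggregation is a common choice for heterogeneous neighborhoods.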
DOI: 10.1145/3603719.3603736 (https://doi.org/10.1145/3603719.3603736) · Proceedings of the 35th International Conference on Scientific and Statistical Database Management · Published 2023-07-10
Citations: 0
Interactive Data Mashups for User-Centric Data Analysis
M. Behringer, Pascal Hirmer
Nowadays, the amount of data is growing rapidly. Through data mining and analysis, information and knowledge can be derived from this growing volume of data. Various tools have been introduced to specify data analysis scenarios graphically, for instance, Power BI, KNIME, or RapidMiner. However, when it comes to specifying complex data analysis scenarios, e.g., in larger companies, domain experts can easily become overwhelmed by the extensive functionality and configuration options of these tools. In addition, the tools vary significantly in power and functionality, which can force users to employ different tools for the same scenario. In this demo paper, we introduce our novel user-centric interactive data mashup tool, which supports domain experts in interactively creating their analysis scenarios and introduces essential functionality lacking in similar tools, such as direct feedback on data quality issues or recommendations of suitable data sources not yet considered.
DOI: 10.1145/3603719.3603742 (https://doi.org/10.1145/3603719.3603742) · Proceedings of the 35th International Conference on Scientific and Statistical Database Management · Published 2023-07-10
Citations: 0