
Frontiers in Big Data: Latest Articles

CTAB-GAN+: enhancing tabular data synthesis.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-01-08 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1296508
Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, Lydia Y Chen

The use of synthetic data is gaining momentum, in part because original data are often unavailable for privacy and legal reasons and in part because synthetic data are useful for augmenting authentic data. Generative adversarial networks (GANs), a paragon of generative models, initially for images and subsequently for tabular data, have contributed many of the state-of-the-art synthesizers. As GANs improve, the synthesized data increasingly resemble the real data, which risks leaking private information. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. In this study, we propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state of the art by (i) adding downstream losses to the conditional GAN for higher-utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on statistical similarity and machine learning utility against state-of-the-art tabular GANs. The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 21.9% higher machine learning utility (i.e., F1-score) across multiple datasets and learning tasks under a given privacy budget.
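Item (iv) in the abstract above refers to DP stochastic gradient descent. As a rough illustration of the general technique only (not the CTAB-GAN+ implementation), the sketch below clips each per-example gradient to a fixed L2 norm, sums the clipped gradients, adds Gaussian noise, and averages. The function name `dp_sgd_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this example, and the privacy accounting that turns the noise level into a privacy budget is omitted.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average of per-example gradients (one DP-SGD step).

    per_example_grads: list of equal-length lists of floats (hypothetical input).
    clip_norm: maximum L2 norm allowed for each per-example gradient.
    noise_multiplier: noise standard deviation expressed in units of clip_norm.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down only if its L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Add Gaussian noise with standard deviation noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


# Illustrative values only: two per-example gradients in two dimensions.
grads = [[3.0, 4.0], [0.3, 0.4]]
step = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1,
                       rng=random.Random(0))
```

A larger `noise_multiplier` gives a stronger privacy guarantee for the same number of steps at the cost of noisier updates, which is the utility trade-off the abstract describes.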

Citations: 0
Hybridization of long short-term memory neural network in fractional time series modeling of inflation
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-01-04 DOI: 10.3389/fdata.2023.1282541
Erman Arif, Elin Herlinawati, D. Devianto, Mutia Yollanda, Dony Permana
Inflation can significantly affect monetary policy, which makes accurate forecasts essential for guiding decisions aimed at stabilizing inflation rates. Given the significant relationship between inflation and monetary policy, it becomes feasible to detect long-memory patterns within the data. To capture these long-memory patterns, the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model was developed as a valuable tool in data mining. Because of the challenges posed by the residual assumptions, a time series model has to be developed to address heteroscedasticity. Consequently, a suitable model was needed to correct this effect in the ARFIMA residuals. In this context, a novel hybrid model was proposed in which the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) component is replaced by a Long Short-Term Memory (LSTM) neural network. The network was used as an iterative model to address this issue and obtain optimal parameters. The performance of the ARFIMA, ARFIMA-GARCH, and ARFIMA-LSTM models was assessed through a sensitivity analysis using mean absolute percentage error (MAPE), mean squared error (MSE), and mean absolute error (MAE). The results showed that ARFIMA-LSTM excelled in simulating the inflation rate. This provided further evidence that the inflation data showed long-memory characteristics and that integrating the LSTM neural network improved the accuracy of the model.
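The three error measures named in the abstract above have standard definitions, sketched below in plain Python. The sample series are made-up numbers for illustration, not data from the paper, and MAPE as written here is undefined whenever an actual value is zero.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical inflation rates and forecasts, for illustration only.
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 4.0, 7.0]
scores = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

MSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is one reason to report all three when comparing models.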
Citations: 0
Criminal clickbait: a panel data analysis on the attractiveness of online advertisements offering stolen data
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-22 DOI: 10.3389/fdata.2023.1320569
Renushka Madarie, Christianne J. de Poot, Marleen Weulen Kranenbarg
Few studies have examined the sales of stolen account credentials on darkweb markets. In this study, we tested how advertisement characteristics affect the popularity of illicit online advertisements offering account credentials. Unlike previous criminological research, we take a novel approach by assessing the applicability of knowledge on regular consumer behaviours instead of theories explaining offender behaviour. We scraped 1,565 unique advertisements offering credentials on a darkweb market. We used this panel data set to predict the simultaneous effects of the asking price, endorsement cues and title elements on advertisement popularity by estimating several hybrid panel data models. Most of our findings disconfirm our hypotheses. Asking price did not affect advertisement popularity. Endorsement cues, including vendor reputation and cumulative sales and views, had mixed and negative relationships, respectively, with advertisement popularity. Our results might suggest that account credentials are not simply regular products, but high-risk commodities that, paradoxically, become less attractive as they gain popularity. This study highlights the necessity of a deeper understanding of illicit online market dynamics to improve theories on illicit consumer behaviours and assist cybersecurity experts in disrupting criminal business models more effectively. We propose several avenues for future experimental research to gain further insights into these illicit processes.
Citations: 0
Corrigendum: Non-invasive detection of anemia using lip mucosa images transfer learning convolutional neural networks.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-20 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1338363
Shekhar Mahmud, Mohammed Mansour, Turker Berk Donmez, Mustafa Kutlu, Chris Freeman

[This corrects the article DOI: 10.3389/fdata.2023.1291329.].

Citations: 0
Enhancing knowledge discovery from unstructured data using a deep learning approach to support subsurface modeling predictions
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-19 DOI: 10.3389/fdata.2023.1227189
Brendan Hoover, Dakota Zaengle, M. Mark-Moser, Patrick C. Wingo, Anuj Suhag, Kelly Rose
Subsurface interpretations and models rely on knowledge from subject matter experts who utilize unstructured information from images, maps, cross sections, and other products to provide context to measured data (e.g., cores, well logs, seismic surveys). To enhance such knowledge discovery, we advanced the National Energy Technology Laboratory's (NETL) Subsurface Trend Analysis (STA) workflow with an artificial intelligence (AI) deep learning approach for image embedding. NETL's STA method offers a validated science-based approach of combining geologic systems knowledge, statistical modeling, and datasets to improve predictions of subsurface properties. The STA image embedding tool quickly extracts images from unstructured knowledge products like publications, maps, websites, and presentations; categorically labels the images; and creates a repository for geologic domain postulation. In a case study on geographic and subsurface literature of the Gulf of Mexico (GOM), the STA image embedding tool extracted images and labeled them correctly with ~90 to ~95% accuracy.
Citations: 0
Beyond-accuracy: a review on diversity, serendipity, and fairness in recommender systems based on graph neural networks.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-19 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1251072
Tomislav Duricic, Dominik Kowald, Emanuel Lacic, Elisabeth Lex

By providing personalized suggestions to users, recommender systems have become essential to numerous online platforms. Collaborative filtering approaches, particularly graph-based ones using Graph Neural Networks (GNNs), have demonstrated great results in terms of recommendation accuracy. However, accuracy may not always be the most important criterion for evaluating recommender systems' performance, since beyond-accuracy aspects such as recommendation diversity, serendipity, and fairness can strongly influence user engagement and satisfaction. This review paper focuses on addressing these dimensions in GNN-based recommender systems, going beyond the conventional accuracy-centric perspective. We begin by reviewing recent developments in approaches that not only improve the accuracy-diversity trade-off but also promote serendipity and fairness in GNN-based recommender systems. We discuss different stages of model development including data preprocessing, graph construction, embedding initialization, propagation layers, embedding fusion, score computation, and training methodologies. Furthermore, we present a look into the practical difficulties encountered in assuring diversity, serendipity, and fairness while retaining high accuracy. Finally, we discuss potential future research directions for developing more robust GNN-based recommender systems that go beyond the unidimensional perspective of focusing solely on accuracy. This review aims to provide researchers and practitioners with an in-depth understanding of the multifaceted issues that arise when designing GNN-based recommender systems, setting our work apart by offering a comprehensive exploration of beyond-accuracy dimensions.
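The abstract above does not say how diversity is measured. One commonly used measure is intra-list diversity, the average pairwise dissimilarity among the items in a single recommendation list. The sketch below assumes each recommended item is represented by an embedding vector and uses one minus cosine similarity as the dissimilarity; the function names and sample vectors are hypothetical and are not taken from the review.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intra_list_diversity(item_vectors):
    """Average (1 - cosine similarity) over all unordered pairs of items.

    Requires at least two items in the list.
    """
    pairs = list(combinations(item_vectors, 2))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical item embeddings for one recommendation list.
recommended = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
diversity = intra_list_diversity(recommended)
```

A list of near-identical items scores close to 0, while a list of mutually dissimilar items scores higher, so the value can be tracked alongside an accuracy metric to see the trade-off.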

Citations: 0
Corrigendum: Towards an understanding of global brain data governance: ethical positions that underpin global brain data governance discourse.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-19 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1344345
Paschal Ochang, Damian Eke, Bernd Carsten Stahl

[This corrects the article DOI: 10.3389/fdata.2023.1240660.].

Citations: 0
Corrigendum: Do you hear the people sing? Comparison of synchronized URL and narrative themes in 2020 and 2023 French protests.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-12 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1343108
Lynnette Hui Xian Ng, Kathleen M Carley

[This corrects the article DOI: 10.3389/fdata.2023.1221744.].

Citations: 0
Corrigendum: Anemia detection through non-invasive analysis of lip mucosa images.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-11 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1335213
Shekhar Mahmud, Turker Berk Donmez, Mohammed Mansour, Mustafa Kutlu, Chris Freeman

[This corrects the article DOI: 10.3389/fdata.2023.1241899.].

Citations: 0
TEE-Graph: efficient privacy and ownership protection for cloud-based graph spectral analysis.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-11-30 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1296469
A K M Mubashwir Alam, Keke Chen

Introduction: Big graphs like social network user interactions and customer rating matrices require significant computing resources to maintain. Data owners are now using public cloud resources for storage and computing elasticity. However, existing solutions do not fully address the privacy and ownership protection needs of the key involved parties: data contributors and the data owner who collects data from contributors.

Methods: We propose a Trusted Execution Environment (TEE) based solution, TEE-Graph, for graph spectral analysis of outsourced graphs in the cloud. TEEs are new CPU features that can enable much more efficient confidential computing solutions than traditional software-based cryptographic ones. Our approach makes several unique contributions compared to existing confidential graph analysis approaches. (1) It utilizes the unique TEE properties to meet contributors' new privacy needs, e.g., the right of revocation for shared data. (2) It implements efficient access-pattern protection with a differentially private data encoding method. (3) It implements TEE-based special analysis algorithms, the Lanczos method and the Nystrom method, for efficiently handling big graphs and protecting confidentiality from compromised cloud providers.

Results: The TEE-Graph approach is much more efficient than software crypto approaches and also immune to access-pattern-based attacks. Compared with the best-known software crypto approach for graph spectral analysis, PrivateGraph, we have seen that TEE-Graph has 10^3 to 10^5 times lower computation, storage, and communication costs. Furthermore, the proposed access-pattern protection method incurs only about 10%-25% of the overall computation cost.

Discussion: Our experimentation showed that TEE-Graph performs significantly better and has lower costs than typical software approaches. It also addresses the unique ownership and access-pattern issues that other TEE-related graph analytics approaches have not sufficiently studied. The proposed approach can be extended to other graph analytics problems with strong ownership and access-pattern protection.
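The Lanczos method named in the Methods paragraph reduces a symmetric matrix (for example, a graph's adjacency or Laplacian matrix) to a small tridiagonal matrix whose eigenvalues approximate the extreme eigenvalues of the original. The sketch below is a textbook version in plain Python, run in ordinary memory with no enclave, no access-pattern protection, and no re-orthogonalization; it is not the TEE-Graph implementation, and the function names and the tiny sample matrix are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def lanczos(matrix, start, steps):
    """Run up to `steps` Lanczos iterations on a symmetric matrix.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    resulting tridiagonal matrix, with len(betas) == len(alphas) - 1.
    """
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]          # current Lanczos vector
    v_prev = [0.0] * len(start)            # previous Lanczos vector
    beta = 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(matrix, v)
        w = [wi - beta * pi for wi, pi in zip(w, v_prev)]
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        w = [wi - alpha * vi for wi, vi in zip(w, v)]
        alphas.append(alpha)
        beta = math.sqrt(sum(wi * wi for wi in w))
        if beta < 1e-12:
            # The Krylov subspace is exhausted; stop early.
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    # Drop a trailing residual norm so betas holds only off-diagonal entries.
    return alphas, betas[: len(alphas) - 1]


# Hypothetical 2x2 symmetric matrix, for illustration only.
alphas, betas = lanczos([[2.0, 1.0], [1.0, 2.0]], start=[1.0, 0.0], steps=2)
```

Each iteration needs only one matrix-vector product, which is why the method suits large sparse graphs; a production version would use a sparse matrix representation and guard against loss of orthogonality.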

简介社交网络用户互动和客户评级矩阵等大型图表需要大量计算资源来维护。目前,数据所有者正在使用公共云资源进行存储和弹性计算。然而,现有的解决方案并不能完全满足关键相关方(数据贡献者和从贡献者处收集数据的数据所有者)的隐私和所有权保护需求:我们提出了一种基于可信执行环境(TEE)的解决方案:方法:我们提出了一种基于可信执行环境(TEE)的解决方案:TEE-Graph,用于对云中的外包图形进行图谱分析。TEE是CPU的新特性,与传统的基于软件的加密解决方案相比,它能提供更高效的保密计算解决方案。与现有的保密图分析方法相比,我们的方法有几个独特的贡献。(1) 它利用独特的 TEE 特性来确保贡献者新的隐私需求,例如共享数据的撤销权。(2) 它利用不同的隐私数据编码方法实现了高效的访问模式保护。(3) 它实现了基于 TEE 的特殊分析算法:Lanczos 方法和 Nystrom 方法,可有效处理大型图并保护机密性免受受损云提供商的破坏:结果:TEE-Graph 方法比软件加密方法更高效,而且还能抵御基于访问模式的攻击。与最著名的用于图谱分析的软件加密方法 PrivateGraph 相比,我们发现 TEE-Graph 的计算、存储和通信成本要低 103-105 倍。此外,所提出的访问模式保护方法只占总体计算成本的 10%-25%:我们的实验表明,与典型的软件方法相比,TEE-Graph 的性能明显更好,成本更低。它还解决了其他与 TEE 相关的图分析方法尚未充分研究的独特所有权和访问模式问题。所提出的方法可扩展到其他具有强大所有权和访问模式保护的图分析问题。
TEE-Graph: efficient privacy and ownership protection for cloud-based graph spectral analysis. A K M Mubashwir Alam, Keke Chen. Frontiers in Big Data, vol. 6, 1296469. Published 2023-11-30. DOI: 10.3389/fdata.2023.1296469 (https://doi.org/10.3389/fdata.2023.1296469). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10724017/pdf/
Citations: 0