Pub Date: 2024-01-08; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1296508
Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, Lydia Y Chen
The use of synthetic data is gaining momentum, in part because original data are often unavailable owing to privacy and legal considerations, and in part because synthetic data are useful for augmenting authentic data. Generative adversarial networks (GANs), a paragon of generative models, initially for images and subsequently for tabular data, have contributed many of the state-of-the-art synthesizers. As GANs improve, the synthesized data increasingly resemble the real data, at the risk of leaking private information. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. In this study, we propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state of the art by (i) adding downstream losses to the conditional GAN for higher-utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on statistical similarity and machine learning utility against state-of-the-art tabular GANs. The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 21.9% higher machine learning utility (i.e., F1-Score) across multiple datasets and learning tasks under a given privacy budget.
Title: CTAB-GAN+: enhancing tabular data synthesis. Journal: Frontiers in Big Data, vol. 6, article 1296508. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10801038/pdf/
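The abstract above names DP stochastic gradient descent as the training mechanism but gives no details. As a rough illustration only, the following is a minimal NumPy sketch of one generic DP-SGD update (clip each per-sample gradient, sum, add Gaussian noise, average). It is not the CTAB-GAN+ implementation; the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the abstract does not say which network the noise is applied to.

```python
import numpy as np


def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update (illustrative sketch, not the paper's code).

    params:            1-D array of model parameters
    per_sample_grads:  2-D array, one gradient row per training example
    clip_norm:         maximum L2 norm allowed for each per-sample gradient
    noise_multiplier:  ratio of Gaussian noise std to clip_norm
    """
    # L2 norm of every per-sample gradient
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale factor <= 1 so that each clipped gradient has norm <= clip_norm
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    # Gaussian noise calibrated to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    # Noisy average gradient over the batch
    noisy_mean = (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]
    return params - lr * noisy_mean
```

Clipping bounds the influence of any single record on the update, and the noise scale is tied to that bound; the privacy budget actually spent depends on the noise multiplier, sampling rate, and number of steps, which this sketch does not track.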
Pub Date: 2024-01-04; DOI: 10.3389/fdata.2023.1282541
Erman Arif, Elin Herlinawati, D. Devianto, Mutia Yollanda, Dony Permana
Inflation can significantly affect monetary policy, which underscores the need for accurate forecasts to guide decisions aimed at stabilizing inflation rates. Given the significant relationship between inflation and monetary policy, it becomes feasible to detect long-memory patterns within the data. To capture these long-memory patterns, the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model was developed as a valuable tool in data mining. Because the residual assumptions are difficult to satisfy, a time series model has to be developed to address heteroscedasticity. Consequently, a suitable model was needed to correct this effect in the ARFIMA residuals. In this context, a novel hybrid model was proposed, with Generalized Autoregressive Conditional Heteroscedasticity (GARCH) being replaced by a Long Short-Term Memory (LSTM) neural network. The network was used as an iterative model to address this issue and achieve optimal parameters. Through a sensitivity analysis using mean absolute percentage error (MAPE), mean squared error (MSE), and mean absolute error (MAE), the performance of the ARFIMA, ARFIMA-GARCH, and ARFIMA-LSTM models was assessed. The results showed that ARFIMA-LSTM excelled in simulating the inflation rate. This provided further evidence that inflation data showed characteristics of long memory, and the accuracy of the model was improved by integrating the LSTM neural network.
Title: Hybridization of long short-term memory neural network in fractional time series modeling of inflation. Journal: Frontiers in Big Data.
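For readers unfamiliar with the long-memory component of ARFIMA, the sketch below shows the standard fractional differencing filter $(1-B)^d$, whose weights follow the recursion $w_0 = 1$, $w_k = w_{k-1}(k-1-d)/k$, together with the three error metrics named in the abstract. This is a generic illustration under the usual textbook definitions, not the authors' code; all function names are hypothetical, and MAPE as written assumes the true values are nonzero.

```python
import numpy as np


def frac_diff_weights(d, n_terms):
    """Weights of the fractional differencing filter (1 - B)^d."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        # Recursion from the binomial expansion of (1 - B)^d
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, using all available past values."""
    x = np.asarray(series, dtype=float)
    w = frac_diff_weights(d, len(x))
    # y_t = sum_{k=0}^{t} w_k * x_{t-k}
    return np.array([np.dot(w[: t + 1], x[t::-1]) for t in range(len(x))])


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))


def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (assumes y has no zeros)."""
    y = np.asarray(y, dtype=float)
    return float(100.0 * np.mean(np.abs((y - np.asarray(y_hat)) / y)))
```

With `d = 1` the filter reduces to ordinary first differencing; for fractional `d` the weights decay slowly rather than cutting off, which is how the filter encodes long memory.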
Pub Date: 2023-12-22; DOI: 10.3389/fdata.2023.1320569
Renushka Madarie, Christianne J. de Poot, Marleen Weulen Kranenbarg
Few studies have examined the sales of stolen account credentials on darkweb markets. In this study, we tested how advertisement characteristics affect the popularity of illicit online advertisements offering account credentials. Unlike previous criminological research, we take a novel approach by assessing the applicability of knowledge on regular consumer behaviours instead of theories explaining offender behaviour. We scraped 1,565 unique advertisements offering credentials on a darkweb market. We used this panel data set to predict the simultaneous effects of the asking price, endorsement cues and title elements on advertisement popularity by estimating several hybrid panel data models. Most of our findings disconfirm our hypotheses. Asking price did not affect advertisement popularity. Endorsement cues, including vendor reputation and cumulative sales and views, had mixed and negative relationships, respectively, with advertisement popularity. Our results might suggest that account credentials are not simply regular products, but high-risk commodities that, paradoxically, become less attractive as they gain popularity. This study highlights the necessity of a deeper understanding of illicit online market dynamics to improve theories on illicit consumer behaviours and assist cybersecurity experts in disrupting criminal business models more effectively. We propose several avenues for future experimental research to gain further insights into these illicit processes.
Title: Criminal clickbait: a panel data analysis on the attractiveness of online advertisements offering stolen data. Journal: Frontiers in Big Data.
Pub Date: 2023-12-20; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1338363
Shekhar Mahmud, Mohammed Mansour, Turker Berk Donmez, Mustafa Kutlu, Chris Freeman
[This corrects the article DOI: 10.3389/fdata.2023.1291329.].
Title: Corrigendum: Non-invasive detection of anemia using lip mucosa images transfer learning convolutional neural networks. Journal: Frontiers in Big Data, vol. 6, article 1338363. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10762862/pdf/
Pub Date: 2023-12-19; DOI: 10.3389/fdata.2023.1227189
Brendan Hoover, Dakota Zaengle, M. Mark-Moser, Patrick C. Wingo, Anuj Suhag, Kelly Rose
Subsurface interpretations and models rely on knowledge from subject matter experts who utilize unstructured information from images, maps, cross sections, and other products to provide context to measured data (e.g., cores, well logs, seismic surveys). To enhance such knowledge discovery, we advanced the National Energy Technology Laboratory's (NETL) Subsurface Trend Analysis (STA) workflow with an artificial intelligence (AI) deep learning approach for image embedding. NETL's STA method offers a validated science-based approach of combining geologic systems knowledge, statistical modeling, and datasets to improve predictions of subsurface properties. The STA image embedding tool quickly extracts images from unstructured knowledge products like publications, maps, websites, and presentations; categorically labels the images; and creates a repository for geologic domain postulation. In a case study on geographic and subsurface literature of the Gulf of Mexico (GOM), the STA image embedding tool extracted images and correctly labeled them with ~90 to ~95% accuracy.
Title: Enhancing knowledge discovery from unstructured data using a deep learning approach to support subsurface modeling predictions. Journal: Frontiers in Big Data.
Pub Date: 2023-12-19; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1251072
Tomislav Duricic, Dominik Kowald, Emanuel Lacic, Elisabeth Lex
By providing personalized suggestions to users, recommender systems have become essential to numerous online platforms. Collaborative filtering, particularly graph-based approaches using Graph Neural Networks (GNNs), has demonstrated great results in terms of recommendation accuracy. However, accuracy may not always be the most important criterion for evaluating recommender systems' performance, since beyond-accuracy aspects such as recommendation diversity, serendipity, and fairness can strongly influence user engagement and satisfaction. This review paper focuses on addressing these dimensions in GNN-based recommender systems, going beyond the conventional accuracy-centric perspective. We begin by reviewing recent developments in approaches that not only improve the accuracy-diversity trade-off but also promote serendipity and fairness in GNN-based recommender systems. We discuss different stages of model development, including data preprocessing, graph construction, embedding initialization, propagation layers, embedding fusion, score computation, and training methodologies. Furthermore, we look into the practical difficulties encountered in assuring diversity, serendipity, and fairness while retaining high accuracy. Finally, we discuss potential future research directions for developing more robust GNN-based recommender systems that go beyond the unidimensional perspective of focusing solely on accuracy. This review aims to provide researchers and practitioners with an in-depth understanding of the multifaceted issues that arise when designing GNN-based recommender systems, setting our work apart by offering a comprehensive exploration of beyond-accuracy dimensions.
Title: Beyond-accuracy: a review on diversity, serendipity, and fairness in recommender systems based on graph neural networks. Journal: Frontiers in Big Data, vol. 6, article 1251072. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10762851/pdf/
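The abstract above does not say how recommendation diversity is measured. One commonly used beyond-accuracy metric is intra-list diversity: the average pairwise cosine distance between the embeddings of the items in a single recommendation list. The sketch below is an assumption about what such a measurement could look like, not a method taken from the review; the function name is hypothetical and item embeddings are assumed to be nonzero vectors.

```python
import numpy as np


def intra_list_diversity(item_embeddings):
    """Average pairwise cosine distance among items in one recommendation list.

    item_embeddings: 2-D array-like, one (nonzero) embedding row per item.
    Returns 0.0 for lists with fewer than two items.
    """
    X = np.asarray(item_embeddings, dtype=float)
    n = X.shape[0]
    if n < 2:
        return 0.0
    # Normalize rows so that dot products are cosine similarities
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Use each unordered pair once (strict upper triangle)
    rows, cols = np.triu_indices(n, k=1)
    return float(np.mean(1.0 - sim[rows, cols]))
```

A list of near-identical items scores close to 0, while a list of mutually orthogonal items scores 1, so the value can be reported alongside an accuracy metric to expose the trade-off the review discusses.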
Pub Date: 2023-12-19; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1344345
Paschal Ochang, Damian Eke, Bernd Carsten Stahl
[This corrects the article DOI: 10.3389/fdata.2023.1240660.].
Title: Corrigendum: Towards an understanding of global brain data governance: ethical positions that underpin global brain data governance discourse. Journal: Frontiers in Big Data, vol. 6, article 1344345. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10758607/pdf/
Pub Date: 2023-12-12; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1343108
Lynnette Hui Xian Ng, Kathleen M Carley
[This corrects the article DOI: 10.3389/fdata.2023.1221744.].
Title: Corrigendum: Do you hear the people sing? Comparison of synchronized URL and narrative themes in 2020 and 2023 French protests. Journal: Frontiers in Big Data, vol. 6, article 1343108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10750104/pdf/
Pub Date: 2023-12-11; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1335213
Shekhar Mahmud, Turker Berk Donmez, Mohammed Mansour, Mustafa Kutlu, Chris Freeman
[This corrects the article DOI: 10.3389/fdata.2023.1241899.].
Title: Corrigendum: Anemia detection through non-invasive analysis of lip mucosa images. Journal: Frontiers in Big Data, vol. 6, article 1335213. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10749427/pdf/
Pub Date: 2023-11-30; eCollection Date: 2023-01-01; DOI: 10.3389/fdata.2023.1296469
A K M Mubashwir Alam, Keke Chen
Introduction: Big graphs like social network user interactions and customer rating matrices require significant computing resources to maintain. Data owners are now using public cloud resources for storage and computing elasticity. However, existing solutions do not fully address the privacy and ownership protection needs of the key involved parties: data contributors and the data owner who collects data from contributors.
Methods: We propose a Trusted Execution Environment (TEE)-based solution, TEE-Graph, for graph spectral analysis of outsourced graphs in the cloud. TEEs are new CPU features that can enable much more efficient confidential computing solutions than traditional software-based cryptographic ones. Our approach makes several unique contributions compared with existing confidential graph analysis approaches: (1) it utilizes the unique TEE properties to meet contributors' new privacy needs, e.g., the right of revocation for shared data; (2) it implements efficient access-pattern protection with a differentially private data encoding method; and (3) it implements TEE-based special analysis algorithms, the Lanczos method and the Nystrom method, for efficiently handling big graphs and protecting confidentiality from compromised cloud providers.
Results: The TEE-Graph approach is much more efficient than software crypto approaches and is also immune to access-pattern-based attacks. Compared with the best-known software crypto approach for graph spectral analysis, PrivateGraph, TEE-Graph has 10^3 to 10^5 times lower computation, storage, and communication costs. Furthermore, the proposed access-pattern protection method incurs only about 10%-25% of the overall computation cost.
Discussion: Our experimentation showed that TEE-Graph performs significantly better and has lower costs than typical software approaches. It also addresses the unique ownership and access-pattern issues that other TEE-related graph analytics approaches have not sufficiently studied. The proposed approach can be extended to other graph analytics problems with strong ownership and access-pattern protection.
Title: TEE-Graph: efficient privacy and ownership protection for cloud-based graph spectral analysis. Journal: Frontiers in Big Data, vol. 6, article 1296469. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10724017/pdf/
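The Lanczos method named in the Methods paragraph reduces a large symmetric matrix (for example a graph Laplacian) to a small tridiagonal one whose eigenvalues approximate the extreme eigenvalues of the original. The following is a plain NumPy sketch of the textbook algorithm with full reorthogonalization. It runs outside any enclave and includes none of the access-pattern protection described in the abstract, so it should be read as background rather than as TEE-Graph's implementation; the function name and signature are hypothetical.

```python
import numpy as np


def lanczos(A, k, rng):
    """Run up to k Lanczos iterations on a symmetric matrix A.

    Returns (T, Q): T is the tridiagonal projection of A and the columns of Q
    are the orthonormal Lanczos vectors, so that T is close to Q.T @ A @ Q.
    """
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)      # diagonal of T
    beta = np.zeros(k - 1)   # off-diagonal of T
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # Full reorthogonalization against all Lanczos vectors so far
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j == k - 1:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:
            # Invariant subspace reached: keep only the vectors built so far
            alpha, beta, Q = alpha[: j + 1], beta[:j], Q[:, : j + 1]
            break
        Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return T, Q
```

In practice `k` is chosen much smaller than the number of graph nodes, and the eigenvalues of `T` (for instance via `np.linalg.eigvalsh(T)`) are used as approximations to the largest and smallest eigenvalues of `A`. Each iteration touches `A` only through one matrix-vector product, which is what makes the method attractive for big graphs.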