Frontiers in Big Data: Latest Articles

Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-02-29 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1266031
Dmitry Kolobkov, Satyarth Mishra Sharma, Aleksandr Medvedev, Mikhail Lebedev, Egor Kosaretskiy, Ruslan Vakhitov

Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, data custodians, who are accountable for minimizing the exposure of sensitive information, often do not allow data to be pooled directly. Federated learning offers a promising solution to this problem by training a model in a decentralized manner, thus reducing the risk of data leakage. Although federated learning is increasingly used on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for adopting federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on UK Biobank data and ancestry prediction on 1000 Genomes Project data. We show that federated models trained on data split across independent nodes achieve performance close to that of centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
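The abstract does not name the aggregation algorithm or the model. As a minimal sketch, assuming federated averaging (FedAvg) and a one-parameter linear model on synthetic data (all names and numbers are hypothetical, not the authors' setup), the snippet shows the loop whose communication frequency the study varies: each node runs `local_steps` gradient steps between communications, and the server averages node weights in proportion to node sample size.

```python
import random


def local_train(w, data, lr, steps):
    """Run `steps` full-batch gradient-descent steps on the mean squared error of y ~ w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(nodes, rounds, local_steps, lr=0.05):
    """FedAvg sketch: each round, every node starts from the global weight, trains locally,
    and the server averages the returned weights, weighted by node sample size."""
    w_global = 0.0
    total = sum(len(d) for d in nodes)
    for _ in range(rounds):
        local_ws = [local_train(w_global, d, lr, local_steps) for d in nodes]
        w_global = sum(len(d) * w for d, w in zip(nodes, local_ws)) / total
    return w_global


def centralized(nodes):
    """Closed-form least-squares slope on the pooled data (what a central model would fit)."""
    pooled = [p for d in nodes for p in d]
    return sum(x * y for x, y in pooled) / sum(x * x for x, _ in pooled)


# Synthetic, hypothetical data: node k only observes x in [k, k + 1), a crude form of
# inter-node heterogeneity. The true slope is 3.0.
rng = random.Random(0)
nodes = []
for k in range(3):
    xs = [rng.uniform(k, k + 1) for _ in range(50)]
    nodes.append([(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs])

if __name__ == "__main__":
    print("centralized:", centralized(nodes))
    print("5 rounds, 1 local step:  ", fed_avg(nodes, rounds=5, local_steps=1))
    print("5 rounds, 20 local steps:", fed_avg(nodes, rounds=5, local_steps=20))
```

With only a few communication rounds, more local steps per round bring the federated weight much closer to the centralized fit; the price is extra local computation, and on strongly heterogeneous nodes many local steps can also pull each node toward its own local optimum.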

Frontiers in Big Data, vol. 7, article 1266031. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10937521/pdf/
Citations: 0
Knowledge-based recommender systems: overview and research directions.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-02-26 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1304439
Mathias Uta, Alexander Felfernig, Viet-Man Le, Thi Ngoc Trang Tran, Damian Garber, Sebastian Lubos, Tamim Burgstaller

Recommender systems are decision support systems that help users identify relevant items from a potentially large set of alternatives. In contrast to the mainstream recommendation approaches of collaborative filtering and content-based filtering, knowledge-based recommenders exploit semantic user preference knowledge, item knowledge, and recommendation knowledge to identify user-relevant items, which is particularly valuable for complex, high-involvement items. Such recommenders are primarily applied in scenarios where users specify (and revise) their preferences, and recommendations are determined on the basis of constraints or attribute-level similarity metrics. In this article, we provide an overview of the state of the art in knowledge-based recommender systems. The related recommendation techniques are explained using a working example from the domain of survey software services. Based on our analysis, we outline directions for future research.
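The two mechanisms named above, constraints and attribute-level similarity, can be sketched in a few lines. This is a minimal illustration, not the authors' working example: the catalog of survey software services and its attributes (`price`, `max_questions`, `supports_export`) are hypothetical. Constraints act as hard filters, and a range-normalized similarity ranks the remaining items against the user's preferred attribute values.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    price: float           # hypothetical monthly price
    max_questions: int     # hypothetical capacity attribute
    supports_export: bool  # hypothetical feature flag


def constraint_filter(items, constraints):
    """Constraint-based step: keep only items that satisfy every user constraint."""
    return [it for it in items if all(c(it) for c in constraints)]


def similarity(item, prefs, ranges):
    """Attribute-level similarity: 1 minus the range-normalized distance, averaged over attributes."""
    scores = []
    for attr, target in prefs.items():
        lo, hi = ranges[attr]
        dist = abs(getattr(item, attr) - target) / (hi - lo) if hi > lo else 0.0
        scores.append(1.0 - min(dist, 1.0))
    return sum(scores) / len(scores)


def recommend(items, constraints, prefs):
    """Filter by constraints, then rank survivors by similarity to the preferred attribute values."""
    candidates = constraint_filter(items, constraints)
    ranges = {a: (min(getattr(i, a) for i in items), max(getattr(i, a) for i in items))
              for a in prefs}
    return sorted(candidates, key=lambda it: similarity(it, prefs, ranges), reverse=True)


# Hypothetical catalog and user requirements.
catalog = [
    Item("BasicSurvey", 10.0, 50, False),
    Item("ProSurvey", 30.0, 500, True),
    Item("TeamSurvey", 45.0, 1000, True),
    Item("EnterpriseSurvey", 120.0, 10000, True),
]
constraints = [lambda it: it.supports_export, lambda it: it.price <= 60.0]
prefs = {"price": 25.0, "max_questions": 600}

if __name__ == "__main__":
    for item in recommend(catalog, constraints, prefs):
        print(item.name)  # ProSurvey, then TeamSurvey
```

If no item satisfies all constraints, a knowledge-based recommender typically has to help the user relax or repair the requirements; this sketch simply returns an empty list in that case.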

Frontiers in Big Data, vol. 7, article 1304439. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10925703/pdf/
Citations: 0
Editorial: Internet of Medical Things and computational intelligence in healthcare 4.0.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-02-21 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1368581
Sujata Dash, Subhendu Kumar Pani, Wellington Pinheiro Dos Santos
Frontiers in Big Data, vol. 7, article 1368581. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10916686/pdf/
Citations: 0
Editorial: Cyber security in the wake of fourth industrial revolution: opportunities and challenges.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-02-21 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1369159
Elochukwu Ukwandu, Chaminda Hewage, Hanan Hindy
Frontiers in Big Data, vol. 7, article 1369159. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10915258/pdf/
Citations: 0
Optimizing multi-objective task scheduling in fog computing with GA-PSO algorithm for big data application.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-02-21 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1358486
Muhammad Saad, Rabia Noor Enam, Rehan Qureshi

As the volume and velocity of Big Data continue to grow, traditional cloud computing approaches struggle to meet the demands of real-time processing and low latency. Fog computing, with its distributed network of edge devices, has emerged as a compelling solution. However, efficient task scheduling in fog computing remains challenging because it is inherently multi-objective, balancing factors such as execution time, response time, and resource utilization. This paper proposes a hybrid Genetic Algorithm (GA)-Particle Swarm Optimization (PSO) algorithm to optimize multi-objective task scheduling in fog computing environments. The hybrid approach combines the strengths of GA and PSO to explore and exploit the search space effectively, improving performance over traditional single-algorithm approaches. For various task inputs, the proposed hybrid algorithm improved execution time by 85.68%, 84%, and 51.03% compared with GA, Hybrid PWOA, and PSO, respectively; response time by 67.28%, 54.24%, and 75.40%; and completion time by 68.69%, 98.91%, and 75.90%. For various numbers of fog nodes, it improved execution time by 84.87%, 88.64%, and 85.07% compared with GA, Hybrid PWOA, and PSO, respectively; response time by 65.92%, 80.51%, and 85.26%; and completion time by 67.60%, 81.34%, and 85.23%.
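The abstract does not give the encoding, fitness function, or operator details, so the following is only a minimal sketch under stated assumptions: each particle holds one continuous value per task, decoded to a fog-node index; fitness is makespan alone (a single-objective simplification of the paper's multi-objective setting); PSO velocity updates supply exploitation, and GA-style uniform crossover with the global best plus random-reset mutation supply exploration. Task lengths, node speeds, and all parameter values are hypothetical.

```python
import random


def decode(position, n_nodes):
    """Map each continuous coordinate to a fog-node index in [0, n_nodes)."""
    return [min(int(p), n_nodes - 1) for p in position]


def makespan(assignment, task_len, node_speed):
    """Finish time of the busiest node, assuming each node runs its tasks sequentially."""
    load = [0.0] * len(node_speed)
    for task, node in enumerate(assignment):
        load[node] += task_len[task] / node_speed[node]
    return max(load)


def ga_pso(task_len, node_speed, n_particles=30, iters=200,
           w=0.7, c1=1.5, c2=1.5, p_cross=0.3, p_mut=0.05, seed=0):
    """Hybrid GA-PSO sketch: PSO moves particles toward personal/global bests,
    then GA-style crossover and mutation perturb them. Returns (assignment, makespan)."""
    rng = random.Random(seed)
    n_tasks, n_nodes = len(task_len), len(node_speed)

    def cost(pos):
        return makespan(decode(pos, n_nodes), task_len, node_speed)

    pos = [[rng.uniform(0, n_nodes) for _ in range(n_tasks)] for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                # PSO velocity/position update (exploitation).
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), n_nodes - 1e-9)
                # GA operators (exploration): uniform crossover with the global best,
                # then random-reset mutation.
                if rng.random() < p_cross:
                    pos[i][d] = gbest[d]
                if rng.random() < p_mut:
                    pos[i][d] = rng.uniform(0, n_nodes)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest, n_nodes), gbest_cost


# Hypothetical workload: 20 tasks (lengths in MI) on 4 fog nodes (speeds in MIPS).
_rng = random.Random(42)
tasks = [_rng.uniform(100.0, 1000.0) for _ in range(20)]
speeds = [500.0, 1000.0, 1500.0, 2000.0]

if __name__ == "__main__":
    assignment, best = ga_pso(tasks, speeds)
    print("assignment:", assignment)
    print("makespan:", round(best, 3), "lower bound:", round(sum(tasks) / sum(speeds), 3))
```

Total work divided by total node speed is a hard lower bound on makespan, which gives a quick sanity check on any schedule the search returns. Extending the sketch toward the paper's setting would mean replacing `makespan` with a weighted or Pareto combination of execution, response, and completion time.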

Frontiers in Big Data, vol. 7, article 1358486. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10915077/pdf/
Citations: 0
Editorial: Big scientific data analytics on HPC and cloud.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-02-20 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1353988
Jianwu Wang, Junqi Yin, Mai H Nguyen, Jingbo Wang, Weijia Xu
Frontiers in Big Data, vol. 7, article 1353988. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10912602/pdf/
Citations: 0
Trends of the COVID-19 dynamics in 2022 and 2023 vs. the population age, testing and vaccination levels
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-01-10 DOI: 10.3389/fdata.2023.1355080
I. Nesteruk
The public, governments, and researchers now show much less interest in the COVID-19 pandemic. However, many questions still need to be answered: why has the much less vaccinated African continent accumulated 15 times fewer deaths per capita than Europe? Why is the global case fatality risk in 2023 almost twice as high as in 2022, and why is the UK figure four times higher than the global one?

The average daily numbers of cases (DCC) and deaths (DDC) per million, and the case fatality risk (DDC/DCC), were calculated for 34 countries and regions using Johns Hopkins University (JHU) datasets. Possible linear and nonlinear correlations with the average daily number of tests per thousand (DTC), the median population age (A), and the percentages of vaccinated (VC) and boosted (BC) people were investigated.

Strong correlations were found between age and both DCC and DDC. In 2022, a one-year increase in median age corresponded to an increase of 39.8 in DCC and 0.0799 in DDC (in 2023, these figures were 5.8 and 0.0263, respectively). As the testing level (DTC) decreases, the case fatality risk can increase drastically. DCC and DDC increase with the percentages of fully vaccinated and boosted people, which themselves increase with A. After the influence of age was removed, no correlation between vaccination and DCC or DDC was found.

The analysis demonstrates that age is a pivotal factor in the visible (registered) part of the COVID-19 pandemic dynamics. The much younger population of Africa has registered fewer cases and deaths per capita because many asymptomatic patients go unregistered. Of great concern is that COVID-19 mortality in the UK in 2023 is still at least four times higher than the global mortality caused by seasonal flu.
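The per-year slopes quoted above (for example, 0.0799 additional DDC per extra year of median age in 2022) and the step of "removing the influence of age" can be illustrated with ordinary least squares and a residual-based partial correlation. This is a minimal sketch on synthetic, made-up numbers, not JHU data, and regressing out the confounder is one common way to remove its influence, not necessarily the author's procedure.

```python
import math


def ols_slope_intercept(x, y):
    """Least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(x, y):
    """Part of y not explained by a linear function of x."""
    slope, intercept = ols_slope_intercept(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_corr(x, y, control):
    """Correlation of x and y after regressing the control variable out of both."""
    return pearson(residuals(control, x), residuals(control, y))


# Synthetic, made-up values for six hypothetical countries (not JHU data).
# Both vaccination (VC, %) and deaths (DDC, per million per day) depend on median age A;
# the noise terms are chosen orthogonal to A and to each other so the result is exact.
age = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
vc = [2.0 * a + e for a, e in zip(age, [2.5, -0.5, -2.0, -2.0, -0.5, 2.5])]
ddc = [0.05 * a + e for a, e in zip(age, [-0.005, 0.007, 0.004, -0.004, -0.007, 0.005])]

if __name__ == "__main__":
    slope, _ = ols_slope_intercept(age, ddc)
    print("DDC increase per extra year of median age:", round(slope, 4))  # 0.05
    print("raw corr(VC, DDC):", round(pearson(vc, ddc), 3))
    print("corr(VC, DDC) controlling for age:", round(partial_corr(vc, ddc, age), 6))
```

In this toy data the raw correlation between vaccination and deaths is strongly positive only because both depend on age; once age is regressed out, the correlation vanishes, which is the pattern the abstract reports.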
Citations: 0
A novel approach to fake news classification using LSTM-based deep learning models
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-01-08 DOI: 10.3389/fdata.2023.1320800
Halyna Padalko, Vasyl Chomko, D. Chumachenko
The rapid dissemination of information has been accompanied by a proliferation of fake news, making it significantly harder to distinguish authentic news from fabricated narratives. This study addresses the urgent need for effective fake news detection mechanisms. The spread of fake news on digital platforms has necessitated sophisticated tools for accurate detection and classification, and deep learning models, particularly Bi-LSTM and attention-based Bi-LSTM architectures, have shown promise in tackling this issue. This research used Bi-LSTM and attention-based Bi-LSTM models, the latter integrating an attention mechanism to weigh the significance of different parts of the input data. The models were trained on 80% of the data and tested on the remaining 20%, using evaluation metrics including recall, precision, F1-score, accuracy, and loss. Comparative analysis with existing models showed the superior efficacy of the proposed architectures. The attention-based Bi-LSTM model performed best, outperforming the other models in accuracy (97.66%) and other key metrics. The study highlights the potential of integrating advanced deep learning techniques into fake news detection. The proposed models set new standards in the field, offering effective tools for combating misinformation. Limitations such as data dependency, the potential for overfitting, and language and context specificity are acknowledged. The research underscores the importance of leveraging cutting-edge deep learning methods, particularly attention mechanisms, for fake news identification. The models presented pave the way for more robust solutions to counter misinformation, thereby preserving the veracity of digital information. Future research should focus on improving data diversity, model efficiency, and applicability across languages and contexts.
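The abstract does not say which attention variant the models use. As a minimal sketch assuming additive (Bahdanau-style) attention, the snippet shows how attention pools a Bi-LSTM's per-token outputs into one context vector: each output is the concatenation of the forward and backward hidden states (as `torch.nn.LSTM(..., bidirectional=True)` produces), each is scored, the scores are softmax-normalized, and the weighted sum feeds a downstream classifier. The weights `w` and `v` would be learned; here they are fixed, hypothetical values, and all hidden states are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_bidirectional(forward, backward):
    """A Bi-LSTM's per-time-step output: forward and backward hidden states concatenated."""
    return [f + b for f, b in zip(forward, backward)]


def identity(n):
    """n x n identity matrix, used here as a stand-in for a learned projection."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def attention_pool(hidden, w, v):
    """Additive attention: score_t = v . tanh(W h_t), alpha = softmax(scores),
    context = sum_t alpha_t * h_t."""
    scores = []
    for h in hidden:
        proj = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in w]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, proj)))
    alpha = softmax(scores)
    context = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(len(hidden[0]))]
    return context, alpha


if __name__ == "__main__":
    # Made-up 2-D forward and backward states for a 3-token input.
    fwd = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
    bwd = [[0.0, 0.1], [0.7, 0.9], [0.1, 0.0]]
    hidden = concat_bidirectional(fwd, bwd)
    context, alpha = attention_pool(hidden, identity(4), [1.0] * 4)
    print("attention weights:", [round(a, 3) for a in alpha])
    print("context vector:", [round(c, 3) for c in context])
```

The second token, whose hidden state scores highest, receives most of the weight, so it dominates the context vector; a plain Bi-LSTM classifier would instead use only the final states or an unweighted average.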
Citations: 0
CTAB-GAN+: enhancing tabular data synthesis.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-01-08 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1296508
Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, Lydia Y Chen

The use of synthetic data is gaining momentum, partly because original data are often unavailable for privacy and legal reasons and partly because synthetic data are useful for augmenting authentic data. Generative adversarial networks (GANs), a paragon of generative models, first for images and subsequently for tabular data, have produced many state-of-the-art synthesizers. As GANs improve, the synthesized data increasingly resemble the real data, raising the risk of privacy leakage. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. In this study, we propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state of the art by (i) adding downstream losses to the conditional GAN for higher-utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on statistical similarity and machine learning utility against state-of-the-art tabular GANs. The results show that, under a given privacy budget, CTAB-GAN+ synthesizes privacy-preserving data with at least 21.9% higher machine learning utility (i.e., F1-score) across multiple datasets and learning tasks.
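Item (iv) rests on the DP-SGD update: clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. The following is a minimal, framework-free sketch with hypothetical parameter values, not the CTAB-GAN+ code; it omits the privacy accounting that turns the noise multiplier, sampling rate, and number of steps into an (ε, δ) budget. In DP-GANs this update is commonly applied only to the discriminator, the network that reads real records; the abstract does not say exactly how CTAB-GAN+ applies it.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale one example's gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each per-example gradient, sum them, add Gaussian noise
    with standard deviation noise_multiplier * clip_norm, average, and take a step."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for k, g in enumerate(clip(grad, clip_norm)):
            summed[k] += g
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    batch = len(per_example_grads)
    return [p - lr * n / batch for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.5, -0.3]                           # hypothetical discriminator weights
    grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]]  # hypothetical per-example gradients
    print(dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the parameters, and noise calibrated to that bound is what yields the privacy guarantee; a larger `noise_multiplier` buys more privacy at the cost of utility, which is the trade-off the abstract describes.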

Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, Lydia Y Chen. "CTAB-GAN+: enhancing tabular data synthesis." Frontiers in Big Data, 6, 1296508. Published 2024-01-08. DOI: 10.3389/fdata.2023.1296508
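Item (iv) of the CTAB-GAN+ abstract trains with DP stochastic gradient descent. As a generic illustration only, not the authors' implementation, the sketch below shows one standard DP-SGD update in plain Python. Each example's gradient is clipped to an L2 norm of at most `max_norm`, the clipped gradients are summed, Gaussian noise with standard deviation `noise_multiplier * max_norm` is added, and the noisy average drives a gradient step. The function names, the list-of-floats gradient representation, and the parameter values are assumptions. Which network's updates CTAB-GAN+ privatizes, and how its privacy budget is accounted, are not stated in this listing.

```python
import math
import random


# Hypothetical helper names: a generic DP-SGD step, not CTAB-GAN+'s code.
def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient, sum the clipped
    gradients, add Gaussian noise (std = noise_multiplier * max_norm),
    average over the batch, and take a gradient-descent step."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    params = [0.5, -0.2]
    grads = [[0.3, 1.2], [-2.0, 0.4], [0.1, 0.1]]
    print(dp_sgd_step(params, grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the update, and the noise is scaled to that bound; a privacy accountant (not shown) would then convert the noise multiplier, sampling rate, and number of steps into a privacy budget.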
Citations: 0
Hybridization of long short-term memory neural network in fractional time series modeling of inflation
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-01-04 DOI: 10.3389/fdata.2023.1282541
Erman Arif, Elin Herlinawati, D. Devianto, Mutia Yollanda, Dony Permana
Inflation can significantly affect monetary policy, so accurate forecasts are needed to guide decisions aimed at stabilizing inflation rates. Given the significant relationship between inflation and monetary policy, it becomes feasible to detect long-memory patterns in the data. To capture these long-memory patterns, the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model serves as a valuable data-mining tool. Because the ARFIMA residuals do not satisfy the model's assumptions, an additional time-series model is needed to address heteroscedasticity in those residuals. To this end, a novel hybrid model was proposed in which the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) component is replaced by a Long Short-Term Memory (LSTM) neural network. The network was used as an iterative model to address this issue and obtain optimal parameters. The performance of the ARFIMA, ARFIMA-GARCH, and ARFIMA-LSTM models was assessed through a sensitivity analysis using mean absolute percentage error (MAPE), mean squared error (MSE), and mean absolute error (MAE). The results showed that ARFIMA-LSTM performed best of the three models in simulating the inflation rate. This provides further evidence that inflation data exhibit long memory, and that integrating an LSTM neural network improves model accuracy.
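The fractional part of ARFIMA is the differencing operator (1 - B)^d, where B is the backshift operator and d may be non-integer; stationary long memory corresponds to 0 < d < 0.5. Expanding the operator binomially gives weights w_0 = 1 and w_k = w_{k-1} * (k - 1 - d) / k. The sketch below computes those weights and applies a truncated filter in plain Python. The function names and the hand-picked truncation length `n_weights` are assumptions for illustration; the abstract does not say how d was estimated or how the filter was truncated.

```python
# Hypothetical helpers illustrating the generic fractional-differencing
# operator used in ARFIMA; not the authors' estimation code.
def frac_diff_weights(d, n_weights):
    """Coefficients w_k of the expansion (1 - B)^d = sum_k w_k B^k."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d, n_weights):
    """Truncated fractional difference y_t = sum_{k < n_weights} w_k x_{t-k}.
    The first n_weights - 1 observations lack a full window and are dropped."""
    w = frac_diff_weights(d, n_weights)
    return [
        sum(w[k] * series[t - k] for k in range(n_weights))
        for t in range(n_weights - 1, len(series))
    ]


if __name__ == "__main__":
    print(frac_diff_weights(0.4, 6))
    print(frac_diff([2.1, 2.4, 2.2, 2.8, 3.0, 2.9, 3.3], d=0.4, n_weights=3))
```

With d = 1 the weights reduce to [1, -1, 0, ...], i.e. ordinary first differencing; with 0 < d < 1 they decay slowly rather than cutting off, which is how the filter encodes long memory.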
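The three error measures used to compare ARFIMA, ARFIMA-GARCH, and ARFIMA-LSTM have standard definitions, sketched below in plain Python. The function names are illustrative; reporting MAPE in percent and rejecting zero actual values are choices made here, because MAPE divides by the actual value and becomes unstable when inflation is near zero.

```python
# Illustrative implementations of the standard forecast-error metrics.
def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For actual values [2.0, 4.0] and forecasts [1.0, 5.0], MAE and MSE are both 1.0 and MAPE is 37.5%. MSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is why the three are often reported together.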
Citations: 0