
Latest publications in Frontiers in Big Data

Cybermycelium: a reference architecture for domain-driven distributed big data systems.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-11-05 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1448481
Pouya Ataei

Introduction: The ubiquity of digital devices, today's infrastructure, and the ever-increasing proliferation of digital products have ushered in a new era: the era of big data (BD). This era began when the volume, variety, and velocity of data overwhelmed the traditional systems used to analyze and store them. This precipitated a new class of software systems, namely BD systems. While BD systems provide a competitive advantage to businesses, many have failed to harness their power: it has been estimated that only 20% of companies have successfully implemented a BD project.

Methods: This study aims to facilitate BD system development by introducing Cybermycelium, a domain-driven decentralized BD reference architecture (RA). The artifact was developed following the guidelines of empirically grounded RAs and evaluated through implementation in a real-world scenario using the Architecture Tradeoff Analysis Method (ATAM).

Results: The evaluation revealed that Cybermycelium successfully addressed key architectural qualities: performance (achieving <1,000 ms response times), availability (through event brokers and circuit breaking), and modifiability (enabling rapid service deployment and configuration). The prototype demonstrated effective handling of data processing, scalability challenges, and domain-specific requirements in a large-scale international company setting.

Discussion: The results highlight important architectural trade-offs between event backbone implementation and service mesh design. While the domain-driven distributed approach improved scalability and maintainability compared to traditional monolithic architectures, it requires significant technical expertise for implementation. This contribution advances the field by providing a validated reference architecture that addresses the challenges of adopting BD in modern enterprises.
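The abstract attributes Cybermycelium's availability in part to event brokers and circuit breaking. As a rough illustration of the circuit-breaker pattern only (not the paper's implementation; the class and parameter names below are hypothetical), a minimal version might look like:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `max_failures` consecutive
    failures, reject calls while open, allow a trial call after
    `reset_timeout` seconds (half-open state)."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            self.opened_at = None  # half-open: permit one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

In an event-driven architecture like the one described, such a breaker would wrap calls to downstream services so that a failing service sheds load quickly instead of cascading failures through the mesh.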

Citations: 0
Cognitive warfare: a conceptual analysis of the NATO ACT cognitive warfare exploratory concept.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-11-01 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1452129
Christoph Deppe, Gary S Schaal

This study evaluates NATO ACT's cognitive warfare concept from a political science perspective, exploring its utility beyond military applications. Despite its growing presence in scholarly discourse, the concept's interdisciplinary nature has hindered a unified definition. By analyzing NATO's framework, developed with input from diverse disciplines and both military and civilian researchers, this paper seeks to assess its applicability to political science. It aims to bridge military and civilian research divides and refine NATO's cognitive warfare approach, offering significant implications for enhancing political science research and fostering integrated scholarly collaboration.

Citations: 0
An enhanced whale optimization algorithm for task scheduling in edge computing environments.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-30 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1422546
Li Han, Shuaijie Zhu, Haoyang Zhao, Yanqiang He

The widespread use of mobile devices and compute-intensive applications has increased the connection of smart devices to networks, generating significant data. Real-time execution faces challenges due to limited resources and demanding applications in edge computing environments. To address these challenges, an enhanced whale optimization algorithm (EWOA) was proposed for task scheduling. A multi-objective model based on CPU, memory, time, and resource utilization was developed. The model was transformed into a whale optimization problem, incorporating chaotic mapping to initialize populations and prevent premature convergence. A nonlinear convergence factor was introduced to balance local and global search. The algorithm's performance was evaluated in an experimental edge computing environment and compared with ODTS, WOA, HWACO, and CATSA algorithms. Experimental results demonstrated that EWOA reduced costs by 29.22%, decreased completion time by 17.04%, and improved node resource utilization by 9.5%. While EWOA offers significant advantages, limitations include the lack of consideration for potential network delays and user mobility. Future research will focus on fault-tolerant scheduling techniques to address dynamic user needs and improve service robustness and quality.
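The two enhancements the abstract names, chaotic population initialization and a nonlinear convergence factor, can be sketched inside a simplified whale optimization loop. This is an illustrative reconstruction under stated assumptions, not the authors' EWOA: the logistic map and the cosine decay below are common choices for these two mechanisms, and the multi-objective cost is reduced to a single test function.

```python
import numpy as np

def ewoa_minimize(f, dim, lb, ub, n_whales=20, iters=200, seed=0):
    """Simplified whale optimization with chaotic (logistic-map)
    initialization and a nonlinear convergence factor `a`."""
    rng = np.random.default_rng(seed)
    # Chaotic initialization: iterate the logistic map (r = 4) so the
    # starting population covers the search space non-uniformly,
    # helping to prevent premature convergence.
    u = rng.uniform(0.1, 0.9, (n_whales, dim))
    for _ in range(10):
        u = 4.0 * u * (1.0 - u)
    pop = lb + (ub - lb) * u
    best = min(pop, key=f).copy()
    for t in range(iters):
        # Nonlinear convergence factor: cosine decay from 2 to 0
        # balances global search (large a) and local search (small a).
        a = 2.0 * np.cos(0.5 * np.pi * t / iters)
        for i in range(n_whales):
            if rng.random() < 0.5:                  # shrinking encirclement
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                pop[i] = best - A * np.abs(C * best - pop[i])
            else:                                   # log-spiral move to best
                l = rng.uniform(-1.0, 1.0)
                d = np.abs(best - pop[i])
                pop[i] = d * np.exp(l) * np.cos(2.0 * np.pi * l) + best
            pop[i] = np.clip(pop[i], lb, ub)
            if f(pop[i]) < f(best):
                best = pop[i].copy()
    return best
```

In the paper's setting, `f` would be the weighted multi-objective cost over CPU, memory, time, and resource utilization for a candidate task-to-node assignment.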

Citations: 0
Promoting fairness in link prediction with graph enhancement.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-24 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1489306
Yezi Liu, Hanning Chen, Mohsen Imani

Link prediction is a crucial task in network analysis, but it has been shown to be prone to biased predictions, particularly when links are unfairly predicted between nodes from different sensitive groups. In this paper, we study the fair link prediction problem, which aims to ensure that the predicted link probability is independent of the sensitive attributes of the connected nodes. Existing methods typically incorporate debiasing techniques within graph embeddings to mitigate this issue. However, training on large real-world graphs is already challenging, and adding fairness constraints can further complicate the process. To overcome this challenge, we propose FairLink, a method that learns a fairness-enhanced graph to bypass the need for debiasing during the link predictor's training. FairLink maintains link prediction accuracy by ensuring that the enhanced graph follows a training trajectory similar to that of the original input graph. Meanwhile, it enhances fairness by minimizing the absolute difference in link probabilities between node pairs within the same sensitive group and those between node pairs from different sensitive groups. Our extensive experiments on multiple large-scale graphs demonstrate that FairLink not only promotes fairness but also often achieves link prediction accuracy comparable to baseline methods. Most importantly, the enhanced graph exhibits strong generalizability across different GNN architectures. FairLink is highly scalable, making it suitable for deployment in real-world large-scale graphs, where maintaining both fairness and accuracy is critical.
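FairLink's fairness objective, minimizing the absolute difference in link probabilities between within-group and across-group node pairs, can be illustrated with a toy metric. The function name and data layout below are hypothetical, not the paper's code:

```python
import numpy as np

def link_fairness_gap(probs, groups, edges):
    """Absolute difference between the mean predicted link probability
    of within-group node pairs and that of across-group pairs.
    probs[i] is the predicted probability for edges[i] = (u, v);
    groups[n] is the sensitive-group label of node n."""
    same, diff = [], []
    for p, (u, v) in zip(probs, edges):
        (same if groups[u] == groups[v] else diff).append(p)
    return abs(float(np.mean(same)) - float(np.mean(diff)))
```

A debiasing objective in this spirit would add the gap (computed on the enhanced graph) as a penalty term alongside the usual link-prediction loss, driving it toward zero.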

Citations: 0
Exploring code portability solutions for HEP with a particle tracking test code.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-23 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1485344
Hammad Ather, Sophie Berkman, Giuseppe Cerati, Matti J Kortelainen, Ka Hei Martin Kwok, Steven Lantz, Seyong Lee, Boyana Norris, Michael Reid, Allison Reinsvold Hall, Daniel Riley, Alexei Strelchenko, Cong Wang

Traditionally, high energy physics (HEP) experiments have relied on x86 CPUs for the majority of their significant computing needs. As the field looks ahead to the next generation of experiments, such as DUNE and the High-Luminosity LHC, computing demands are expected to increase dramatically. To cope with this increase, it will be necessary to take advantage of all available computing resources, including GPUs from different vendors. A broad landscape of code portability tools (compiler pragma-based approaches, abstraction libraries, and others) allows the same source code to run efficiently on multiple architectures. In this paper, we use a test code taken from a HEP tracking algorithm to compare the performance and experience of implementing different portability solutions. While in several cases portable implementations perform close to the reference code version, we find that performance varies significantly depending on the details of the implementation. Achieving optimal performance is not easy, even for relatively simple applications such as the test codes considered in this work. Several factors can affect performance, such as the choice of memory layout, the memory pinning strategy, and the compiler used.
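The memory-layout factor mentioned above is typically the array-of-structs (AoS) versus struct-of-arrays (SoA) choice: SoA keeps each field contiguous, which SIMD units and GPUs prefer. A toy NumPy illustration of the two layouts (not the paper's test code; the field names are hypothetical track parameters):

```python
import numpy as np

N_TRACKS = 4

# Array-of-structs (AoS): the fields of each track are interleaved in
# memory, so a per-field kernel strides across records.
aos = np.zeros(N_TRACKS, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])
aos["z"][:] = [1.0, 2.0, 3.0, 4.0]

# Struct-of-arrays (SoA): each field is its own contiguous array, so
# the same kernel reads memory with unit stride.
soa = {"x": np.zeros(N_TRACKS),
       "y": np.zeros(N_TRACKS),
       "z": np.array([1.0, 2.0, 3.0, 4.0])}

# The same propagation step expressed in both layouts:
dz = 0.5
aos["z"] += dz   # strided access through interleaved records
soa["z"] += dz   # contiguous access
```

Both forms compute the same result; the performance difference the paper observes comes from how the hardware traverses the underlying memory, which NumPy hides but compiled AoS/SoA code exposes.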

Citations: 0
Editorial: Utilizing big data and deep learning to improve healthcare intelligence and biomedical service delivery.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-22 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1502398
V E Sathishkumar
Citations: 0
Big data and AI for gender equality in health: bias is a big challenge.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-16 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1436019
Anagha Joshi

Artificial intelligence and machine learning are rapidly evolving fields that have the potential to transform women's health by improving diagnostic accuracy, personalizing treatment plans, and building predictive models of disease progression that enable preventive care. Three categories of women's health issues are discussed where machine learning can facilitate accessible, affordable, personalized, and evidence-based healthcare. In this perspective, the promise of big data and machine learning applications in the context of women's health is first elaborated. Despite these promises, machine learning applications are not widely adopted in clinical care due to many issues, including ethical concerns, patient privacy, informed consent, algorithmic biases, data quality and availability, and the education and training of healthcare professionals. In the medical field, discrimination against women has a long history. Machine learning implicitly carries the biases present in its data. Thus, although machine learning has the potential to improve some aspects of women's health, it can also reinforce sex and gender biases.

Citations: 0
Integrating longitudinal mental health data into a staging database: harnessing DDI-lifecycle and OMOP vocabularies within the INSPIRE Network Datahub.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-11 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1435510
Bylhah Mugotitsa, Tathagata Bhattacharjee, Michael Ochola, Dorothy Mailosi, David Amadi, Pauline Andeso, Joseph Kuria, Reinpeter Momanyi, Evans Omondi, Dan Kajungu, Jim Todd, Agnes Kiragga, Jay Greenfield

Background: Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.

Methods: The "INSPIRE" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.

Results: Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.

Conclusion: The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.
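As a rough illustration of the staging idea described above, flattening instrument responses from one data-collection wave into vocabulary-mapped rows might look like the following. The function, field names, and concept IDs are hypothetical placeholders, not the INSPIRE schema or real OMOP concept identifiers:

```python
def to_staging_rows(wave_records, concept_map):
    """Flatten instrument responses from one data-collection wave into
    (person_id, wave, concept_id, value) tuples, keeping only items
    that are mapped to a standard vocabulary concept. Unmapped items
    (e.g. free text) are deliberately dropped at this stage."""
    rows = []
    for rec in wave_records:
        for item, value in rec["responses"].items():
            if item in concept_map:
                rows.append((rec["person_id"], rec["wave"],
                             concept_map[item], value))
    return rows
```

Keeping wave and instrument-item provenance on every row is what lets the downstream ETL place the same measurement correctly even when a later wave switched instruments, which is the dynamic the staging schema is designed to capture.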

{"title":"Integrating longitudinal mental health data into a staging database: harnessing DDI-lifecycle and OMOP vocabularies within the INSPIRE Network Datahub.","authors":"Bylhah Mugotitsa, Tathagata Bhattacharjee, Michael Ochola, Dorothy Mailosi, David Amadi, Pauline Andeso, Joseph Kuria, Reinpeter Momanyi, Evans Omondi, Dan Kajungu, Jim Todd, Agnes Kiragga, Jay Greenfield","doi":"10.3389/fdata.2024.1435510","DOIUrl":"10.3389/fdata.2024.1435510","url":null,"abstract":"<p><strong>Background: </strong>Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.</p><p><strong>Methods: </strong>The \"INSPIRE\" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.</p><p><strong>Results: </strong>Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. 
Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.</p><p><strong>Conclusion: </strong>The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1435510"},"PeriodicalIF":2.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
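The ETL step the abstract describes can be pictured as a small transformation from a staging-database row into an OMOP-CDM-style observation record. The sketch below is illustrative only: the field names, concept IDs, and wave-to-date lookup are assumptions for demonstration, not the INSPIRE schema or the official OMOP vocabulary.

```python
# Hypothetical mapping from instrument variables to OMOP-style concept IDs.
# These IDs are placeholders, not real OMOP vocabulary entries.
OMOP_CONCEPT_MAP = {
    "phq9_total": 40758888,
    "gad7_total": 40758889,
}

def to_observation(staging_row, wave_dates):
    """Transform one staging-database row into an OMOP-like observation dict.

    `wave_dates` maps a data-collection wave number to its reference date,
    capturing the longitudinal (wave-based) structure the paper describes.
    """
    concept_id = OMOP_CONCEPT_MAP.get(staging_row["variable"])
    if concept_id is None:
        raise KeyError(f"unmapped variable: {staging_row['variable']}")
    return {
        "person_id": staging_row["participant_id"],
        "observation_concept_id": concept_id,
        "value_as_number": float(staging_row["value"]),
        "observation_date": wave_dates[staging_row["wave"]],
    }

# Example: a wave-2 PHQ-9 total score for one participant.
row = {"participant_id": 101, "variable": "phq9_total", "value": "14", "wave": 2}
obs = to_observation(row, wave_dates={1: "2021-03-01", 2: "2022-03-01"})
```

A real ETL script would additionally validate units, handle protocol changes between waves, and write into the CDM's OBSERVATION table rather than a dict.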
Citations: 0
AI security and cyber risk in IoT systems.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-10 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1402745
Petar Radanliev, David De Roure, Carsten Maple, Jason R C Nurse, Razvan Nicolescu, Uchenna Ani

Internet-of-Things (IoT) refers to low-memory connected devices used in various new technologies, including drones, autonomous machines, and robotics. The article aims to better understand cyber risks in low-memory devices and the challenges in IoT risk management. The article includes a critical reflection on current risk methods and their level of appropriateness for IoT. We present a dependency model tailored to current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment and generic risk impact assessment. The model is developed for cyber risk insurance for new technologies (e.g., drones, robots), but practitioners can also apply it to estimate and assess cyber risks in organizations and enterprises. Furthermore, this paper critically discusses why risk assessment and management are crucial in this domain and what open questions on IoT risk assessment and risk management remain areas for further research. The paper then presents a more holistic understanding of cyber risks in the IoT. We explain how the industry can use new risk assessment and management approaches to deal with the challenges posed by emerging IoT cyber risks. We explain how these approaches influence policy on cyber risk and data strategy. We also present a new approach for cyber risk assessment that incorporates IoT risks through dependency modeling. The paper describes why this approach is well suited to estimate IoT risks.
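The dependency-modeling idea can be sketched as risk propagating along edges of a device-dependency graph. The graph, the local compromise probabilities, and the independence assumption below are toy illustrations, not the paper's calibrated model.

```python
# Toy dependency model: each device has a local compromise probability, and a
# device is safe only if its own defences hold AND every upstream dependency
# it relies on is safe (failures assumed independent for simplicity).

deps = {            # device -> upstream devices it depends on (illustrative)
    "drone":   ["gateway"],
    "gateway": ["cloud"],
    "cloud":   [],
}
local_p = {"drone": 0.10, "gateway": 0.05, "cloud": 0.02}

def compromise_prob(device, memo=None):
    """Overall P(device compromised), propagated through the dependency graph."""
    memo = {} if memo is None else memo
    if device not in memo:
        p_safe = 1.0 - local_p[device]
        for upstream in deps[device]:
            p_safe *= 1.0 - compromise_prob(upstream, memo)
        memo[device] = 1.0 - p_safe
    return memo[device]

risk = compromise_prob("drone")  # higher than the drone's 0.10 local risk alone
```

Even this small example shows why dependency modeling matters for risk estimation: the drone's effective risk exceeds its local probability because gateway and cloud compromises cascade down to it.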

Citations: 0
Ontology extension by online clustering with large language model agents.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-10-07 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1463543
Guanchen Wu, Chen Ling, Ilana Graetz, Liang Zhao

An ontology is a structured framework that categorizes entities, concepts, and relationships within a domain to facilitate shared understanding, and it is important in computational linguistics and knowledge representation. In this paper, we propose a novel framework to automatically extend an existing ontology from streaming data in a zero-shot manner. Specifically, the zero-shot ontology extension framework uses online and hierarchical clustering to integrate new knowledge into existing ontologies without substantial annotated data or domain-specific expertise. Focusing on the medical field, this approach leverages Large Language Models (LLMs) for two key tasks: Symptom Typing and Symptom Taxonomy among breast and bladder cancer survivors. Symptom Typing involves identifying and classifying medical symptoms from unstructured online patient forum data, while Symptom Taxonomy organizes and integrates these symptoms into an existing ontology. The combined use of online and hierarchical clustering enables real-time and structured categorization and integration of symptoms. The dual-phase model employs multiple LLMs to ensure accurate classification and seamless integration of new symptoms with minimal human oversight. The paper details the framework's development, experiments, quantitative analyses, and data visualizations, demonstrating its effectiveness in enhancing medical ontologies and advancing knowledge-based systems in healthcare.
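The online-clustering component can be sketched as follows: each incoming symptom mention, represented as an embedding vector, joins the nearest existing cluster if similarity exceeds a threshold, and otherwise seeds a new cluster. The vectors and threshold below are toy assumptions; the paper obtains representations and type labels via LLM agents rather than fixed vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def online_cluster(stream, threshold=0.9):
    """One pass over a stream of vectors: assign each to the most similar
    cluster centroid above `threshold`, or start a new cluster."""
    clusters = []  # each cluster is a list of member vectors
    for vec in stream:
        best, best_sim = None, threshold
        for c in clusters:
            centroid = [sum(col) / len(c) for col in zip(*c)]
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([vec])   # unseen symptom type -> new cluster
        else:
            best.append(vec)         # known symptom type -> existing cluster
    return clusters

# Two near-duplicate symptom vectors and one unrelated vector (illustrative).
stream = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0)]
clusters = online_cluster(stream)
```

In the paper's pipeline the new cluster would then be named and attached to the appropriate parent node of the ontology via hierarchical clustering; this sketch covers only the streaming assignment step.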

Citations: 0