Cybermycelium: a reference architecture for domain-driven distributed big data systems.
Pub Date: 2024-11-05 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1448481
Pouya Ataei
Introduction: The ubiquity of digital devices, today's connected infrastructure, and the ever-increasing proliferation of digital products have ushered in a new era: the era of big data (BD). This era began when the volume, variety, and velocity of data overwhelmed the traditional systems used to store and analyze it, precipitating a new class of software systems, namely BD systems. Although BD systems offer businesses a competitive advantage, many have failed to harness their power; it has been estimated that only 20% of companies have successfully implemented a BD project.
Methods: This study aims to facilitate BD system development by introducing Cybermycelium, a domain-driven decentralized BD reference architecture (RA). The artifact was developed following the guidelines of empirically grounded RAs and evaluated through implementation in a real-world scenario using the Architecture Tradeoff Analysis Method (ATAM).
Results: The evaluation revealed that Cybermycelium successfully addressed key architectural qualities: performance (achieving <1,000 ms response times), availability (through event brokers and circuit breaking), and modifiability (enabling rapid service deployment and configuration). The prototype demonstrated effective handling of data processing, scalability challenges, and domain-specific requirements in a large-scale international company setting.
Discussion: The results highlight important architectural trade-offs between event backbone implementation and service mesh design. While the domain-driven distributed approach improves scalability and maintainability compared with traditional monolithic architectures, it requires significant technical expertise to implement. This contribution advances the field by providing a validated reference architecture that addresses the challenges of adopting BD in modern enterprises.
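Among the availability tactics named in the Results is circuit breaking: isolating a failing downstream service so that calls fail fast instead of piling up and exhausting resources. The paper's own implementation is not reproduced in this abstract, so the following is a minimal, illustrative Python sketch of the general pattern; the class name, thresholds, and timeout are assumptions.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: trips open after repeated failures, then
    permits a single trial call once a cool-down period has elapsed."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures    # failures tolerated before opening
        self.reset_timeout = reset_timeout  # seconds to wait before retrying
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None           # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                   # a success closes the circuit
        return result
```

In an architecture like Cybermycelium, such a breaker would typically wrap the calls a service makes across the event backbone or service mesh, so one slow or failing domain service cannot stall the others.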
{"title":"Cybermycelium: a reference architecture for domain-driven distributed big data systems.","authors":"Pouya Ataei","doi":"10.3389/fdata.2024.1448481","DOIUrl":"https://doi.org/10.3389/fdata.2024.1448481","url":null,"abstract":"<p><strong>Introduction: </strong>The ubiquity of digital devices, the infrastructure of today, and the ever-increasing proliferation of digital products have dawned a new era, the era of big data (BD). This era began when the volume, variety, and velocity of data overwhelmed traditional systems that used to analyze and store that data. This precipitated a new class of software systems, namely, BD systems. Whereas BD systems provide a competitive advantage to businesses, many have failed to harness the power of them. It has been estimated that only 20% of companies have successfully implemented a BD project.</p><p><strong>Methods: </strong>This study aims to facilitate BD system development by introducing Cybermycelium, a domain-driven decentralized BD reference architecture (RA). The artifact was developed following the guidelines of empirically grounded RAs and evaluated through implementation in a real-world scenario using the Architecture Tradeoff Analysis Method (ATAM).</p><p><strong>Results: </strong>The evaluation revealed that Cybermycelium successfully addressed key architectural qualities: performance (achieving <1,000 ms response times), availability (through event brokers and circuit breaking), and modifiability (enabling rapid service deployment and configuration). The prototype demonstrated effective handling of data processing, scalability challenges, and domain-specific requirements in a large-scale international company setting.</p><p><strong>Discussion: </strong>The results highlight important architectural trade-offs between event backbone implementation and service mesh design. While the domain-driven distributed approach improved scalability and maintainability compared to traditional monolithic architectures, it requires significant technical expertise for implementation. This contribution advances the field by providing a validated reference architecture that addresses the challenges of adopting BD in modern enterprises.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1448481"},"PeriodicalIF":2.4,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573557/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142677536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cognitive warfare: a conceptual analysis of the NATO ACT cognitive warfare exploratory concept.
Pub Date: 2024-11-01 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1452129
Christoph Deppe, Gary S Schaal
This study evaluates NATO ACT's cognitive warfare concept from a political science perspective, exploring its utility beyond military applications. Despite its growing presence in scholarly discourse, the concept's interdisciplinary nature has hindered a unified definition. By analyzing NATO's framework, developed with input from diverse disciplines and both military and civilian researchers, this paper seeks to assess its applicability to political science. It aims to bridge military and civilian research divides and refine NATO's cognitive warfare approach, offering significant implications for enhancing political science research and fostering integrated scholarly collaboration.
{"title":"Cognitive warfare: a conceptual analysis of the NATO ACT cognitive warfare exploratory concept.","authors":"Christoph Deppe, Gary S Schaal","doi":"10.3389/fdata.2024.1452129","DOIUrl":"https://doi.org/10.3389/fdata.2024.1452129","url":null,"abstract":"<p><p>This study evaluates NATO ACT's cognitive warfare concept from a political science perspective, exploring its utility beyond military applications. Despite its growing presence in scholarly discourse, the concept's interdisciplinary nature has hindered a unified definition. By analyzing NATO's framework, developed with input from diverse disciplines and both military and civilian researchers, this paper seeks to assess its applicability to political science. It aims to bridge military and civilian research divides and refine NATO's cognitive warfare approach, offering significant implications for enhancing political science research and fostering integrated scholarly collaboration.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1452129"},"PeriodicalIF":2.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11565700/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An enhanced whale optimization algorithm for task scheduling in edge computing environments.
Pub Date: 2024-10-30 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1422546
Li Han, Shuaijie Zhu, Haoyang Zhao, Yanqiang He
The widespread use of mobile devices and compute-intensive applications has increased the number of smart devices connected to networks, generating vast amounts of data. Real-time execution faces challenges in edge computing environments due to limited resources and demanding applications. To address these challenges, an enhanced whale optimization algorithm (EWOA) was proposed for task scheduling. A multi-objective model based on CPU, memory, time, and resource utilization was developed and transformed into a whale optimization problem, incorporating chaotic mapping to initialize populations and prevent premature convergence. A nonlinear convergence factor was introduced to balance local and global search. The algorithm's performance was evaluated in an experimental edge computing environment and compared with the ODTS, WOA, HWACO, and CATSA algorithms. Experimental results demonstrated that EWOA reduced costs by 29.22%, decreased completion time by 17.04%, and improved node resource utilization by 9.5%. While EWOA offers significant advantages, its limitations include not accounting for potential network delays and user mobility. Future research will focus on fault-tolerant scheduling techniques to address dynamic user needs and improve service robustness and quality.
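The two EWOA ingredients named above, chaotic population initialization and a nonlinear convergence factor, admit a compact illustration. The abstract does not specify the chaotic map or the decay curve, so the logistic map and quadratic decay below are assumptions standing in for the authors' exact choices.

```python
import numpy as np

def logistic_chaotic_init(pop_size, dim, lower, upper, mu=4.0):
    """Seed the population with a logistic chaotic map instead of uniform
    sampling, spreading initial solutions more evenly over the search space."""
    x = np.random.rand(dim)          # seed in (0, 1)
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = mu * x * (1.0 - x)       # logistic map iteration
        pop[i] = lower + x * (upper - lower)
    return pop

def nonlinear_convergence_factor(t, t_max):
    """Decay the WOA convergence factor a from 2 to 0 along a nonlinear curve:
    larger early values favor global search, smaller late values favor
    local exploitation."""
    return 2.0 * (1.0 - (t / t_max) ** 2)
```

Inside the WOA update loop, `a = nonlinear_convergence_factor(t, t_max)` would replace the standard linear decay `a = 2 * (1 - t / t_max)`.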
{"title":"An enhanced whale optimization algorithm for task scheduling in edge computing environments.","authors":"Li Han, Shuaijie Zhu, Haoyang Zhao, Yanqiang He","doi":"10.3389/fdata.2024.1422546","DOIUrl":"10.3389/fdata.2024.1422546","url":null,"abstract":"<p><p>The widespread use of mobile devices and compute-intensive applications has increased the connection of smart devices to networks, generating significant data. Real-time execution faces challenges due to limited resources and demanding applications in edge computing environments. To address these challenges, an enhanced whale optimization algorithm (EWOA) was proposed for task scheduling. A multi-objective model based on CPU, memory, time, and resource utilization was developed. The model was transformed into a whale optimization problem, incorporating chaotic mapping to initialize populations and prevent premature convergence. A nonlinear convergence factor was introduced to balance local and global search. The algorithm's performance was evaluated in an experimental edge computing environment and compared with ODTS, WOA, HWACO, and CATSA algorithms. Experimental results demonstrated that EWOA reduced costs by 29.22%, decreased completion time by 17.04%, and improved node resource utilization by 9.5%. While EWOA offers significant advantages, limitations include the lack of consideration for potential network delays and user mobility. Future research will focus on fault-tolerant scheduling techniques to address dynamic user needs and improve service robustness and quality.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1422546"},"PeriodicalIF":2.4,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11557405/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Promoting fairness in link prediction with graph enhancement.
Pub Date: 2024-10-24 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1489306
Yezi Liu, Hanning Chen, Mohsen Imani
Link prediction is a crucial task in network analysis, but it has been shown to be prone to biased predictions, particularly when links are unfairly predicted between nodes from different sensitive groups. In this paper, we study the fair link prediction problem, which aims to ensure that the predicted link probability is independent of the sensitive attributes of the connected nodes. Existing methods typically incorporate debiasing techniques within graph embeddings to mitigate this issue. However, training on large real-world graphs is already challenging, and adding fairness constraints can further complicate the process. To overcome this challenge, we propose FairLink, a method that learns a fairness-enhanced graph to bypass the need for debiasing during the link predictor's training. FairLink maintains link prediction accuracy by ensuring that the enhanced graph follows a training trajectory similar to that of the original input graph. Meanwhile, it enhances fairness by minimizing the absolute difference in link probabilities between node pairs within the same sensitive group and those between node pairs from different sensitive groups. Our extensive experiments on multiple large-scale graphs demonstrate that FairLink not only promotes fairness but also often achieves link prediction accuracy comparable to baseline methods. Most importantly, the enhanced graph exhibits strong generalizability across different GNN architectures. FairLink is highly scalable, making it suitable for deployment in real-world large-scale graphs, where maintaining both fairness and accuracy is critical.
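The fairness objective described above, minimizing the absolute gap between intra-group and inter-group predicted link probabilities, translates almost directly into code. The sketch below is a reading of the abstract rather than FairLink's released implementation, and the use of PyTorch tensors is an assumption.

```python
import torch

def fairness_gap(link_probs, src_groups, dst_groups):
    """Absolute difference between the mean predicted probability of
    intra-group candidate links and that of inter-group candidate links.

    link_probs: (E,) predicted probabilities for candidate edges
    src_groups, dst_groups: (E,) sensitive-attribute labels of the endpoints
    Assumes both pair types are present in the batch."""
    intra = src_groups == dst_groups
    inter = ~intra
    return torch.abs(link_probs[intra].mean() - link_probs[inter].mean())
```

Minimizing a gap of this form alongside the usual link prediction loss while learning the enhanced graph is, per the abstract, what lets FairLink avoid debiasing the predictor itself.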
{"title":"Promoting fairness in link prediction with graph enhancement.","authors":"Yezi Liu, Hanning Chen, Mohsen Imani","doi":"10.3389/fdata.2024.1489306","DOIUrl":"https://doi.org/10.3389/fdata.2024.1489306","url":null,"abstract":"<p><p>Link prediction is a crucial task in network analysis, but it has been shown to be prone to biased predictions, particularly when links are unfairly predicted between nodes from different sensitive groups. In this paper, we study the fair link prediction problem, which aims to ensure that the predicted link probability is independent of the sensitive attributes of the connected nodes. Existing methods typically incorporate debiasing techniques within graph embeddings to mitigate this issue. However, training on large real-world graphs is already challenging, and adding fairness constraints can further complicate the process. To overcome this challenge, we propose FairLink, a method that learns a fairness-enhanced graph to bypass the need for debiasing during the link predictor's training. FairLink maintains link prediction accuracy by ensuring that the enhanced graph follows a training trajectory similar to that of the original input graph. Meanwhile, it enhances fairness by minimizing the absolute difference in link probabilities between node pairs within the same sensitive group and those between node pairs from different sensitive groups. Our extensive experiments on multiple large-scale graphs demonstrate that FairLink not only promotes fairness but also often achieves link prediction accuracy comparable to baseline methods. Most importantly, the enhanced graph exhibits strong generalizability across different GNN architectures. FairLink is highly scalable, making it suitable for deployment in real-world large-scale graphs, where maintaining both fairness and accuracy is critical.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1489306"},"PeriodicalIF":2.4,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540639/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142607383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring code portability solutions for HEP with a particle tracking test code.
Pub Date: 2024-10-23 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1485344
Hammad Ather, Sophie Berkman, Giuseppe Cerati, Matti J Kortelainen, Ka Hei Martin Kwok, Steven Lantz, Seyong Lee, Boyana Norris, Michael Reid, Allison Reinsvold Hall, Daniel Riley, Alexei Strelchenko, Cong Wang
Traditionally, high energy physics (HEP) experiments have relied on x86 CPUs for the majority of their significant computing needs. As the field looks ahead to the next generation of experiments, such as DUNE and the High-Luminosity LHC, computing demands are expected to increase dramatically. To cope with this increase, it will be necessary to take advantage of all available computing resources, including GPUs from different vendors. A broad landscape of code portability tools, including compiler pragma-based approaches, abstraction libraries, and others, allows the same source code to run efficiently on multiple architectures. In this paper, we use a test code taken from a HEP tracking algorithm to compare the performance of different portability solutions and the experience of implementing them. While in several cases portable implementations perform close to the reference code version, we find that performance varies significantly with the details of the implementation. Achieving optimal performance is not easy, even for relatively simple applications such as the test codes considered in this work. Several factors can affect performance, such as the choice of memory layout, the memory pinning strategy, and the compiler used. The compilers and tools are still being actively developed, so future developments may be critical for their deployment in HEP experiments.
Editorial: Utilizing big data and deep learning to improve healthcare intelligence and biomedical service delivery.
Pub Date: 2024-10-22 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1502398
V E Sathishkumar
{"title":"Editorial: Utilizing big data and deep learning to improve healthcare intelligence and biomedical service delivery.","authors":"V E Sathishkumar","doi":"10.3389/fdata.2024.1502398","DOIUrl":"https://doi.org/10.3389/fdata.2024.1502398","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1502398"},"PeriodicalIF":2.4,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534799/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big data and AI for gender equality in health: bias is a big challenge.
Pub Date: 2024-10-16 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1436019
Anagha Joshi
Artificial intelligence and machine learning are rapidly evolving fields with the potential to transform women's health by improving diagnostic accuracy, personalizing treatment plans, and building predictive models of disease progression that enable preventive care. Three categories of women's health issues are discussed in which machine learning can facilitate accessible, affordable, personalized, and evidence-based healthcare. This perspective first elaborates the promise of big data and machine learning applications in the context of women's health. Despite this promise, machine learning applications are not widely adopted in clinical care owing to many issues, including ethical concerns, patient privacy, informed consent, algorithmic biases, data quality and availability, and the education and training of healthcare professionals. Discrimination against women has a long history in the medical field, and machine learning implicitly carries the biases present in its data. Thus, although machine learning has the potential to improve some aspects of women's health, it can also reinforce sex and gender biases. Blindly integrating advanced machine learning tools without properly understanding and correcting for socio-culturally biased practices and policies is therefore unlikely to result in sex and gender equality in health.
{"title":"Big data and AI for gender equality in health: bias is a big challenge.","authors":"Anagha Joshi","doi":"10.3389/fdata.2024.1436019","DOIUrl":"10.3389/fdata.2024.1436019","url":null,"abstract":"<p><p>Artificial intelligence and machine learning are rapidly evolving fields that have the potential to transform women's health by improving diagnostic accuracy, personalizing treatment plans, and building predictive models of disease progression leading to preventive care. Three categories of women's health issues are discussed where machine learning can facilitate accessible, affordable, personalized, and evidence-based healthcare. In this perspective, firstly the promise of big data and machine learning applications in the context of women's health is elaborated. Despite these promises, machine learning applications are not widely adapted in clinical care due to many issues including ethical concerns, patient privacy, informed consent, algorithmic biases, data quality and availability, and education and training of health care professionals. In the medical field, discrimination against women has a long history. Machine learning implicitly carries biases in the data. Thus, despite the fact that machine learning has the potential to improve some aspects of women's health, it can also reinforce sex and gender biases. Advanced machine learning tools blindly integrated without properly understanding and correcting for socio-cultural sex and gender biased practices and policies is therefore unlikely to result in sex and gender equality in health.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1436019"},"PeriodicalIF":2.4,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11521869/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142548903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating longitudinal mental health data into a staging database: harnessing DDI-lifecycle and OMOP vocabularies within the INSPIRE Network Datahub.
Pub Date: 2024-10-11 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1435510
Bylhah Mugotitsa, Tathagata Bhattacharjee, Michael Ochola, Dorothy Mailosi, David Amadi, Pauline Andeso, Joseph Kuria, Reinpeter Momanyi, Evans Omondi, Dan Kajungu, Jim Todd, Agnes Kiragga, Jay Greenfield
Background: Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.
Methods: The "INSPIRE" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.
Results: Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.
Conclusion: The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.
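As one concrete illustration of the ETL step described in the Methods, the sketch below maps one wave of instrument scores from a staging record into OMOP-CDM-style observation rows via a concept lookup. Every table, field, and concept ID here is a placeholder for illustration, not the INSPIRE schema or a real OMOP vocabulary entry.

```python
# Placeholder lookup: source instrument item -> OMOP concept_id.
# Real mappings would come from curated OMOP vocabularies.
OMOP_CONCEPT = {
    "phq9_total": 1234501,   # hypothetical concept ID
    "gad7_total": 1234502,   # hypothetical concept ID
}

def to_omop_observations(person_id, wave_date, source_items):
    """Convert one data collection wave into OMOP-style observation rows;
    items without a concept mapping are set aside for manual curation."""
    rows, unmapped = [], {}
    for item, value in source_items.items():
        concept_id = OMOP_CONCEPT.get(item)
        if concept_id is None:
            unmapped[item] = value    # queue for vocabulary curation
            continue
        rows.append({
            "person_id": person_id,
            "observation_concept_id": concept_id,
            "observation_date": wave_date,
            "value_as_number": value,
        })
    return rows, unmapped
```

Keeping unmapped items visible rather than silently dropping them reflects the staging database's role, noted in the Conclusion, as a comprehensive study descriptor rather than a simple data host.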
{"title":"Integrating longitudinal mental health data into a staging database: harnessing DDI-lifecycle and OMOP vocabularies within the INSPIRE Network Datahub.","authors":"Bylhah Mugotitsa, Tathagata Bhattacharjee, Michael Ochola, Dorothy Mailosi, David Amadi, Pauline Andeso, Joseph Kuria, Reinpeter Momanyi, Evans Omondi, Dan Kajungu, Jim Todd, Agnes Kiragga, Jay Greenfield","doi":"10.3389/fdata.2024.1435510","DOIUrl":"10.3389/fdata.2024.1435510","url":null,"abstract":"<p><strong>Background: </strong>Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.</p><p><strong>Methods: </strong>The \"INSPIRE\" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.</p><p><strong>Results: </strong>Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.</p><p><strong>Conclusion: </strong>The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1435510"},"PeriodicalIF":2.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI security and cyber risk in IoT systems.
Pub Date: 2024-10-10 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1402745
Petar Radanliev, David De Roure, Carsten Maple, Jason R C Nurse, Razvan Nicolescu, Uchenna Ani
Internet-of-Things (IoT) refers to low-memory connected devices used in various new technologies, including drones, autonomous machines, and robotics. The article aims to better understand cyber risks in low-memory devices and the challenges in IoT risk management, and it includes a critical reflection on current risk methods and their appropriateness for IoT. We present a dependency model tailored to current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment, including generic risk impact assessment. Although the model is developed for cyber risk insurance for new technologies (e.g., drones, robots), practitioners can also apply it to estimate and assess cyber risks in organizations and enterprises. Furthermore, this paper critically discusses why risk assessment and management are crucial in this domain and which open questions on IoT risk assessment and management remain areas for further research. The paper then presents a more holistic understanding of cyber risks in the IoT, explaining how the industry can use new risk assessment and management approaches to deal with emerging IoT cyber risks and how these approaches influence policy on cyber risk and data strategy. Finally, we present a new approach for cyber risk assessment that incorporates IoT risks through dependency modeling and describe why this approach is well suited to estimating IoT risks.
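To make the dependency-modeling idea concrete, the sketch below propagates compromise probabilities through a small IoT dependency graph. The graph, the probabilities, and the independence assumption are all illustrative; the paper's calibrated model for cyber risk insurance is not reproduced here.

```python
# Hypothetical IoT dependency graph: component -> components it depends on.
deps = {
    "drone_autopilot": ["gps_module", "telemetry_link"],
    "telemetry_link": ["cloud_gateway"],
    "gps_module": [],
    "cloud_gateway": [],
}
# Assumed intrinsic compromise probabilities per component.
intrinsic = {"drone_autopilot": 0.05, "gps_module": 0.10,
             "telemetry_link": 0.15, "cloud_gateway": 0.20}

def total_risk(node, memo=None):
    """P(node is compromised), treating the node's own failure and each
    dependency's failure as independent events."""
    memo = {} if memo is None else memo
    if node not in memo:
        survive = 1.0 - intrinsic[node]
        for dep in deps[node]:
            survive *= 1.0 - total_risk(dep, memo)
        memo[node] = 1.0 - survive
    return memo[node]

print(round(total_risk("drone_autopilot"), 3))  # 0.419 with these inputs
```

Even this toy version shows the key property of dependency modeling: a low intrinsic risk at the top of the stack (5% here) can translate into a much larger end-to-end risk once upstream dependencies are accounted for.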
{"title":"AI security and cyber risk in IoT systems.","authors":"Petar Radanliev, David De Roure, Carsten Maple, Jason R C Nurse, Razvan Nicolescu, Uchenna Ani","doi":"10.3389/fdata.2024.1402745","DOIUrl":"https://doi.org/10.3389/fdata.2024.1402745","url":null,"abstract":"<p><p>Internet-of-Things (IoT) refers to low-memory connected devices used in various new technologies, including drones, autonomous machines, and robotics. The article aims to understand better cyber risks in low-memory devices and the challenges in IoT risk management. The article includes a critical reflection on current risk methods and their level of appropriateness for IoT. We present a dependency model tailored in context toward current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment and generic risk impact assessment. The model is developed for cyber risk insurance for new technologies (e.g., drones, robots). Still, practitioners can apply it to estimate and assess cyber risks in organizations and enterprises. Furthermore, this paper critically discusses why risk assessment and management are crucial in this domain and what open questions on IoT risk assessment and risk management remain areas for further research. The paper then presents a more holistic understanding of cyber risks in the IoT. We explain how the industry can use new risk assessment, and management approaches to deal with the challenges posed by emerging IoT cyber risks. We explain how these approaches influence policy on cyber risk and data strategy. We also present a new approach for cyber risk assessment that incorporates IoT risks through dependency modeling. The paper describes why this approach is well suited to estimate IoT risks.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1402745"},"PeriodicalIF":2.4,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499169/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ontology extension by online clustering with large language model agents.
Pub Date: 2024-10-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1463543
Guanchen Wu, Chen Ling, Ilana Graetz, Liang Zhao
An ontology is a structured framework that categorizes entities, concepts, and relationships within a domain to facilitate shared understanding, and it is important in computational linguistics and knowledge representation. In this paper, we propose a novel framework to automatically extend an existing ontology from streaming data in a zero-shot manner. Specifically, the zero-shot ontology extension framework uses online and hierarchical clustering to integrate new knowledge into existing ontologies without substantial annotated data or domain-specific expertise. Focusing on the medical field, this approach leverages Large Language Models (LLMs) for two key tasks: Symptom Typing and Symptom Taxonomy among breast and bladder cancer survivors. Symptom Typing involves identifying and classifying medical symptoms from unstructured online patient forum data, while Symptom Taxonomy organizes and integrates these symptoms into an existing ontology. The combined use of online and hierarchical clustering enables real-time and structured categorization and integration of symptoms. The dual-phase model employs multiple LLMs to ensure accurate classification and seamless integration of new symptoms with minimal human oversight. The paper details the framework's development, experiments, quantitative analyses, and data visualizations, demonstrating its effectiveness in enhancing medical ontologies and advancing knowledge-based systems in healthcare.
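The online-clustering step at the core of the framework can be sketched as follows: each new symptom mention, embedded as a vector (a role played by LLM-derived representations in the paper), either joins its nearest existing cluster or seeds a new one that becomes a candidate for ontology extension. The cosine-similarity threshold and running-mean centroid update below are assumptions, not the paper's exact procedure.

```python
import numpy as np

class OnlineClusterer:
    """Toy single-pass clusterer for streaming symptom embeddings."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # min cosine similarity to join a cluster
        self.centroids = []         # running mean embedding per cluster
        self.counts = []

    def add(self, embedding):
        v = np.asarray(embedding, dtype=float)
        v = v / np.linalg.norm(v)
        if self.centroids:
            sims = [v @ c / np.linalg.norm(c) for c in self.centroids]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                n = self.counts[best]
                self.centroids[best] = (self.centroids[best] * n + v) / (n + 1)
                self.counts[best] += 1
                return best                   # joined an existing cluster
        self.centroids.append(v)
        self.counts.append(1)
        return len(self.centroids) - 1        # seeded a new cluster
```

In the full framework, the hierarchical pass and the LLM agents would then decide where each resulting cluster attaches within the existing medical ontology.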
{"title":"Ontology extension by online clustering with large language model agents.","authors":"Guanchen Wu, Chen Ling, Ilana Graetz, Liang Zhao","doi":"10.3389/fdata.2024.1463543","DOIUrl":"10.3389/fdata.2024.1463543","url":null,"abstract":"<p><p>An ontology is a structured framework that categorizes entities, concepts, and relationships within a domain to facilitate shared understanding, and it is important in computational linguistics and knowledge representation. In this paper, we propose a novel framework to automatically extend an existing ontology from streaming data in a zero-shot manner. Specifically, the zero-shot ontology extension framework uses online and hierarchical clustering to integrate new knowledge into existing ontologies without substantial annotated data or domain-specific expertise. Focusing on the medical field, this approach leverages Large Language Models (LLMs) for two key tasks: Symptom Typing and Symptom Taxonomy among breast and bladder cancer survivors. Symptom Typing involves identifying and classifying medical symptoms from unstructured online patient forum data, while Symptom Taxonomy organizes and integrates these symptoms into an existing ontology. The combined use of online and hierarchical clustering enables real-time and structured categorization and integration of symptoms. The dual-phase model employs multiple LLMs to ensure accurate classification and seamless integration of new symptoms with minimal human oversight. The paper details the framework's development, experiments, quantitative analyses, and data visualizations, demonstrating its effectiveness in enhancing medical ontologies and advancing knowledge-based systems in healthcare.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1463543"},"PeriodicalIF":2.4,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491333/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142480536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}