Frontiers in Big Data: Latest Articles

Development and application of a machine learning-based predictive model for obstructive sleep apnea screening
IF 3.1 Q2 Computer Science Pub Date : 2024-05-16 DOI: 10.3389/fdata.2024.1353469
Kang Liu, Shi Geng, Ping Shen, Lei Zhao, Peng Zhou, Wen Liu
This study aimed to develop a robust machine learning prediction model for the automatic screening and diagnosis of obstructive sleep apnea (OSA) using five algorithms, namely Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Support Vector Machine (SVM), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF), in order to support early clinical diagnosis and intervention. We conducted a retrospective analysis of clinical data from 439 patients who underwent polysomnography at the Affiliated Hospital of Xuzhou Medical University between October 2019 and October 2022. Predictor variables included demographic information [age, sex, height, weight, body mass index (BMI)], medical history, and the Epworth Sleepiness Scale (ESS). Univariate analysis was used to identify variables with significant differences, and the dataset was then divided into training and validation sets in a 4:1 ratio. The training set was used to build a model predicting OSA severity grade, and the validation set was used to assess model performance by the area under the curve (AUC). Additionally, a separate analysis was conducted, categorizing the normal population as one group and patients with moderate-to-severe OSA as another. The same univariate analysis was applied, and the dataset was again divided into training and validation sets in a 4:1 ratio. The training set was used to build a prediction model for screening moderate-to-severe OSA, while the validation set was used to verify the model's performance. In the four-group model, LightGBM outperformed the other algorithms; its top five features by importance were ESS total score, BMI, sex, hypertension, and gastroesophageal reflux (GERD), with age, ESS total score, and BMI playing the most significant roles. In the dichotomous model, RF performed best of the five models. The top five features by importance for the best-performing RF model were ESS total score, BMI, GERD, age, and dry mouth, with ESS total score and BMI being particularly pivotal. Machine learning-based prediction models for OSA grading and screening prove instrumental in the early identification of patients with moderate-to-severe OSA, revealing pertinent risk factors and facilitating timely interventions to counter pathological changes induced by OSA. Notably, ESS total score and BMI emerge as the most critical features for predicting OSA, emphasizing their significance in clinical assessments. The dataset will be made publicly available on the authors' GitHub.
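The evaluation protocol described in this abstract (a 4:1 training/validation split scored by AUC) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `split_4_to_1` and `auc` are hypothetical, the split is a plain random shuffle (the abstract does not say whether the split was stratified), and AUC is computed for the binary case as the fraction of positive/negative pairs that the score ranks correctly.

```python
import random


def split_4_to_1(records, seed=0):
    """Shuffle a copy of `records` and split it 80% / 20% (a 4:1 ratio)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Binary AUC: probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder patient indices only; no clinical data is reproduced here.
    train, validation = split_4_to_1(range(439))
    print(len(train), len(validation))
```

With 439 records, this split yields 351 training and 88 validation records. The pairwise AUC is quadratic in the number of examples, which is acceptable at this dataset size.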
Citations: 0
Tradescantia response to air and soil pollution, stamen hair cells dataset and ANN color classification
IF 3.1 Q2 Computer Science Pub Date : 2024-05-15 DOI: 10.3389/fdata.2024.1384240
Leatrice Talita Rodrigues, Barbara Sanches Antunes Goeldner, Emílio Graciliano Ferreira Mercuri, S. M. Noe
The Tradescantia plant is a complex system that is sensitive to environmental factors such as water supply, pH, temperature, light, radiation, impurities, and nutrient availability. It can be used as a biomonitor for environmental changes; however, the bioassays are time-consuming and subject to strong human interference, so the result may change depending on who performs the analysis. We have developed computer vision models to study color variations in stamen hair cells of Tradescantia clone 4430, which can be stressed by air pollution and soil contamination. The study introduces a novel dataset, Trad-204, comprising single-cell images from Tradescantia clone 4430, captured during the Tradescantia stamen-hair mutation bioassay (Trad-SHM). The dataset contains images from two experiments, one focusing on air pollution by particulate matter and another based on soil contaminated by diesel oil. Both experiments were carried out in Curitiba, Brazil, between 2020 and 2023. The images represent single cells with different shapes, sizes, and colors, reflecting the plant's responses to environmental stressors. An automatic classification task was developed to distinguish between blue and pink cells, and the study explores both a baseline model and three artificial neural network (ANN) architectures, namely TinyVGG, VGG-16, and ResNet34. Tradescantia proved sensitive to both air particulate matter concentration and diesel oil in soil. The results indicate that the Residual Network architecture outperforms the other models in terms of accuracy on both training and testing sets. The dataset and findings contribute to the understanding of plant cell responses to environmental stress and provide valuable resources for further research in automated image analysis of plant cells. The discussion highlights the impact of turgor pressure on cell shape and the potential implications for plant physiology. The comparison between ANN architectures aligns with previous research, emphasizing the superior performance of ResNet models in image classification tasks. Artificial intelligence identification of pink cells improves counting accuracy, avoiding human errors due to different color perceptions, fatigue, or inattention, and it also facilitates and speeds up the analysis process. Overall, the study offers insights into plant cell dynamics and provides a foundation for future investigations such as changes in cell morphology. This research corroborates that biomonitoring should be considered an important tool for political action, being a relevant issue in risk assessment and in the development of new public policies relating to the environment.
Citations: 0
A systematic literature review on the impact of AI models on the security of code generation
IF 3.1 Q2 Computer Science Pub Date : 2024-05-13 DOI: 10.3389/fdata.2024.1386720
Claudia Negri-Ribalta, Rémi Geraud-Stewart, Anastasia Sergeeva, Gabriele Lenzini
Artificial Intelligence (AI) is increasingly used as a helper to develop computer programs. While it can boost software development and improve coding proficiency, this practice offers no guarantee of security. On the contrary, recent research shows that some AI models produce software with vulnerabilities. This situation leads to the question: How serious and widespread are the security flaws in code generated using AI models? Through a systematic literature review, this work reviews the state of the art on how AI models impact software security. It systematizes the knowledge about the risks of using AI in coding security-critical software. It reviews which well-known security flaws (e.g., those in the MITRE CWE Top 25 Most Dangerous Software Weaknesses) are commonly hidden in AI-generated code. It also reviews works that discuss how vulnerabilities in AI-generated code can be exploited to compromise security and lists the attempts to improve the security of such AI-generated code. Overall, this work provides a comprehensive and systematic overview of the impact of AI in secure coding. This topic has sparked interest and concern within the software security engineering community. The review highlights the importance of setting up security measures and processes, such as code verification, and notes that such practices could be customized for AI-aided code production.
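As an illustration of the kind of weakness the review is concerned with, the sketch below contrasts a query built by string concatenation (SQL injection, CWE-89) with a parameterized query. The example is not taken from the review or from any AI model's output; the table, the helper names (`make_demo_db`, `find_user_unsafe`, `find_user_safe`), and the payload are all hypothetical and chosen only to show why code verification matters.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable (CWE-89): user input is concatenated into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: `name` is passed as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the injected condition matches every row
    print(find_user_safe(conn, payload))    # no user has this literal name
```

A review step that flags string-built SQL and requires placeholders would catch this class of flaw regardless of whether a person or a model wrote the code.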
Citations: 0
Visualization as irritation: producing knowledge about medieval courts through uncertainty
IF 3.1 Q2 Computer Science Pub Date : 2024-05-10 DOI: 10.3389/fdata.2024.1188620
Silke Schwandt, Christian Wachter
Visualizations are ubiquitous in data-driven research, serving as both tools for knowledge production and genuine means of knowledge communication. Despite criticisms targeting the alleged objectivity of visualizations in the digital humanities (DH) and reflections on how they may serve as representations of both scholarly perspective and uncertainty within the data analysis pipeline, there remains a notable scarcity of in-depth theoretical grounding for these assumptions in DH discussions. It is our understanding that only through theoretical foundations such as basic semiotic principles and perspectives on media modality can one fully assess the use and potential of visualizations for innovation in scholarly interpretation. We argue that visualizations have the capacity to “productively irritate” existing scholarly knowledge in a given research field. This does not just mean that visualizations depict patterns in datasets that seem out of line with prior research and thus stimulate deeper examination. Complementarily, “irritation” here consists of visualizations producing uncertainty about their own meaning; yet it is precisely in this uncertainty that the potential for greater insight lies. It stimulates questions about what is depicted and what is not. This turns out to be a valuable resource for scholarly interpretation, and one could argue that visualizing big data is particularly prolific in this sense, because the complexity of such data prevents researchers from interpreting it without visual representations. However, we argue that “productive irritation” can also happen below the level of big data. We see this potential rooted in the genuinely semiotic and semantic properties of visual media, which studies in multimodality and specifically in the field of Bildlinguistik have carved out: a visualization's holistic overview of data patterns is juxtaposed with its semantic vagueness, which gives way to deep interpretations and multiple perspectives on that data. We elucidate this potential using examples from medieval English legal history. Visualizations of data relating to legal functions and social constellations of various people in court offer surprising insights that can lead to new knowledge through “productive irritation.”
Citations: 0
Forecasting cryptocurrency's buy signal with a bagged tree learning approach to enhance purchase decisions
IF 3.1 Q2 Computer Science Pub Date : 2024-05-09 DOI: 10.3389/fdata.2024.1369895
Raed Alsini, Q. Abu Al-haija, Abdulaziz A. Alsulami, Badraddin Alturki, Abdulaziz A. Alqurashi, M. D. Mashat, Ali Alqahtani, Nawaf Alhebaishi
The cryptocurrency market is captivating the attention of both retail and institutional investors. While this highly volatile market offers investors substantial profit opportunities, it also entails risks due to its sensitivity to speculative news and the erratic behavior of major investors, both of which can provoke unexpected price fluctuations. In this study, we contend that extreme and sudden price changes and atypical patterns might compromise the performance of technical signals utilized as the basis for feature extraction in a machine learning-based trading system by either augmenting or diminishing the model's generalization capability. To address this issue, this research uses a bagged tree (BT) model to forecast the buy signal for the cryptocurrency market. To benefit from such forecasts, traders must acquire knowledge about the cryptocurrency market and modify their strategies accordingly. To make an informed decision, we relied on the indicators most commonly used to derive buy signals in the cryptocurrency market: the Relative Strength Index (RSI), Bollinger Bands (BB), and the Moving Average Convergence/Divergence (MACD) indicator. The research also evaluates how accurately a model can predict the performance of different cryptocurrencies such as Bitcoin (BTC), Ethereum (ETH), Cardano (ADA), and Binance Coin (BNB). Furthermore, the efficacy of the most popular machine learning model in precisely forecasting outcomes within the cryptocurrency market is examined. Notably, predicting buy signal values using a BT model provides promising results.
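The three indicators named in this abstract can be computed from a series of closing prices as sketched below. This is an illustration under stated assumptions rather than the authors' feature pipeline: the default window lengths (14 for RSI, 20 and 2 standard deviations for Bollinger Bands, 12/26/9 for MACD) are the conventional ones and are not given in the abstract, the RSI uses plain averages instead of Wilder smoothing, and `indicator_features` with its `bb_position` and `macd_hist` fields is a hypothetical helper. The bagged tree classifier itself is not shown.

```python
from statistics import mean, pstdev


def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rsi(closes, period=14):
    """RSI over the last `period` price changes (plain averages, not Wilder smoothing)."""
    changes = [b - a for a, b in zip(closes, closes[1:])][-period:]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains == 0 and losses == 0:
        return 50.0  # flat prices: treated as neutral (a choice made for this sketch)
    if losses == 0:
        return 100.0
    return 100 - 100 / (1 + gains / losses)


def bollinger(closes, window=20, k=2):
    """Middle, upper, and lower Bollinger Bands for the latest bar."""
    recent = closes[-window:]
    mid = mean(recent)
    spread = k * pstdev(recent)
    return mid, mid + spread, mid - spread


def macd(closes, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return line, ema(line, signal)


def indicator_features(closes):
    """One feature row for the latest bar, usable as classifier input."""
    mid, upper, lower = bollinger(closes)
    line, sig = macd(closes)
    return {
        "rsi": rsi(closes),
        # Position of the last close inside the band (0 = lower band, 1 = upper band).
        "bb_position": (closes[-1] - lower) / (upper - lower) if upper > lower else 0.5,
        # MACD histogram: MACD line minus signal line.
        "macd_hist": line[-1] - sig[-1],
    }
```

Each call yields one row of features for the most recent bar; sliding the input window over the price history would produce a training table for a classifier such as a bagged tree ensemble.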
Citations: 0
Multi-modal recommender system for predicting project manager performance within a competency-based framework
IF 3.1 Q2 Computer Science Pub Date : 2024-05-09 DOI: 10.3389/fdata.2024.1295009
Imene Jemal, Wilfried Armand Naoussi Sijou, Belkacem Chikhaoui
The evaluation of performance using competencies within a structured framework holds significant importance across various professional domains, particularly in roles like project manager. Typically, this assessment process, overseen by senior evaluators, involves scoring competencies based on data gathered from interviews, completed forms, and evaluation programs. However, this task is tedious and time-consuming, and requires the expertise of qualified professionals. Moreover, it is compounded by the inconsistent scoring biases introduced by different evaluators. In this paper, we propose a novel approach to automatically predict competency scores, thereby facilitating the assessment of project managers' performance. Initially, we performed data fusion to compile a comprehensive dataset from various sources and modalities, including demographic data, profile-related data, and historical competency assessments. Subsequently, NLP techniques were used to pre-process text data. Finally, recommender systems were explored to predict competency scores. We compared four different recommender system approaches: content-based filtering, demographic filtering, collaborative filtering, and hybrid filtering. Using assessment data collected from 38 project managers, encompassing scores across 67 different competencies, we evaluated the performance of each approach. Notably, the content-based approach yielded promising results, achieving a precision of 81.03%. Furthermore, we addressed the cold-start challenge, which in our context involves predicting scores for either a new project manager lacking competency data or a newly introduced competency without historical records. Our analysis revealed that demographic filtering achieved an average precision of 54.05% when dealing with new project managers. In contrast, content-based filtering exhibited remarkable performance, achieving a precision of 85.79% in predicting scores for new competencies. These findings underscore the potential of recommender systems in competency assessment, thereby facilitating a more effective performance evaluation process.
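To make the cold-start case for a newly introduced competency more concrete, the following is a minimal sketch of content-based filtering, not the authors' implementation. It assumes each competency has a short text description, uses a plain bag-of-words cosine similarity in place of the paper's NLP pre-processing, and predicts a manager's score on the new competency as a similarity-weighted mean of that manager's known scores. The function names, descriptions, and scores are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for real NLP pre-processing."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_score(known_scores, descriptions, new_description):
    """Similarity-weighted mean of one manager's known competency scores."""
    target = bag_of_words(new_description)
    weighted, total = 0.0, 0.0
    for name, score in known_scores.items():
        sim = cosine(bag_of_words(descriptions[name]), target)
        weighted += sim * score
        total += sim
    if total == 0.0:
        # No textual overlap at all: fall back to the manager's mean score.
        return sum(known_scores.values()) / len(known_scores)
    return weighted / total


# Hypothetical competency descriptions and one manager's historical scores.
descriptions = {
    "risk planning": "identify project risk and plan mitigation",
    "team leadership": "lead and motivate the project team",
}
known_scores = {"risk planning": 4.0, "team leadership": 2.0}

# A new competency with no historical scores (the cold-start case).
predicted = predict_score(known_scores, descriptions,
                          "identify and assess project risk")
print(round(predicted, 2))
```

Because the new description shares more words with "risk planning" than with "team leadership", the prediction is pulled toward the manager's risk-planning score. A real system would likely use richer text features and would need to discretize predictions before computing precision.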
Citations: 0
Challenges and efforts in managing AI trustworthiness risks: a state of knowledge
IF 3.1 Q2 Computer Science Pub Date : 2024-05-09 DOI: 10.3389/fdata.2024.1381163
Nineta Polemi, Isabel Praça, K. Kioskli, Adrien Bécue
This paper addresses the critical gaps in existing AI risk management frameworks, emphasizing the neglect of human factors and the absence of metrics for socially related or human threats. Drawing from insights provided by the NIST AI RMF and ENISA, the research underscores the need for understanding the limitations of human-AI interaction and the development of ethical and social measurements. The paper explores various dimensions of trustworthiness, covering legislation, AI cyber threat intelligence, and characteristics of AI adversaries. It delves into technical threats and vulnerabilities, including data access, poisoning, and backdoors, highlighting the importance of collaboration between cybersecurity engineers, AI experts, and social-psychology-behavior-ethics professionals. Furthermore, the socio-psychological threats associated with AI integration into society are examined, addressing issues such as bias, misinformation, and privacy erosion. The manuscript proposes a comprehensive approach to AI trustworthiness, combining technical and social mitigation measures, standards, and ongoing research initiatives. Additionally, it introduces innovative defense strategies, such as cyber-social exercises, digital clones, and conversational agents, to enhance understanding of adversary profiles and fortify AI security. The paper concludes with a call for interdisciplinary collaboration, awareness campaigns, and continuous research efforts to create a robust and resilient AI ecosystem aligned with ethical standards and societal expectations.
Citations: 0
Quantifying uncertainty in graph neural network explanations
IF 3.1 Q2 Computer Science Pub Date : 2024-05-09 DOI: 10.3389/fdata.2024.1392662
Junji Jiang, Chen Ling, Hongyi Li, Guangji Bai, Xujiang Zhao, Liang Zhao
In recent years, analyzing the explanations for the predictions of Graph Neural Networks (GNNs) has attracted increasing attention. Despite this progress, most existing methods do not adequately consider the inherent uncertainties stemming from the randomness of model parameters and graph data, which may lead to overconfident and misleading explanations. However, it is challenging for most GNN explanation methods to quantify these uncertainties, since they obtain the prediction explanation in a post-hoc and model-agnostic manner without considering the randomness of graph data and model parameters. To address these problems, this paper proposes a novel uncertainty quantification framework for GNN explanations. To mitigate the randomness of graph data in the explanation, our framework accounts for two distinct data uncertainties, allowing for a direct assessment of the uncertainty in GNN explanations. To mitigate the randomness of learned model parameters, our method learns the parameter distribution directly from the data, obviating the need for assumptions about specific distributions. Moreover, the explanation uncertainty within model parameters is also quantified based on the learned parameter distributions. This holistic approach can integrate with any post-hoc GNN explanation method. Empirical results from our study show that our proposed method sets a new standard for GNN explanation performance across diverse real-world graph benchmarks.
Citations: 0
A comprehensive investigation of clustering algorithms for User and Entity Behavior Analytics
IF 3.1 Q2 Computer Science Pub Date : 2024-05-09 DOI: 10.3389/fdata.2024.1375818
Pierpaolo Artioli, Antonio Maci, Alessio Magrì
Government agencies are now encouraging industries to enhance their security systems to detect and respond proactively to cybersecurity incidents. Consequently, a security operations center that combines the analytical capabilities of human experts with systems based on Machine Learning (ML) plays a critical role. In this setting, Security Information and Event Management (SIEM) platforms can effectively handle network-related events to trigger cybersecurity alerts. Furthermore, a SIEM may include a User and Entity Behavior Analytics (UEBA) engine that examines the behavior of both users and devices, or entities, within a corporate network. In recent literature, several contributions have employed ML algorithms for UEBA, especially those based on the unsupervised learning paradigm, because anomalous behaviors are usually not known in advance. However, to narrow the gap between research advances and practice, it is necessary to comprehensively analyze the effectiveness of these methodologies. This paper proposes a thorough investigation of traditional and emerging clustering algorithms for UEBA, considering multiple application contexts, i.e., different user-entity interaction scenarios. Our study involves three datasets sourced from the existing literature and fifteen clustering algorithms. Among the compared techniques, HDBSCAN and DenMune showed promising performance on the state-of-the-art CERT behavior-related dataset, producing groups with a density very close to the number of users.
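As a rough illustration of why density-based clustering suits UEBA, the sketch below implements plain DBSCAN, a simpler relative of the HDBSCAN and DenMune algorithms named in the abstract, and not either of those methods. It groups users with similar behavior and labels users in sparse regions as noise, which is one way an unusual behavior profile can surface without labeled anomalies. The two-dimensional feature vectors, `eps`, and `min_pts` are hypothetical choices made only for this example.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one label per point, with -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical per-user behavior features (already scaled), e.g.
# (daily logins, data transferred). The last user behaves unlike any group.
users = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 0.5),
]
labels = dbscan(users, eps=0.5, min_pts=3)
print(labels)
```

In this toy data the first three users and the next three form two dense groups, while the last user falls in no dense region and is marked as noise. HDBSCAN removes the need to fix a single `eps`, which matters when behavior groups have different densities.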
Citations: 0
The role of big data in financial technology toward financial inclusion
IF 3.1 Q2 Computer Science Pub Date : 2024-05-07 DOI: 10.3389/fdata.2024.1184444
David Mhlanga
In the rapidly evolving landscape of financial technology (FinTech), big data stands as a cornerstone, driving significant transformations. This study delves into the pivotal role of big data in FinTech and its implications for financial inclusion. Employing a comprehensive literature review methodology, we analyze diverse sources including academic journals, industry reports, and online articles. Our findings illuminate how big data catalyzes the development of novel financial products and services, enhances risk management, and boosts operational efficiency, thereby fostering financial inclusion. Particularly, big data's capability to offer insightful customer behavior analytics is highlighted as a key driver for creating inclusive financial services. However, challenges such as data privacy and security, and the need for ethical algorithmic practices are also identified. This research contributes valuable insights for policymakers, regulators, and industry practitioners, suggesting a need for balanced regulatory frameworks to harness big data's potential ethically and responsibly. The outcomes of this study underscore the transformative power of big data in FinTech, indicating a pathway toward a more inclusive financial ecosystem.
Citations: 0