N. Wattanakitrungroj, Pimchanok Wijitkajee, S. Jaiyen, Sunisa Sathapornvajana, Sasiporn Tongman
For the financial health of lenders and institutions, one important risk assessment, called credit risk assessment, is correctly deciding whether or not a borrower will fail to repay a loan. It not only helps in the approval or denial of loan applications but also aids in managing the non-performing loan (NPL) trend. In this study, a dataset provided by the LendingClub company, based in San Francisco, CA, USA, covering 2007 to 2020 and consisting of 2,925,492 records and 141 attributes, was used for the experiments. The loan status was categorized as “Good” or “Risk”. To yield highly effective credit risk predictions, experiments were performed using three widely adopted supervised machine learning techniques: logistic regression, random forest, and gradient boosting. In addition, to address the imbalanced data problem, three sampling algorithms, namely under-sampling, over-sampling, and combined sampling, were employed. The results show that the gradient boosting technique achieves nearly perfect Accuracy, Precision, Recall, and F1-score values, all exceeding 99.92%, while its MCC values are greater than 99.77%. All three imbalanced-data handling approaches enhanced the performance of models trained with the three algorithms. Moreover, reducing the number of features based on mutual information revealed only slightly decreased performance with 50 features, with Accuracy values greater than 99.86%. With 25 features, the smallest subset considered, the random forest model still yielded 99.15% Accuracy. Both sampling strategies and feature selection help to improve supervised models for accurately predicting credit risk, which may be beneficial in the lending business.
{"title":"Enhancing Supervised Model Performance in Credit Risk Classification Using Sampling Strategies and Feature Ranking","authors":"N. Wattanakitrungroj, Pimchanok Wijitkajee, S. Jaiyen, Sunisa Sathapornvajana, Sasiporn Tongman","doi":"10.3390/bdcc8030028","DOIUrl":"https://doi.org/10.3390/bdcc8030028","url":null,"abstract":"For the financial health of lenders and institutions, one important risk assessment called credit risk is about correctly deciding whether or not a borrower will fail to repay a loan. It not only helps in the approval or denial of loan applications but also aids in managing the non-performing loan (NPL) trend. In this study, a dataset provided by the LendingClub company based in San Francisco, CA, USA, from 2007 to 2020 consisting of 2,925,492 records and 141 attributes was experimented with. The loan status was categorized as “Good” or “Risk”. To yield highly effective results of credit risk prediction, experiments on credit risk prediction were performed using three widely adopted supervised machine learning techniques: logistic regression, random forest, and gradient boosting. In addition, to solve the imbalanced data problem, three sampling algorithms, including under-sampling, over-sampling, and combined sampling, were employed. The results show that the gradient boosting technique achieves nearly perfect Accuracy, Precision, Recall, and F1score values, which are better than 99.92%, but its MCC values are greater than 99.77%. Three imbalanced data handling approaches can enhance the model performance of models trained by three algorithms. Moreover, the experiment of reducing the number of features based on mutual information calculation revealed slightly decreasing performance for 50 data features with Accuracy values greater than 99.86%. For 25 data features, which is the smallest size, the random forest supervised model yielded 99.15% Accuracy. Both sampling strategies and feature selection help to improve the supervised model for accurately predicting credit risk, which may be beneficial in the lending business.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140078506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas F. Gkontzis, S. Kotsiantis, G. Feretzakis, V. Verykios
In an epoch characterized by the swift pace of digitalization and urbanization, the essence of community well-being hinges on the efficacy of urban management. As cities burgeon and transform, the need for astute strategies to navigate the complexities of urban life becomes increasingly paramount. This study employs time series analysis to scrutinize citizen interactions with the coordinate-based problem mapping platform in the Municipality of Patras in Greece. The research explores the temporal dynamics of reported urban issues, with a specific focus on identifying recurring patterns through the lens of seasonality. The analysis, employing the seasonal decomposition technique, dissects time series data to expose trends in reported issues and areas of the city that might be obscured in raw big data. It accentuates a distinct seasonal pattern, with concentrations peaking during the summer months. The study extends its approach to forecasting, providing insights into the anticipated evolution of urban issues over time. Projections for the coming years show a consistent upward trend in both overall city issues and those reported in specific areas, with distinct seasonal variations. This comprehensive exploration of time series analysis and seasonality provides valuable insights for city stakeholders, enabling informed decision-making and predictions regarding future urban challenges.
{"title":"Temporal Dynamics of Citizen-Reported Urban Challenges: A Comprehensive Time Series Analysis","authors":"Andreas F. Gkontzis, S. Kotsiantis, G. Feretzakis, V. Verykios","doi":"10.3390/bdcc8030027","DOIUrl":"https://doi.org/10.3390/bdcc8030027","url":null,"abstract":"In an epoch characterized by the swift pace of digitalization and urbanization, the essence of community well-being hinges on the efficacy of urban management. As cities burgeon and transform, the need for astute strategies to navigate the complexities of urban life becomes increasingly paramount. This study employs time series analysis to scrutinize citizen interactions with the coordinate-based problem mapping platform in the Municipality of Patras in Greece. The research explores the temporal dynamics of reported urban issues, with a specific focus on identifying recurring patterns through the lens of seasonality. The analysis, employing the seasonal decomposition technique, dissects time series data to expose trends in reported issues and areas of the city that might be obscured in raw big data. It accentuates a distinct seasonal pattern, with concentrations peaking during the summer months. The study extends its approach to forecasting, providing insights into the anticipated evolution of urban issues over time. Projections for the coming years show a consistent upward trend in both overall city issues and those reported in specific areas, with distinct seasonal variations. This comprehensive exploration of time series analysis and seasonality provides valuable insights for city stakeholders, enabling informed decision-making and predictions regarding future urban challenges.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140079972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article investigates the intricate dynamics of data monopolies, referred to as “data-opolies”, and their implications for democratic erosion. Data-opolies, typically embodied by large technology corporations, accumulate extensive datasets, affording them significant influence. The sustainability of such data practices is critically examined within the context of decentralized Web3 technologies amidst Artificial Intelligence (AI) disruption. Additionally, the article explores emancipatory datafication strategies to counterbalance the dominance of data-opolies. It presents an in-depth analysis of two emergent phenomena within the emerging decentralized Web3 landscape: People-Centered Smart Cities and Datafied Network States. The article investigates a paradigm shift in data governance and advocates for joint efforts to establish equitable data ecosystems, with an emphasis on prioritizing data sovereignty and achieving digital self-governance. It elucidates the roles of (i) blockchain, (ii) decentralized autonomous organizations (DAOs), and (iii) data cooperatives in empowering citizens to retain control over their personal data. In conclusion, the article offers a forward-looking examination of decentralized Web3 technologies, outlining a timely path toward a more transparent, inclusive, and emancipatory data-driven democracy. This approach challenges the prevailing dominance of data-opolies and offers a framework for regenerating datafied democracies through decentralized and emerging Web3 technologies.
{"title":"Democratic Erosion of Data-Opolies: Decentralized Web3 Technological Paradigm Shift Amidst AI Disruption","authors":"Igor Calzada","doi":"10.3390/bdcc8030026","DOIUrl":"https://doi.org/10.3390/bdcc8030026","url":null,"abstract":"This article investigates the intricate dynamics of data monopolies, referred to as “data-opolies”, and their implications for democratic erosion. Data-opolies, typically embodied by large technology corporations, accumulate extensive datasets, affording them significant influence. The sustainability of such data practices is critically examined within the context of decentralized Web3 technologies amidst Artificial Intelligence (AI) disruption. Additionally, the article explores emancipatory datafication strategies to counterbalance the dominance of data-opolies. It presents an in-depth analysis of two emergent phenomena within the decentralized Web3 emerging landscape: People-Centered Smart Cities and Datafied Network States. The article investigates a paradigm shift in data governance and advocates for joint efforts to establish equitable data ecosystems, with an emphasis on prioritizing data sovereignty and achieving digital self-governance. It elucidates the remarkable roles of (i) blockchain, (ii) decentralized autonomous organizations (DAOs), and (iii) data cooperatives in empowering citizens to have control over their personal data. In conclusion, the article introduces a forward-looking examination of Web3 decentralized technologies, outlining a timely path toward a more transparent, inclusive, and emancipatory data-driven democracy. This approach challenges the prevailing dominance of data-opolies and offers a framework for regenerating datafied democracies through decentralized and emerging Web3 technologies.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140429220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alvaro A. Teran-Quezada, Victor Lopez-Cabrera, J. Rangel, J. Sánchez-Galán
Convolutional neural networks (CNNs) have provided great advances for the task of sign language recognition (SLR). However, recurrent neural networks (RNNs) in the form of long short-term memory (LSTM) have become a means of solving problems involving sequential data. This research proposes the development of a sign language translation system that converts Panamanian Sign Language (PSL) signs into Spanish text using an LSTM model that, among other things, makes it possible to work with non-static signs (as sequential data). The deep learning model presented focuses on action detection, in this case, the execution of the signs. This involves precisely processing the frames in which a sign language gesture is made. The proposal is a holistic solution that considers, in addition to tracking the speaker's hands, the face and pose. These were added because, when communicating through sign languages, visual characteristics beyond hand gestures also matter. For the training of this system, a dataset of 330 videos (of 30 frames each) covering five possible classes (different signs) was created. The model was tested and achieved an accuracy of 98.8%, making it a valuable base system for effective communication between PSL users and Spanish speakers. In conclusion, this work advances the state of the art for PSL–Spanish translation by exploiting the possibilities of translatable signs via deep learning.
{"title":"Sign-to-Text Translation from Panamanian Sign Language to Spanish in Continuous Capture Mode with Deep Neural Networks","authors":"Alvaro A. Teran-Quezada, Victor Lopez-Cabrera, J. Rangel, J. Sánchez-Galán","doi":"10.3390/bdcc8030025","DOIUrl":"https://doi.org/10.3390/bdcc8030025","url":null,"abstract":"Convolutional neural networks (CNN) have provided great advances for the task of sign language recognition (SLR). However, recurrent neural networks (RNN) in the form of long–short-term memory (LSTM) have become a means for providing solutions to problems involving sequential data. This research proposes the development of a sign language translation system that converts Panamanian Sign Language (PSL) signs into text in Spanish using an LSTM model that, among many things, makes it possible to work with non-static signs (as sequential data). The deep learning model presented focuses on action detection, in this case, the execution of the signs. This involves processing in a precise manner the frames in which a sign language gesture is made. The proposal is a holistic solution that considers, in addition to the seeking of the hands of the speaker, the face and pose determinants. These were added due to the fact that when communicating through sign languages, other visual characteristics matter beyond hand gestures. For the training of this system, a data set of 330 videos (of 30 frames each) for five possible classes (different signs considered) was created. The model was tested having an accuracy of 98.8%, making this a valuable base system for effective communication between PSL users and Spanish speakers. In conclusion, this work provides an improvement of the state of the art for PSL–Spanish translation by using the possibilities of translatable signs via deep learning.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140428735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Kolomeets, O. Tushkanova, Vasily Desnitsky, L. Vitkova, Andrey Chechulin
This paper aims to test the hypothesis that social media bot detection systems based on supervised machine learning may not be as accurate as researchers claim, given that bots have become increasingly sophisticated, making it difficult for human annotators to detect them better than random selection. As a result, obtaining a ground-truth dataset with human annotation is not possible, which leads to supervised machine-learning models inheriting annotation errors. To test this hypothesis, we conducted an experiment where humans were tasked with recognizing malicious bots on the VKontakte social network. We then compared the “human” answers with the “ground-truth” bot labels (‘a bot’/‘not a bot’). Based on the experiment, we evaluated the bot detection efficiency of annotators in three scenarios typical for cybersecurity but differing in their detection difficulty as follows: (1) detection among random accounts, (2) detection among accounts of a social network ‘community’, and (3) detection among verified accounts. The study showed that humans could only detect simple bots in all three scenarios but could not detect more sophisticated ones (p-value = 0.05). The study also evaluates the limits of hypothetical and existing bot detection systems that leverage non-expert-labelled datasets as follows: the balanced accuracy of such systems can drop to 0.5 and lower, depending on bot complexity and detection scenario. The paper also describes the experiment design, collected datasets, statistical evaluation, and machine learning accuracy measures applied to support the results. In the discussion, we raise the question of using human labelling in bot detection systems and its potential cybersecurity issues. We also provide open access to the datasets used, experiment results, and software code for evaluating statistical and machine learning accuracy metrics used in this paper on GitHub.
{"title":"Experimental Evaluation: Can Humans Recognise Social Media Bots?","authors":"M. Kolomeets, O. Tushkanova, Vasily Desnitsky, L. Vitkova, Andrey Chechulin","doi":"10.3390/bdcc8030024","DOIUrl":"https://doi.org/10.3390/bdcc8030024","url":null,"abstract":"This paper aims to test the hypothesis that the quality of social media bot detection systems based on supervised machine learning may not be as accurate as researchers claim, given that bots have become increasingly sophisticated, making it difficult for human annotators to detect them better than random selection. As a result, obtaining a ground-truth dataset with human annotation is not possible, which leads to supervised machine-learning models inheriting annotation errors. To test this hypothesis, we conducted an experiment where humans were tasked with recognizing malicious bots on the VKontakte social network. We then compared the “human” answers with the “ground-truth” bot labels (‘a bot’/‘not a bot’). Based on the experiment, we evaluated the bot detection efficiency of annotators in three scenarios typical for cybersecurity but differing in their detection difficulty as follows: (1) detection among random accounts, (2) detection among accounts of a social network ‘community’, and (3) detection among verified accounts. The study showed that humans could only detect simple bots in all three scenarios but could not detect more sophisticated ones (p-value = 0.05). The study also evaluates the limits of hypothetical and existing bot detection systems that leverage non-expert-labelled datasets as follows: the balanced accuracy of such systems can drop to 0.5 and lower, depending on bot complexity and detection scenario. The paper also describes the experiment design, collected datasets, statistical evaluation, and machine learning accuracy measures applied to support the results. In the discussion, we raise the question of using human labelling in bot detection systems and its potential cybersecurity issues. We also provide open access to the datasets used, experiment results, and software code for evaluating statistical and machine learning accuracy metrics used in this paper on GitHub.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140429684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdullah F. Al-Aboosi, Aldo Jonathan Muñoz Vázquez, Fadhil Y. Al-Aboosi, Mahmoud M. El-Halwagi, Wei Zhan
Accurate prediction of renewable energy output is essential for integrating sustainable energy sources into the grid, facilitating a transition towards a more resilient energy infrastructure. Novel applications of machine learning and artificial intelligence are being leveraged to enhance forecasting methodologies, enabling more accurate predictions and optimized decision-making capabilities. Integrating these novel paradigms improves forecasting accuracy, fostering a more efficient and reliable energy grid. These advancements allow better demand management, optimize resource allocation, and improve robustness to potential disruptions. Solar intensity and wind speed data are often recorded by sensor-equipped instruments, which may suffer intermittent or permanent faults. Hence, this paper proposes a novel Fourier network regression model to process solar irradiance and wind speed data. The proposed approach enables accurate prediction of the underlying smooth components, facilitating effective reconstruction of missing data and enhancing the overall forecasting performance. The present study focuses on Midland, Texas, as a case study to assess direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and wind speed. Remarkably, the model exhibits a correlation of 1 with a minimal RMSE (root mean square error) of 0.0007555. This study leverages Fourier analysis for renewable energy applications, with the aim of establishing a methodology that can be applied to a novel geographic context.
{"title":"Solar and Wind Data Recognition: Fourier Regression for Robust Recovery","authors":"Abdullah F. Al-Aboosi, Aldo Jonathan Muñoz Vázquez, Fadhil Y. Al-Aboosi, Mahmoud M. El-Halwagi, Wei Zhan","doi":"10.3390/bdcc8030023","DOIUrl":"https://doi.org/10.3390/bdcc8030023","url":null,"abstract":"Accurate prediction of renewable energy output is essential for integrating sustainable energy sources into the grid, facilitating a transition towards a more resilient energy infrastructure. Novel applications of machine learning and artificial intelligence are being leveraged to enhance forecasting methodologies, enabling more accurate predictions and optimized decision-making capabilities. Integrating these novel paradigms improves forecasting accuracy, fostering a more efficient and reliable energy grid. These advancements allow better demand management, optimize resource allocation, and improve robustness to potential disruptions. The data collected from solar intensity and wind speed is often recorded through sensor-equipped instruments, which may encounter intermittent or permanent faults. Hence, this paper proposes a novel Fourier network regression model to process solar irradiance and wind speed data. The proposed approach enables accurate prediction of the underlying smooth components, facilitating effective reconstruction of missing data and enhancing the overall forecasting performance. The present study focuses on Midland, Texas, as a case study to assess direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and wind speed. Remarkably, the model exhibits a correlation of 1 with a minimal RMSE (root mean square error) of 0.0007555. This study leverages Fourier analysis for renewable energy applications, with the aim of establishing a methodology that can be applied to a novel geographic context.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140434782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Rybka, Yury Davydov, Danila Vlasov, A. Serenko, A. Sboev, Vyacheslav Ilyin
Developing a spiking neural network architecture that could prospectively be trained on energy-efficient neuromorphic hardware to solve various data analysis tasks requires satisfying the limitations of prospective analog or digital hardware, i.e., local learning and limited numbers of connections, respectively. In this work, we compare two methods of connectivity reduction that are applicable to spiking networks with local plasticity; instead of a large fully connected network (used as the baseline for comparison), we employ either an ensemble of independent small networks or a network with probabilistic sparse connectivity. We evaluate both methods with a three-layer spiking neural network applied to handwritten and spoken digit classification tasks, using two memristive plasticity models and the classical spike-timing-dependent plasticity (STDP) rule. Both methods achieve an F1-score of 0.93–0.95 on the handwritten digit recognition task and 0.85–0.93 on the spoken digit recognition task. Applying a combination of both methods made it possible to obtain highly accurate models while reducing the number of connections more than threefold compared to the baseline model.
{"title":"Comparison of Bagging and Sparcity Methods for Connectivity Reduction in Spiking Neural Networks with Memristive Plasticity","authors":"R. Rybka, Yury Davydov, Danila Vlasov, A. Serenko, A. Sboev, Vyacheslav Ilyin","doi":"10.3390/bdcc8030022","DOIUrl":"https://doi.org/10.3390/bdcc8030022","url":null,"abstract":"Developing a spiking neural network architecture that could prospectively be trained on energy-efficient neuromorphic hardware to solve various data analysis tasks requires satisfying the limitations of prospective analog or digital hardware, i.e., local learning and limited numbers of connections, respectively. In this work, we compare two methods of connectivity reduction that are applicable to spiking networks with local plasticity; instead of a large fully-connected network (which is used as the baseline for comparison), we employ either an ensemble of independent small networks or a network with probabilistic sparse connectivity. We evaluate both of these methods with a three-layer spiking neural network, which are applied to handwritten and spoken digit classification tasks using two memristive plasticity models and the classical spike time-dependent plasticity (STDP) rule. Both methods achieve an F1-score of 0.93–0.95 on the handwritten digits recognition task and 0.85–0.93 on the spoken digits recognition task. Applying a combination of both methods made it possible to obtain highly accurate models while reducing the number of connections by more than three times compared to the basic model.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140436199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The swift proliferation of Internet of Things (IoT) devices in smart city infrastructures has created an urgent demand for robust cybersecurity measures. These devices are susceptible to various cyberattacks that can jeopardize the security and functionality of urban systems. This research presents an innovative approach to identifying anomalies caused by IoT cyberattacks in smart cities. The proposed method harnesses federated and split learning and addresses the dual challenge of enhancing IoT network security while preserving data privacy. This study conducts extensive experiments using authentic datasets from smart cities. To compare the performance of classical machine learning algorithms and deep learning models for detecting anomalies, model effectiveness is assessed using precision, recall, F1 score, accuracy, and training/deployment time. The findings demonstrate that federated learning and split learning have the potential to balance data privacy concerns with competitive performance, providing robust solutions for detecting IoT cyberattacks. This study contributes to the ongoing discussion about securing IoT deployments in urban settings. It lays the groundwork for scalable and privacy-conscious cybersecurity strategies. The results underscore the vital role of these techniques in fortifying smart cities and promoting the development of adaptable and resilient cybersecurity measures in the IoT era.
{"title":"Anomaly Detection of IoT Cyberattacks in Smart Cities Using Federated Learning and Split Learning","authors":"Ishaani Priyadarshini","doi":"10.3390/bdcc8030021","DOIUrl":"https://doi.org/10.3390/bdcc8030021","url":null,"abstract":"The swift proliferation of the Internet of Things (IoT) devices in smart city infrastructures has created an urgent demand for robust cybersecurity measures. These devices are susceptible to various cyberattacks that can jeopardize the security and functionality of urban systems. This research presents an innovative approach to identifying anomalies caused by IoT cyberattacks in smart cities. The proposed method harnesses federated and split learning and addresses the dual challenge of enhancing IoT network security while preserving data privacy. This study conducts extensive experiments using authentic datasets from smart cities. To compare the performance of classical machine learning algorithms and deep learning models for detecting anomalies, model effectiveness is assessed using precision, recall, F-1 score, accuracy, and training/deployment time. The findings demonstrate that federated learning and split learning have the potential to balance data privacy concerns with competitive performance, providing robust solutions for detecting IoT cyberattacks. This study contributes to the ongoing discussion about securing IoT deployments in urban settings. It lays the groundwork for scalable and privacy-conscious cybersecurity strategies. The results underscore the vital role of these techniques in fortifying smart cities and promoting the development of adaptable and resilient cybersecurity measures in the IoT era.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140438100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ontology merging remains an important task in ontology engineering. However, despite the efforts devoted to ontology merging, the incorporation of relevant features of ontologies such as axioms, individuals and annotations in the output ontologies remains challenging. Consequently, existing ontology-merging solutions produce new ontologies that do not include all the relevant semantic features from the candidate ontologies. To address these limitations, this paper proposes a novel algorithm for multi-criteria ontology merging that automatically builds a new ontology from candidate ontologies by iteratively updating an RDF graph in memory. The proposed algorithm leverages state-of-the-art Natural Language Processing tools as well as a Machine Learning-based framework to assess the similarities and merge various criteria into the resulting output ontology. The key contribution of the proposed algorithm lies in its ability to merge relevant features from the candidate ontologies to build a more accurate, integrated and cohesive output ontology. The proposed algorithm is tested with five ontologies from different computing domains and evaluated in terms of its asymptotic behavior, quality and computational performance. The experimental results indicate that the proposed algorithm produces output ontologies that meet the integrity, accuracy and cohesion quality criteria better than related studies. This performance demonstrates the effectiveness and superior capabilities of the proposed algorithm. Furthermore, the proposed algorithm enables iterative in-memory update and building of the RDF graph of the resulting output ontology, which enhances the processing speed and improves the computational efficiency, making it an ideal solution for big data applications.
{"title":"A Novel Algorithm for Multi-Criteria Ontology Merging through Iterative Update of RDF Graph","authors":"M. Rudwan, Jean Vincent Fonou-Dombeu","doi":"10.3390/bdcc8030019","DOIUrl":"https://doi.org/10.3390/bdcc8030019","url":null,"abstract":"Ontology merging is an important task in ontology engineering to date. However, despite the efforts devoted to ontology merging, the incorporation of relevant features of ontologies such as axioms, individuals and annotations in the output ontologies remains challenging. Consequently, existing ontology-merging solutions produce new ontologies that do not include all the relevant semantic features from the candidate ontologies. To address these limitations, this paper proposes a novel algorithm for multi-criteria ontology merging that automatically builds a new ontology from candidate ontologies by iteratively updating an RDF graph in the memory. The proposed algorithm leverages state-of-the-art Natural Language Processing tools as well as a Machine Learning-based framework to assess the similarities and merge various criteria into the resulting output ontology. The key contribution of the proposed algorithm lies in its ability to merge relevant features from the candidate ontologies to build a more accurate, integrated and cohesive output ontology. The proposed algorithm is tested with five ontologies of different computing domains and evaluated in terms of its asymptotic behavior, quality and computational performance. The experimental results indicate that the proposed algorithm produces output ontologies that meet the integrity, accuracy and cohesion quality criteria better than related studies. This performance demonstrates the effectiveness and superior capabilities of the proposed algorithm. Furthermore, the proposed algorithm enables iterative in-memory update and building of the RDF graph of the resulting output ontology, which enhances the processing speed and improves the computational efficiency, making it an ideal solution for big data applications.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140442413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. A. Rivera-Romero, J. U. Munoz-Minjares, Carlos Lastre-Dominguez, M. Lopez-Ramirez
Identifying a patient's posture while they are lying in bed is an important task in medical applications such as monitoring a patient after a surgical intervention, sleep supervision to identify behavioral and physiological markers, or bedsore prevention. An acceptable strategy for identifying the patient's position is the classification of images created from a grid of pressure sensors located in the bed. These samples can be classified using supervised learning methods. Usually, image conditioning is required before images are loaded into a learning method in order to increase classification accuracy. However, continuous monitoring of a person requires large amounts of time and computational resources if complex pre-processing algorithms are used. The problem is therefore to classify posture images of patients with different weights, heights, and positions using minimal sample conditioning for a specific supervised learning method. In this work, it is proposed to identify patient posture from pressure sensor images by using well-known, simple conditioning techniques and selecting the optimal texture descriptors for the Support Vector Machine (SVM) method, in order to obtain the best classification while avoiding image over-processing in the conditioning stage. The experiments are performed with the Red, Green, and Blue (RGB) and Hue, Saturation, and Value (HSV) color models. The results show an increase in accuracy from 86.9% to 92.9% and in kappa value from 0.825 to 0.904 using image conditioning with histogram equalization and a median filter, respectively.
{"title":"Optimal Image Characterization for In-Bed Posture Classification by Using SVM Algorithm","authors":"C. A. Rivera-Romero, J. U. Munoz-Minjares, Carlos Lastre-Dominguez, M. Lopez-Ramirez","doi":"10.3390/bdcc8020013","DOIUrl":"https://doi.org/10.3390/bdcc8020013","url":null,"abstract":"Identifying patient posture while they are lying in bed is an important task in medical applications such as monitoring a patient after a surgical intervention, sleep supervision to identify behavioral and physiological markers, or for bedsore prevention. An acceptable strategy to identify the patient’s position is the classification of images created from a grid of pressure sensors located in the bed. These samples can be arranged based on supervised learning methods. Usually, image conditioning is required before images are loaded into a learning method to increase classification accuracy. However, continuous monitoring of a person requires large amounts of time and computational resources if complex pre-processing algorithms are used. So, the problem is to classify the image posture of patients with different weights, heights, and positions by using minimal sample conditioning for a specific supervised learning method. In this work, it is proposed to identify the patient posture from pressure sensor images by using well-known and simple conditioning techniques and selecting the optimal texture descriptors for the Support Vector Machine (SVM) method. This is in order to obtain the best classification and to avoid image over-processing in the conditioning stage for the SVM. The experimental stages are performed with the color models Red, Green, and Blue (RGB) and Hue, Saturation, and Value (HSV). The results show an increase in accuracy from 86.9% to 92.9% and in kappa value from 0.825 to 0.904 using image conditioning with histogram equalization and a median filter, respectively.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139594987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}