Big Data and Cognitive Computing: Latest Articles (Page 3)

Autonomous Vehicles: Evolution of Artificial Intelligence and the Current Industry Landscape

Big Data and Cognitive Computing

Pub Date : 2024-04-07 DOI: 10.3390/bdcc8040042

Divya Garikapati, Sneha Sudhir Shetiya

The advent of autonomous vehicles has heralded a transformative era in transportation, reshaping the landscape of mobility through cutting-edge technologies. Central to this evolution is the integration of artificial intelligence (AI), propelling vehicles into realms of unprecedented autonomy. Commencing with an overview of the current industry landscape with respect to Operational Design Domain (ODD), this paper delves into the fundamental role of AI in shaping the autonomous decision-making capabilities of vehicles. It elucidates the steps involved in the AI-powered development life cycle in vehicles, addressing various challenges such as safety, security, privacy, and ethical considerations in AI-driven software development for autonomous vehicles. The study presents statistical insights into the usage and types of AI algorithms over the years, showcasing the evolving research landscape within the automotive industry. Furthermore, the paper highlights the pivotal role of parameters in refining algorithms for both trucks and cars, enabling vehicles to adapt, learn, and improve performance over time. It concludes by outlining different levels of autonomy, elucidating the nuanced usage of AI algorithms, and discussing the automation of key tasks and the software package size at each level. Overall, the paper provides a comprehensive analysis of the current industry landscape, focusing on several critical aspects.

Citations: 0

Generating Synthetic Sperm Whale Voice Data Using StyleGAN2-ADA

Big Data and Cognitive Computing

Pub Date : 2024-04-03 DOI: 10.3390/bdcc8040040

E. Kopets, Tatiana Shpilevaya, Oleg Vasilchenko, Artur Karimov, D. Butusov

The application of deep learning neural networks enables the processing of extensive volumes of data and often requires dense datasets. In certain domains, researchers encounter challenges related to the scarcity of training data, particularly in marine biology. In addition, many sounds produced by sea mammals are of interest in technical applications, e.g., underwater communication or sonar construction. Thus, generating synthetic biological sounds is an important task for understanding and studying the behavior of various animal species, especially large sea mammals, which demonstrate complex social behavior and can use hydrolocation to navigate underwater. This study is devoted to generating sperm whale vocalizations using a limited sperm whale click dataset. Our approach utilizes an augmentation technique predicated on the transformation of audio sample spectrograms, followed by the employment of the generative adversarial network StyleGAN2-ADA to generate new audio data. The results show that using the chosen augmentation method, namely mixing along the time axis, makes it possible to create fairly similar clicks of sperm whales with a maximum deviation of 2%. The generation of new clicks was reproduced on datasets using selected augmentation approaches with two neural networks: StyleGAN2-ADA and WaveGAN. StyleGAN2-ADA, trained on an augmented dataset using the axis mixing approach, showed better results compared to WaveGAN.

Citations: 0

From Traditional Recommender Systems to GPT-Based Chatbots: A Survey of Recent Developments and Future Directions

Big Data and Cognitive Computing

Pub Date : 2024-03-27 DOI: 10.3390/bdcc8040036

T. M. Al-Hasan, A. Sayed, Fayçal Bensaali, Yassine Himeur, Iraklis Varlamis, G. Dimitrakopoulos

Recommender systems are a key technology for many applications, such as e-commerce, streaming media, and social media. Traditional recommender systems rely on collaborative filtering or content-based filtering to make recommendations. However, these approaches have limitations, such as the cold start and the data sparsity problem. This survey paper presents an in-depth analysis of the paradigm shift from conventional recommender systems to generative pre-trained-transformers-(GPT)-based chatbots. We highlight recent developments that leverage the power of GPT to create interactive and personalized conversational agents. By exploring natural language processing (NLP) and deep learning techniques, we investigate how GPT models can better understand user preferences and provide context-aware recommendations. The paper further evaluates the advantages and limitations of GPT-based recommender systems, comparing their performance with traditional methods. Additionally, we discuss potential future directions, including the role of reinforcement learning in refining the personalization aspect of these systems.
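To make the cold-start limitation mentioned in the abstract concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is an illustration only, not code from the surveyed paper: the user names, item names, ratings, and the `cosine` and `recommend` helpers are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0  # no overlap: nothing to compare
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(target, ratings, top_n=3):
    """Rank items the target has not rated by similarity-weighted ratings."""
    seen = ratings.get(target, {})
    scores = {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(seen, their)
        if sim <= 0.0:
            continue  # dissimilar or non-overlapping users contribute nothing
        for item, rating in their.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical toy data, for illustration only.
ratings = {
    "ana": {"film_a": 5, "film_b": 4},
    "ben": {"film_a": 5, "film_b": 5, "film_c": 4},
    "cho": {"film_d": 5},
    "new_user": {},  # cold start: no history yet
}

print(recommend("ana", ratings))
print(recommend("new_user", ratings))
```

A user with no rating history has zero similarity to everyone, so the second call has nothing to rank. That gap is the one conversational, GPT-based approaches aim to fill by eliciting preferences through dialogue.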

Citations: 0

A Comparative Study for Stock Market Forecast Based on a New Machine Learning Model

Big Data and Cognitive Computing

Pub Date : 2024-03-26 DOI: 10.3390/bdcc8040034

Enrique González-Núñez, Luis A. Trejo, Michael Kampouridis

This research aims at applying the Artificial Organic Network (AON), a nature-inspired, supervised, metaheuristic machine learning framework, to develop a new algorithm based on this machine learning class. The focus of the new algorithm is to model and predict stock markets based on the Index Tracking Problem (ITP). In this work, we present a new algorithm, based on the AON framework, that we call Artificial Halocarbon Compounds, or the AHC algorithm for short. In this study, we compare the AHC algorithm against genetic algorithms (GAs), by forecasting eight stock market indices. Additionally, we performed a cross-reference comparison against results regarding the forecast of other stock market indices based on state-of-the-art machine learning methods. The efficacy of the AHC model is evaluated by modeling each index, producing highly promising results. For instance, in the case of the IPC Mexico index, the R-squared is 0.9806, with a mean relative error of 7 × 10⁻⁴. Several new features characterize our new model, mainly adaptability, dynamism and topology reconfiguration. This model can be applied to systems requiring simulation analysis using time series data, providing a versatile solution to complex problems like financial forecasting.
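The two figures quoted for the IPC Mexico index are standard goodness-of-fit measures. The sketch below shows how they are commonly computed. It assumes the usual textbook definitions (the paper may define mean relative error slightly differently), and the sample values are made up rather than taken from the study.

```python
from statistics import fmean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual|; actual values must be non-zero."""
    return fmean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Made-up index levels, for illustration only (not data from the paper).
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [100.5, 101.5, 101.0, 104.0]

print(round(r_squared(actual, predicted), 4))
print(round(mean_relative_error(actual, predicted), 5))
```

An R-squared close to 1 means the forecast explains nearly all of the variance in the index, while a small mean relative error means the typical miss is a small fraction of the index level.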

Citations: 0

Two-Stage Method for Clothing Feature Detection

Big Data and Cognitive Computing

Pub Date : 2024-03-26 DOI: 10.3390/bdcc8040035

Xinwei Lyu, Xinjia Li, Yuexin Zhang, Wenlian Lu

The rapid expansion of e-commerce, particularly in the clothing sector, has led to a significant demand for an effective clothing industry. This study presents a novel two-stage image recognition method. Our approach distinctively combines human keypoint detection, object detection, and classification methods into a two-stage structure. Initially, we utilize open-source libraries, namely OpenPose and Dlib, for accurate human keypoint detection, followed by a custom cropping logic for extracting body part boxes. In the second stage, we employ a blend of Harris Corner, Canny Edge, and skin pixel detection integrated with VGG16 and support vector machine (SVM) models. This configuration allows the bounding boxes to identify ten unique attributes, encompassing facial features and detailed aspects of clothing. Conclusively, the experiment yielded an overall recognition accuracy of 81.4% for tops and 85.72% for bottoms, highlighting the efficacy of the applied methodologies in garment categorization.

Citations: 0

Cancer Detection Using a New Hybrid Method Based on Pattern Recognition in MicroRNAs Combining Particle Swarm Optimization Algorithm and Artificial Neural Network

Big Data and Cognitive Computing

Pub Date : 2024-03-19 DOI: 10.3390/bdcc8030033

Sepideh Molaei, Stefano Cirillo, Giandomenico Solimando

MicroRNAs (miRNAs) play a crucial role in cancer development, but not all miRNAs are equally significant in cancer detection. Traditional methods face challenges in effectively identifying cancer-associated miRNAs due to data complexity and volume. This study introduces a novel, feature-based technique for detecting attributes related to cancer-affecting microRNAs. It aims to enhance cancer diagnosis accuracy by identifying the most relevant miRNAs for various cancer types using a hybrid approach. In particular, we used a combination of particle swarm optimization (PSO) and artificial neural networks (ANNs) for this purpose. PSO was employed for feature selection, focusing on identifying the most informative miRNAs, while ANNs were used for recognizing patterns within the miRNA data. This hybrid method aims to overcome limitations in traditional miRNA analysis by reducing data redundancy and focusing on key genetic markers. The application of this method showed a significant improvement in the detection accuracy for various cancers, including breast and lung cancer and melanoma. Our approach demonstrated a higher precision in identifying relevant miRNAs compared to existing methods, as evidenced by the analysis of different datasets. The study concludes that the integration of PSO and ANNs provides a more efficient, cost-effective, and accurate method for cancer detection via miRNA analysis. This method can serve as a supplementary tool for cancer diagnosis and potentially aid in developing personalized cancer treatments.

Citations: 0

AI-Generated Text Detector for Arabic Language Using Encoder-Based Transformer Architecture

Big Data and Cognitive Computing

Pub Date : 2024-03-18 DOI: 10.3390/bdcc8030032

Hamed Alshammari, Ahmed El-Sayed, Khaled Elleithy

The effectiveness of existing AI detectors is notably hampered when processing Arabic texts. This study introduces a novel AI text classifier designed specifically for Arabic, tackling the distinct challenges inherent in processing this language. A particular focus is placed on accurately recognizing human-written texts (HWTs), an area where existing AI detectors have demonstrated significant limitations. To achieve this goal, this paper utilized and fine-tuned two Transformer-based models, AraELECTRA and XLM-R, by training them on two distinct datasets: a large dataset comprising 43,958 examples and a custom dataset with 3,078 examples that contain HWT and AI-generated texts (AIGTs) from various sources, including ChatGPT 3.5, ChatGPT-4, and BARD. The proposed architecture is adaptable to any language, but this work evaluates these models’ efficiency in recognizing HWTs versus AIGTs in Arabic as an example of Semitic languages. The performance of the proposed models has been compared against the two prominent existing AI detectors, GPTZero and OpenAI Text Classifier, particularly on the AIRABIC benchmark dataset. The results reveal that the proposed classifiers outperform both GPTZero and OpenAI Text Classifier with 81% accuracy compared to 63% and 50% for GPTZero and OpenAI Text Classifier, respectively. Furthermore, integrating a Dediacritization Layer prior to the classification model demonstrated a significant enhancement in the detection accuracy of both HWTs and AIGTs. This Dediacritization step markedly improved the classification accuracy, elevating it from 81% to as high as 99% and, in some instances, even achieving 100%.
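The abstract does not say how the Dediacritization Layer is implemented. As a rough sketch of the general idea, and on the assumption that the layer strips Arabic diacritical marks (tashkeel) before the text reaches the classifier, a regex-based preprocessing step could look like the following. The `dediacritize` name and the exact character set are illustrative choices, not the authors' code.

```python
import re

# Arabic diacritical marks: fathatan through sukun (U+064B-U+0652) plus the
# superscript alef (U+0670). The tatweel (U+0640) is an elongation character
# rather than a diacritic; removing it too is an extra normalization choice
# made for this sketch.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")


def dediacritize(text: str) -> str:
    """Return `text` with Arabic diacritics (and tatweel) removed."""
    return _DIACRITICS.sub("", text)


# "Muhammad" written with full diacritics, given as escapes for portability.
sample = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
print(dediacritize(sample))
```

Base letters and non-Arabic text pass through unchanged, so the step can be applied uniformly to both human-written and AI-generated inputs before tokenization.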

Citations: 0

Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students

Big Data and Cognitive Computing

Pub Date : 2024-03-13 DOI: 10.3390/bdcc8030031

Dhiaa Musleh, Ali Alkhwaja, Ibrahim Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Mohammed Albugami, Faisal Alfawaz, Said El-Ashker, M. Al-Hariri

Obesity is increasingly becoming a prevalent health concern among adolescents, leading to significant risks like cardiometabolic diseases (CMDs). The early discovery and diagnosis of CMD is essential for better outcomes. This study aims to build a reliable artificial intelligence model that can predict CMD using various machine learning techniques. Support vector machines (SVMs), K-Nearest Neighbor (KNN), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting are five robust classifiers that are compared in this study. A novel, previously unused “risk level” feature, derived through fuzzy logic applied to the Conicity Index, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed to address each gender individually. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia. The dataset comprises 121 male and 174 female students with diverse risk levels. Notably, Logistic Regression emerges as the top performer among males, achieving an accuracy score of 91%, while Gradient Boosting lags with a score of 72%. Among females, both Support Vector Machine and Logistic Regression lead with an accuracy score of 87%, while Random Forest performs least optimally with a score of 80%.
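To illustrate how a fuzzy "risk level" could be derived from the Conicity Index, the sketch below computes the index with the commonly cited formula (waist circumference in metres divided by 0.109 times the square root of weight in kilograms over height in metres) and maps it to three overlapping risk sets. The breakpoints 1.10, 1.20, and 1.30 and the membership shapes are hypothetical placeholders. The abstract does not give the thresholds or membership functions actually used, which the study also develops separately for each gender.

```python
from math import sqrt


def conicity_index(waist_m, weight_kg, height_m):
    """Conicity Index: waist / (0.109 * sqrt(weight / height))."""
    return waist_m / (0.109 * sqrt(weight_kg / height_m))


def _falling(x, a, b):
    """Membership that is 1 at or below a, 0 at or above b, linear between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def _rising(x, a, b):
    """Mirror image of _falling: 0 at or below a, 1 at or above b."""
    return 1.0 - _falling(x, a, b)


def risk_memberships(ci, low=1.10, mid=1.20, high=1.30):
    """Degrees of membership in three fuzzy risk sets (hypothetical breakpoints)."""
    return {
        "low": _falling(ci, low, mid),
        "moderate": min(_rising(ci, low, mid), _falling(ci, mid, high)),
        "high": _rising(ci, mid, high),
    }


def risk_level(ci):
    """Pick the risk set with the largest membership degree."""
    memberships = risk_memberships(ci)
    return max(memberships, key=memberships.get)


# Hypothetical measurements, for illustration only.
ci = conicity_index(waist_m=0.80, weight_kg=70.0, height_m=1.75)
print(round(ci, 3), risk_memberships(ci), risk_level(ci))
```

The resulting label, or the membership degrees themselves, could then be appended as an extra input feature for classifiers such as Logistic Regression or Random Forest.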

Citations: 0

Proposal of a Service Model for Blockchain-Based Security Tokens

Big Data and Cognitive Computing

Pub Date : 2024-03-12 DOI: 10.3390/bdcc8030030

Keundug Park, H. Youm

The volume of the asset investment and trading market can be expanded through the issuance and management of blockchain-based security tokens, which logically divide the value of assets and guarantee ownership. This paper proposes a service model to solve a problem with the existing investment service model, identifies security threats to the proposed model, and specifies security requirements that counter those threats, covering privacy protection and anti-money laundering (AML) for security tokens. The identified threats and specified requirements should be taken into account when implementing the proposed service model, which allows users to invest in tokenized tangible and intangible assets and to trade blockchain-based security tokens. The paper also discusses considerations for preventing excessive regulation and market monopoly in the issuance and trading of security tokens, and concludes with directions for future work.


Citations: 0

The Distribution and Accessibility of Elements of Tourism in Historic and Cultural Cities

Big Data and Cognitive Computing

Pub Date : 2024-03-11 DOI: 10.3390/bdcc8030029

Wei-Ling Hsu, Yi-Jheng Chang, Lin Mou, Juan-Wen Huang, Hsin-Lung Liu

Historic urban areas are the foundations of urban development. Due to rapid urbanization, the sustainable development of historic urban areas has become challenging for many cities. Elements of tourism and tourism service facilities play an important role in the sustainable development of historic areas. This study analyzed policies related to tourism in Panguifang and Meixian districts in Meizhou, Guangdong, China. Kernel density estimation was used to study the clustering characteristics of tourism elements through point of interest (POI) data, while space syntax was used to study the accessibility of roads. In addition, the Pearson correlation coefficient and regression were used to analyze the correlation between the elements and accessibility. The results show the following: (1) the overall number of tourism elements was high on the western side of the districts and low on the eastern side, and the elements were predominantly distributed along the main transportation arteries; (2) according to the integration degree and depth value, the western side was easier to access than the eastern side; and (3) the depth value of the area correlated negatively with kernel density, while the degree of integration correlated positively with it. Based on the results, the study put forward measures for optimizing the elements of tourism in Meizhou’s historic urban area to improve cultural tourism, and emphasized the importance of these elements.
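The analysis pipeline named in this abstract (kernel density of POIs, then correlation with accessibility measures) can be sketched on toy data. This is only an illustration under stated assumptions: the abstract does not give the kernel, bandwidth, or software used, so an isotropic Gaussian kernel and a bandwidth of 0.5 are assumed, and all coordinates, integration values, and depth values below are invented rather than taken from the study.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Kernel density at `query` from 2-D `points`, isotropic Gaussian kernel."""
    qx, qy = query
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    return norm * sum(
        math.exp(-((qx - px) ** 2 + (qy - py) ** 2) / (2.0 * bandwidth ** 2))
        for px, py in points
    )


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical POIs, clustered toward the west (small x) of a toy district.
pois = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (2.0, 0.1)]

# Hypothetical road-segment midpoints with made-up space-syntax measures.
segments = [(0.1, 0.1), (0.8, 0.1), (1.5, 0.1), (2.2, 0.1)]
integration = [1.8, 1.4, 1.0, 0.9]
depth = [2.0, 3.1, 4.2, 4.5]

# Sample the POI density surface at each road segment, then correlate.
density = [gaussian_kde_2d(pois, seg, bandwidth=0.5) for seg in segments]
r_integration = pearson_r(density, integration)
r_depth = pearson_r(density, depth)
print(f"density vs integration: r = {r_integration:.2f}")
print(f"density vs depth:       r = {r_depth:.2f}")
```

On this toy data the signs match the pattern reported in result (3): density rises with integration and falls with depth. The magnitudes carry no meaning, since the inputs are invented.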



Citations: 0