
Journal of Computer Science: Latest Articles

Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.157.167
Ghizlane Bourahouat, Manar Abourezq, N. Daoudi
This study addresses the crucial task of sentiment analysis in natural language processing, with a particular focus on Arabic, especially dialectal Arabic, which has been relatively understudied due to inherent challenges. Our approach centers on sentiment analysis in Moroccan Arabic, leveraging BERT models that are pre-trained on the Arabic language, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are integrated alongside deep learning and machine learning algorithms, including SVM and CNN, with additional fine-tuning of the pre-trained model. Furthermore, we examine the impact of data imbalance by evaluating the models on three distinct datasets: an unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. Notably, our proposed approach demonstrates impressive accuracy, achieving 96% when employing the QARIB model even on imbalanced data. The novelty of this research lies in the integration of pre-trained Arabic BERT models for Moroccan sentiment analysis, as well as the exploration of their combined use with CNN and SVM algorithms. Furthermore, our findings reveal that employing BERT-based models alone yields superior results compared to their application in conjunction with CNN or SVM, marking a significant advancement in sentiment analysis for Moroccan Arabic. Our method's effectiveness is highlighted through a comparative analysis with state-of-the-art approaches, providing valuable insights that contribute to the advancement of sentiment analysis in Arabic dialects.
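As a rough illustration of the kind of pipeline this abstract describes, the sketch below fine-tunes a pre-trained Arabic BERT checkpoint for sentiment classification with the Hugging Face transformers library; the checkpoint name, CSV column names, and hyperparameters are assumptions for illustration and are not taken from the paper.

```python
# Hedged sketch: fine-tune a pre-trained Arabic BERT checkpoint for binary
# sentiment classification. Checkpoint, data files, column names, and
# hyperparameters are illustrative assumptions, not the paper's exact setup.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

checkpoint = "aubmindlab/bert-base-arabertv02"   # assumed AraBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Assumed CSVs with a "text" column (Moroccan Arabic comment) and a 0/1 "label".
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"])
trainer.train()
print(trainer.evaluate())
```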
{"title":"Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models","authors":"Ghizlane Bourahouat, Manar Abourezq, N. Daoudi","doi":"10.3844/jcssp.2024.157.167","DOIUrl":"https://doi.org/10.3844/jcssp.2024.157.167","url":null,"abstract":": This study addresses the crucial task of sentiment analysis in natural language processing, with a particular focus on Arabic, especially dialectal Arabic, which has been relatively understudied due to inherent challenges. Our approach centers on sentiment analysis in Moroccan Arabic, leveraging BERT models that are pre-trained in the Arabic language, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are integrated alongside deep learning and machine learning algorithms, including SVM and CNN, with additional fine-tuning of the pre-trained model. Furthermore, we examine the impact of data imbalance by evaluating the models on three distinct datasets: An unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. Notably, our proposed approach demonstrates impressive accuracy, achieving a notable 96% when employing the QARIB model even on imbalanced data. The novelty of this research lies in the integration of pre-trained Arabic BERT models for Moroccan sentiment analysis, as well as the exploration of their combined use with CNN and SVM algorithms. Furthermore, our findings reveal that employing BERT-based models yields superior results compared to their application in conjunction with CNN or SVM, marking a significant advancement in sentiment analysis for Moroccan Arabic. Our method's effectiveness is highlighted through a comparative analysis with state-of-the-art approaches, providing valuable insights that contribute to the advancement of sentiment analysis in Arabic dialects","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139683801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimization of Expert System Based on Interpolation, Forward Chaining, and Certainty Factor for Diagnosing Abdominal Colic
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.191.197
Hari Soetanto, Painem, Muhammad Kamil Suryadewiansyah
Abdominal colic is a common condition that affects infants, and it can be difficult to diagnose because it shares many symptoms with other conditions, such as gastric disease and appendicitis. Limitations of existing diagnostic methods include the unreliability of physical examinations and medical histories and the high cost and time-consuming nature of imaging tests. This research proposes an expert system based on interpolation, forward chaining, and certainty factors for diagnosing abdominal colic, implemented as a web application model. The forward chaining method is used to establish rules for the expert system. The rules are based on the symptoms and diseases that are included in the system's knowledge base. The interpolation method is used to normalize lab results, and the certainty factor method is used to process medical history and physical examinations, which can be imprecise. The expert system was tested on a dataset of 100 cases and was able to accurately diagnose 96 patients, achieving a 96% accuracy rate. This suggests that the expert system has the potential to provide a more accurate and efficient way to diagnose abdominal colic, which could lead to better patient outcomes.
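The certainty-factor reasoning the abstract relies on can be illustrated with a small forward-chaining sketch; the rules, symptoms, and CF values below are invented for illustration, while the combination formula is the standard one for positive certainty factors.

```python
# Hedged sketch of forward chaining with certainty factors (CF). The rules,
# symptoms, and CF values are invented for illustration only; the combination
# formula CF = cf1 + cf2 * (1 - cf1) is the standard one for positive CFs.

# Each rule: if all listed symptoms are present, conclude a disease with a CF.
rules = [
    ({"cramping_pain", "crying_episodes"}, ("abdominal_colic", 0.7)),
    ({"cramping_pain", "bloating"},        ("abdominal_colic", 0.6)),
    ({"right_lower_pain", "fever"},        ("appendicitis", 0.8)),
]

def combine_cf(cf1, cf2):
    """Combine two positive certainty factors for the same conclusion."""
    return cf1 + cf2 * (1 - cf1)

def forward_chain(observed_symptoms):
    conclusions = {}
    for premises, (disease, cf) in rules:
        if premises <= observed_symptoms:          # all premises satisfied
            prev = conclusions.get(disease)
            conclusions[disease] = cf if prev is None else combine_cf(prev, cf)
    return conclusions

print(forward_chain({"cramping_pain", "crying_episodes", "bloating"}))
# -> {'abdominal_colic': 0.88}
```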
{"title":"Optimization of Expert System Based on Interpolation, Forward Chaining, and Certainty Factor for Diagnosing Abdominal Colic","authors":"Hari Soetanto, Painem, Muhammad Kamil Suryadewiansyah","doi":"10.3844/jcssp.2024.191.197","DOIUrl":"https://doi.org/10.3844/jcssp.2024.191.197","url":null,"abstract":": Abdominal colic is a common condition that affects infants and it can be difficult to diagnose because it shares many symptoms with other conditions, such as gastric disease and appendicitis. Limitations of existing diagnostic methods include the unreliability of physical examinations and medical histories and the high cost and time-consuming nature of imaging tests. This research proposes an expert system based on interpolation, forward chaining, and certainty factors for diagnosing abdominal colic. This system has the potential to provide a more accurate and efficient way to diagnose abdominal colic, which could lead to better patient outcomes. This research proposes an expert system based on interpolation, forward chaining, and certainty factors for diagnosing abdominal colic. This system is implemented as a web application model. The forward chaining method is used to establish rules for the expert system. The rules are based on the symptoms and diseases that are included in the system's knowledge base. The interpolation method is used to normalize lab results and the certainty factor method is used to process medical history and physical examinations. This is necessary because medical history and physical examinations can be imprecise. The expert system was tested on a dataset of 100 cases and it was able to accurately diagnose 96 patients, achieving a 96% accuracy rate. This suggests that the expert system has the potential to provide a more accurate and efficient way to diagnose abdominal colic, which could lead to better patient outcomes.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139687689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Predicting Smartphone Addiction in Teenagers: An Integrative Model Incorporating Machine Learning and Big Five Personality Traits
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.181.190
Jacobo Osorio, Marko Figueroa, Lenis Wong
Smartphone addiction has emerged as a growing concern in society, particularly among teenagers, due to its potential negative impact on physical, emotional, and social well-being. The excessive use of smartphones has consistently shown associations with negative outcomes, highlighting a strong dependence on these devices, which often leads to detrimental effects on mental health, including heightened levels of anxiety, distress, stress, and depression. This psychological burden can further result in the neglect of daily activities as individuals become increasingly engrossed in seeking pleasure through their smartphones. The aim of this study is to develop a predictive model utilizing machine learning techniques to identify smartphone addiction based on the "Big Five Personality Traits (BFPT)". The model was developed by following five out of the six phases of the "Cross Industry Standard Process for Data Mining (CRISP-DM)" methodology, namely "business understanding," "data understanding," "data preparation," "modeling," and "evaluation." To construct the database, data was collected from a school using the Big Five Inventory (BFI) and the Smartphone Addiction Scale (SAS) questionnaires. Subsequently, four algorithms (DT, RF, XGB, and LG) were employed, and the correlation between the personality traits and addiction was examined. The analysis revealed a relationship between the traits of neuroticism and conscientiousness and smartphone addiction. The results demonstrated that the RF algorithm achieved an accuracy of 89.7%, a precision of 87.3%, and the highest AUC value on the ROC curve. These findings highlight the effectiveness of the proposed model in accurately predicting smartphone addiction among adolescents.
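A minimal sketch of the random-forest step and the reported metrics, assuming a merged BFI/SAS table with one column per trait and a binary addiction label; the file and column names are illustrative, not the study's own.

```python
# Hedged sketch of the RF classification and metrics described in the abstract.
# The feature columns (Big Five trait scores), label, and file name are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score

df = pd.read_csv("bfi_sas_survey.csv")           # assumed merged BFI + SAS data
traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]        # assumed column names
X, y = df[traits], df["addicted"]                # assumed binary SAS label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]
print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("ROC AUC  :", roc_auc_score(y_te, proba))
```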
{"title":"Predicting Smartphone Addiction in Teenagers: An Integrative Model Incorporating Machine Learning and Big Five Personality Traits","authors":"Jacobo Osorio, Marko Figueroa, Lenis Wong","doi":"10.3844/jcssp.2024.181.190","DOIUrl":"https://doi.org/10.3844/jcssp.2024.181.190","url":null,"abstract":": Smartphone addiction has emerged as a growing concern in society, particularly among teenagers, due to its potential negative impact on physical, emotional social well-being. The excessive use of smartphones has consistently shown associations with negative outcomes, highlighting a strong dependence on these devices, which often leads to detrimental effects on mental health, including heightened levels of anxiety, distress, stress depression. This psychological burden can further result in the neglect of daily activities as individuals become increasingly engrossed in seeking pleasure through their smartphones. The aim of this study is to develop a predictive model utilizing machine learning techniques to identify smartphone addiction based on the \"Big Five Personality Traits (BFPT)\". The model was developed by following five out of the six phases of the \"Cross Industry Standard Process for Data Mining (CRISP-DM)\" methodology, namely \"business understanding,\" \"data understanding,\" \"data preparation,\" \"modeling,\" and \"evaluation.\" To construct the database, data was collected from a school using the Big Five Inventory (BFI) and the Smartphone Addiction Scale (SAS) questionnaires. Subsequently, four algorithms (DT, RF, XGB LG) were employed the correlation between the personality traits and addiction was examined. The analysis revealed a relationship between the traits of neuroticism and conscientiousness with smartphone addiction. The results demonstrated that the RF algorithm achieved an accuracy of 89.7%, a precision of 87.3% the highest AUC value on the ROC curve. These findings highlight the effectiveness of the proposed model in accurately predicting smartphone addiction among adolescents","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Flock Optimization Algorithm-Based Deep Learning Model for Diabetic Disease Detection Improvement
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.168.180
Divager Balasubramaniyan, N. Husin, N. Mustapha, N. Sharef, T.N. Mohd Aris
Worldwide, 422 million people suffer from diabetic disease, and 1.5 million die yearly. Diabetes remains a threat to people who fail to cure or manage it, so predicting this disease accurately is challenging. Existing systems face data over-fitting issues, convergence problems, non-converging optimization, complex predictions, and latent and predominant feature extraction. These issues affect system performance and reduce diabetic disease detection accuracy. Hence, the research objective is to create an improved diabetic disease detection system using a Flock Optimization Algorithm-Based Deep Learning Model (FOADLM) feature modeling approach that leverages the PIMA Indian dataset to predict and classify diabetic disease cases. The collected data is processed by a Gaussian filtering approach that eliminates irrelevant information, reducing the overfitting issues. Then the flock optimization algorithm is applied to detect the sequence; this process is used to reduce the convergence and optimization problems. Finally, the recurrent neural approach is applied to classify the normal and abnormal features. The entire research implementation is carried out with the help of the MATLAB program, and the results are analyzed with accuracy, precision, recall, computational time, reliability, scalability, and error rate measures like root mean square error, mean square error, and correlation coefficients. In conclusion, the system evaluation produced 99.23% accuracy in predicting diabetic disease under these metrics.
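The sketch below illustrates only two of the stages the abstract mentions, Gaussian smoothing of the PIMA features and a recurrent-network classifier, in Python rather than MATLAB; the flock-optimization stage is omitted, and the architecture, file name, and column names are assumptions.

```python
# Hedged sketch of two steps the abstract describes: Gaussian smoothing of the
# PIMA features and a recurrent-network classifier. The flock-optimization
# stage is not reproduced; file name, columns, and architecture are assumptions.
import pandas as pd
from scipy.ndimage import gaussian_filter1d
from tensorflow.keras import layers, models

df = pd.read_csv("pima_indians_diabetes.csv")        # assumed local copy
X = df.drop(columns=["Outcome"]).to_numpy(dtype="float32")
y = df["Outcome"].to_numpy()                          # assumed 0/1 label column

X = gaussian_filter1d(X, sigma=1.0, axis=0)           # smooth noisy measurements
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)     # standardize features
X = X[:, :, None]                                     # (samples, features, 1) for the RNN

model = models.Sequential([
    layers.SimpleRNN(32, input_shape=(X.shape[1], 1)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))
```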
{"title":"Flock Optimization Algorithm-Based Deep Learning Model for Diabetic Disease Detection Improvement","authors":"Divager Balasubramaniyan, N. Husin, N. Mustapha, N. Sharef, T.N. Mohd Aris","doi":"10.3844/jcssp.2024.168.180","DOIUrl":"https://doi.org/10.3844/jcssp.2024.168.180","url":null,"abstract":": Worldwide, 422 million people suffer from diabetic disease, and 1.5 million die yearly. Diabetes is a threat to people who still fail to cure or maintain it, so it is challenging to predict this disease accurately. The existing systems face data over-fitting issues, convergence problems, non-converging optimization complex predictions, and latent and predominant feature extraction. These issues affect the system's performance and reduce diabetic disease detection accuracy. Hence, the research objective is to create an improved diabetic disease detection system using a Flock Optimization Algorithm-Based Deep Learning Model (FOADLM) feature modeling approach that leverages the PIMA Indian dataset to predict and classify diabetic disease cases. The collected data is processed by a Gaussian filtering approach that eliminates irrelevant information, reducing the overfitting issues. Then flock optimization algorithm is applied to detect the sequence; this process is used to reduce the convergence and optimization problems. Finally, the recurrent neural approach is applied to classify the normal and abnormal features. The entire research implementation result is carried out with the help of the MATLAB program and the results are analyzed with accuracy, precision, recall, computational time, reliability scalability, and error rate measures like root mean square error, mean square error, and correlation coefficients. In conclusion, the system evaluation result produced 99.23% accuracy in predicting diabetic disease with the metrics.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139684102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Impact of Information Reliability and Cloud Computing Efficiency on Website Design and E-Commerce Business in Thailand
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.198.206
Charuay Savithi, Arisaphat Suttidee
The security and reliability of cloud computing services continue to be major concerns that hinder their widespread adoption. This study explores how information reliability and cloud computing efficiency influence website design and e-commerce business development decisions on cloud computing. The researchers distributed 379 questionnaires to determine the sample size, resulting in a response rate of 46.50% with 186 participants. Various statistical tests, including the t-test, the f-test (ANOVA and MANOVA), multiple correlation analysis, and multiple regression analysis, are used to analyze the collected data. The results of the study show a positive correlation and influence between the reliability of information, specifically in terms of confidentiality, stability, and verifiability, and the decision to design and develop websites. Furthermore, the efficiency of cloud computing, particularly in communication and processing, demonstrates a positive relationship and impact on website design and development. These findings highlight the need for e-commerce business leaders to understand the importance of information reliability and cloud computing efficiency. Recognizing these factors can enhance their competitive advantage in the e-commerce industry and foster consistent and sustainable growth. The research also highlights the contribution of cloud technology and security to increasing confidence in the development of e-commerce businesses.
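A short sketch of the named statistical tests applied to an assumed questionnaire table; the grouping variables and factor columns are illustrative placeholders, not the study's actual survey items.

```python
# Hedged sketch of the tests named in the abstract (t-test, f-test, multiple
# regression) on an assumed survey table; all column names are illustrative.
import pandas as pd
from scipy import stats
import statsmodels.api as sm

df = pd.read_csv("survey_responses.csv")    # assumed 186-row questionnaire data

# t-test: design-decision score split by an assumed two-level grouping variable
groups_bs = [g["design_decision"] for _, g in df.groupby("business_size")]
print(stats.ttest_ind(groups_bs[0], groups_bs[1], equal_var=False))

# One-way ANOVA (f-test) across an assumed multi-level grouping variable
groups_ed = [g["design_decision"].values for _, g in df.groupby("education")]
print(stats.f_oneway(*groups_ed))

# Multiple regression: reliability and efficiency factors -> design decision
X = sm.add_constant(df[["confidentiality", "stability", "verifiability",
                        "communication", "processing"]])
model = sm.OLS(df["design_decision"], X).fit()
print(model.summary())
```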
{"title":"The Impact of Information Reliability and Cloud Computing Efficiency on Website Design and E-Commerce Business in Thailand","authors":"Charuay Savithi, Arisaphat Suttidee","doi":"10.3844/jcssp.2024.198.206","DOIUrl":"https://doi.org/10.3844/jcssp.2024.198.206","url":null,"abstract":": The security and reliability of cloud computing services continue to be major concerns that hinder their widespread adoption. This study explores how information reliability and cloud computing efficiency influence website design and e-commerce business development decisions on cloud computing. The researchers distributed 379 questionnaires to determine the sample size, resulting in a 46.50% response rate of 46.50% with 186 participants. Various statistical tests, including the t-test, the f-test (ANOVA and MANOVA), multiple correlation analysis and multiple regression analysis, are used to analyses the collected data. The results of the study show a positive correlation and influence between the reliability of information, specifically in terms of confidentiality, stability and verifiability and the decision to design and develop websites. Furthermore, the efficiency of cloud computing, particularly in communication and processing, demonstrates a positive relationship and impact on website design and development. These findings highlight the importance for e-commerce business leaders to understand the importance of information reliability and cloud computing efficiency. Recognizing these factors can enhance their competitive advantage in the e-commerce industry and foster consistent and sustainable growth. Research also highlights the contribution of cloud technology and security to increasing confidence in the development of e-commerce businesses.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139685595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A New Algorithm for Earthquake Prediction Using Machine Learning Methods
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.150.156
N. Jarah, Abbas Hanon Hassin Alasadi, K. M. Hashim
Earthquakes are among the most dangerous natural disasters people face, because they occur without prior warning and affect lives and property. In addition, to plan future disaster prevention measures for major earthquakes, it is necessary to predict earthquakes using Neural Networks (NN). A machine learning technique was developed to predict earthquakes from ground controller data by measuring ground vibration and transmitting the data over a sensor network. The data are processed and recorded in a catalog of seismic data from 1900-2019 for Iraq and neighboring regions, then divided into 80% training data and 20% test data. The approach gave better results than other prediction algorithms, with the NN model performing better seismic prediction than other machine learning methods.
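As a hedged sketch of the described workflow, the code below trains a neural-network regressor on an assumed seismic catalog with an 80/20 split; the file name, predictor columns, and target are assumptions.

```python
# Hedged sketch of an NN regression on a seismic catalog with an 80/20 split,
# as the abstract outlines; feature and target columns are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

df = pd.read_csv("iraq_seismic_catalog_1900_2019.csv")   # assumed catalog file
X = df[["latitude", "longitude", "depth", "year"]]        # assumed predictors
y = df["magnitude"]                                       # assumed target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)

nn = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
nn.fit(scaler.transform(X_tr), y_tr)
pred = nn.predict(scaler.transform(X_te))
print("test MSE:", mean_squared_error(y_te, pred))
```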
{"title":"A New Algorithm for Earthquake Prediction Using Machine Learning Methods","authors":"N. Jarah, Abbas Hanon Hassin Alasadi, K. M. Hashim","doi":"10.3844/jcssp.2024.150.156","DOIUrl":"https://doi.org/10.3844/jcssp.2024.150.156","url":null,"abstract":": Seismic tremors are among the foremost perilous normal fiascos individuals confront due to their event without earlier caution and their effect on their lives and properties. In expansion, to consider future disaster prevention measures for major earthquakes, it is necessary to predict earthquakes using Neural Networks (NN). A machine learning technique has developed a technology to predict earthquakes from ground controller data by measuring ground vibration and transmitting data by a sensor network. Devices to process this data and record it in a catalog of seismic data from 1900-2019 for Iraq and neighboring regions, then divide this data into 80% training data and 20% test data. It gave better results than other prediction algorithms, where the NN model performs better Seismic prediction than other machine learning methods.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139685506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Machine Learning Oceanographic Data for Prediction of the Potential of Marine Resources
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.129.139
Denny Arbahri, O. Nurhayati, Imam Mudita
Marine data and information are very important for human survival; therefore, these data are attractive to investors because of their potential economic value. Such data have been difficult to obtain, and the solution proposed here is to analyze oceanographic data for 2009-2019 collected from the marine database belonging to the Agency for the Study and Application of Technology (BPPT). The data are the result of a collaborative marine survey between Indonesian and foreign researchers from various countries who sailed in various Indonesian waters. Raw oceanographic data are converted and classified into Conductivity, Temperature, and Depth (CTD) data, with these oceanographic parameters identified as predictor variables (X) that are correlated with each other. CTD data are processed into labeled numeric data attributes for input and training. The data were modeled using the Machine Learning (ML) type Supervised Learning (SL) method with the Decision Tree (DT), Linear Regression (LR), and Random Forest (RF) algorithms, which were interpreted according to the characteristics of the CTD data. ML learns the data models to understand and store them. Next, each model is evaluated using accuracy metrics by measuring the difference between the predicted value and the actual value to obtain a good prediction model. The prediction results show a salinity level of 34.0 parts per thousand (ppt), meaning that in this area of marine waters salinity will affect the solubility of Oxygen (O₂) and play a major role in the sustainability and growth of the fertility level of biological resources, supported by sea surface temperature conditions of 29.2°C. So the salinity values obtained using ML techniques and marine resource potential can be assumed to have a strong correlation. The research results show that the RF model has the lowest level of prediction error based on the values Mean Square Error (MSE) = 0.007, Root Mean Squared Error (RMSE) = 0.082, and Mean Absolute Error (MAE) = 0.007, compared to the DT model (MSE = 0.008; RMSE = 0.088; MAE = 0.012) and the LR model (MSE = 1.008; RMSE = 1.004; MAE = 0.281). The equivalent RF and DT models have a Determination Coefficient (R²) = 0.999, meaning a model good at predicting is created, compared to the LR model with a value of R² = 0.914. The correlation between variables shows that the LR model is very linear, with a Correlation Coefficient (r) = 1.000, compared to the DT model (r) = 0.621 and the RF model (r) = 0.379. Therefore, the algorithm whose (r) value is closest to +1 has the best level of accuracy. The use of ML to predict marine resource potential is a relatively new research field, so this research has the potential to contribute data and information as a reference for innovative studies and as investment decision material for investors.
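The DT/LR/RF comparison and error metrics reported above can be sketched as follows, assuming a CTD export with conductivity, temperature, and depth as predictors and salinity as the target; file and column names are illustrative.

```python
# Hedged sketch of the DT/LR/RF comparison and error metrics the abstract
# reports, applied to assumed CTD columns; file and column names are illustrative.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

df = pd.read_csv("ctd_2009_2019.csv")                 # assumed CTD export
X = df[["conductivity", "temperature", "depth"]]      # assumed predictors
y = df["salinity"]                                    # assumed target (ppt)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

models = {"DT": DecisionTreeRegressor(random_state=1),
          "LR": LinearRegression(),
          "RF": RandomForestRegressor(n_estimators=100, random_state=1)}

for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    print(f"{name}: MSE={mse:.3f} RMSE={np.sqrt(mse):.3f} "
          f"MAE={mean_absolute_error(y_te, pred):.3f} R2={r2_score(y_te, pred):.3f}")
```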
{"title":"Machine Learning Oceanographic Data for Prediction of the Potential of Marine Resources","authors":"Denny Arbahri, O. Nurhayati, Imam Mudita","doi":"10.3844/jcssp.2024.129.139","DOIUrl":"https://doi.org/10.3844/jcssp.2024.129.139","url":null,"abstract":": Marine data and information are very important for human survival, therefore this data and information is attractive to investors because of the potential economic value. This data and information has been difficult to obtain, the solution to overcome this is by analyzing oceanographic data for 2009-2019 collected from the marine database belonging to the Agency for the Study and Application of Technology (BPPT). The data is the result of a collaborative marine survey between Indonesian and foreign researchers from various countries who sailed in various Indonesian waters. Raw oceanographic data is converted and classified into Conductivity, Temperature, and Depth (CTD) data as oceanographic data parameters identified as predictor variables (X) that are correlated with each other. CTD data is processed into numeric data attributes that have been labeled for input and training. The data was modeled using the Machine Learning (ML) type Supervised Learning (SL) method with the Decision Tree (DT), Linear Regression (LR) and Random Forest (RF) algorithms which were interpreted according to the characteristics of the CTD data. ML will learn data models to understand and store. Next, the model is evaluated using accuracy metrics by measuring the difference between the predicted value and the actual value to obtain a good prediction model. The prediction results show a salinity level of 34.0 parts per thousand (ppt), meaning that in this area of marine waters salinity will affect the solubility of Oxygen (O 2 ) and play a major role in the sustainability and growth of the fertility level of biological resources which is supported by sea surface temperature conditions 29.2°C. So the salinity values obtained using ML techniques and marine resource potential can be assumed to have a strong correlation. The research results show that the RF model has the lowest level of prediction error based on the values: Mean Square Error (MSE) = 0.007; Root Mean Squared Error (RMSE) = 0.082; Mean Absolute Error (MAE) = 0.007 compared to DT model: MSE = 0.008; RMSE = 0.088; MAE = 0.012 and LR model: MSE = 1.008; RMSE = 1.004; MAE = 0.281. The equivalent RF and DT models have a Determination Coefficient (R 2 ) = 0.999, meaning that a model is created that is good at predicting, compared to the LR model with a value of R 2 = 0.914. The correlation between variables shows that the LR model is very linear with a Correlation Coefficient (r) = 1.000 compared to the DT model (r) = 0.621 and the RF model (r) = 0.379. Therefore the algorithm that has a value of (r) +1 has the best level of accuracy. 
The use of ML to predict marine resource potential is a relatively new research field, so this research has the potential to contribute data and information as a reference for innovative studies and investment decision material for investors.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139687562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Periodic Service Behavior Strain Analysis-Based Intrusion Detection in Cloud
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.140.149
S. Priya, R. S. Ponmagal
The problem of intrusion detection in cloud environments has been well studied. The presence of adversaries challenges data security in the cloud by generating intrusion attacks towards cloud data, which should be mitigated for the development of the cloud environment. Several techniques exist in the literature for mitigating intrusion attacks, using different features like frequency of access, payload details, protocol mapping, etc. However, these methods need improvement to achieve the expected performance in detecting intrusion attacks. An efficient Periodic Service Behavior Strain Analysis (PSBSA) is presented to handle this issue. Unlike earlier methods, the PSBSA model analyzes the behavior of users in various time frames like historical, recent, and current spans, and focuses on identifying intrusion attacks under several constraints rather than considering only current behavior. The performance of intrusion detection can be improved by viewing the user's behavior across historical, recent, and current timespans, and the proposed PSBSA model considers the user's behavior at these different times in measuring the user's trust for intrusion detection. Accordingly, the PSBSA model analyzes the behavior of users under various situations, examining how services are accessed at historical, current, and recent times. The method performs Historical Strain Analysis (HSA), Current Strain Analysis (CSA), and Recent Strain Analysis (RSA): HSA is performed on the historical data, CSA on the current access data, and RSA on the recent access data. The model estimates various legitimacy support values in each analysis to conclude the trust of any user, and intrusion detection is performed according to these support values. The proposed PSBSA model achieves higher accuracy in intrusion detection in a cloud environment.
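The idea of combining per-window legitimacy support values into a single trust decision can be illustrated with the toy sketch below; the scoring function, weights, and threshold are assumptions and do not reproduce the paper's strain-analysis formulas.

```python
# Hedged sketch of combining per-window legitimacy scores (historical, recent,
# current) into one trust value and thresholding it. Weights, threshold, and
# the scoring function are assumptions, not the paper's exact formulas.
def legitimacy_score(requests, baseline_rate):
    """Toy score in [0, 1]: 1.0 means the observed request rate matches the
    user's baseline; it decays as the rate deviates from the baseline."""
    rate = len(requests) or 1
    return min(baseline_rate, rate) / max(baseline_rate, rate)

def trust(historical, recent, current, baseline_rate,
          weights=(0.3, 0.3, 0.4), threshold=0.6):
    scores = [legitimacy_score(w, baseline_rate) for w in (historical, recent, current)]
    value = sum(w * s for w, s in zip(weights, scores))
    return value, ("legitimate" if value >= threshold else "suspected intrusion")

# Example: a user whose current-window activity spikes far above the baseline.
hist = ["req"] * 20      # ~20 requests in the historical window
rec = ["req"] * 25       # ~25 in the recent window
cur = ["req"] * 400      # burst of 400 in the current window
print(trust(hist, rec, cur, baseline_rate=22))   # -> flagged as suspected intrusion
```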
{"title":"Periodic Service Behavior Strain Analysis-Based Intrusion Detection in Cloud","authors":"S. Priya, R. S. Ponmagal","doi":"10.3844/jcssp.2024.140.149","DOIUrl":"https://doi.org/10.3844/jcssp.2024.140.149","url":null,"abstract":": The problem of intrusion detection in cloud environments has been well studied. The presence of adversaries would challenge data security in the cloud by generating intrusion attacks towards the cloud data and should be mitigated for the development of the cloud environment. In mitigating intrusion attacks, there exist several techniques in the literature. The method uses different features like frequency of access, payload details, protocol mapping, etc. However, the methods need to improve to achieve the expected performance in detecting intrusion attacks. An efficient Periodic Service Behavior Strain Analysis (PSBSA) is presented to handle this issue. Unlike earlier methods, the PSBSA model analyzes the behavior of users in various time frames like historical, recent, and current spans. The model focused on identifying intrusion attacks in several constraints, not just considering the current nature. The performance of intrusion detection can be improved by viewing the user's behavior in historical, present, and recent timespan. Unlike other approaches, the proposed PSBSA model considers the user's behavior at different times in measuring the user's trust towards intrusion detection. Accordingly, the proposed PSBSA model analyzes the behavior of users under various situations. It examines the behavior in accessing the services at historical, current, and recent times. The method performs Historical Strain Analysis (HSA) Current Strain Analysis (CSA) and Recent Strain Analysis (RSA). HSA analysis is performed according to the historical data, CSA is performed based on the current access data and RSA is performed with the recent access data. The model estimates various legitimacy support values on each analysis to conclude the trust of any user. According to the support values, intrusion detection has been performed. The proposed PSBSA model introduces higher accuracy in intrusion detection in a cloud environment.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139685196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data Analytics for Imbalanced Dataset
Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.207.217
Madhura Prabha R, Sasikala S
The primary issue in real-time big data classification is imbalanced datasets. Although many balancing techniques exist to reduce the imbalance ratio, most are not suitable for big data with scalability issues. This study explores different balancing techniques through an experimental study, comparing the effectiveness of various balancing strategies, including cutting-edge approaches, on severely unbalanced data from online repositories. Here we apply the SMOTE, SMOTE ENN, and SMOTE Tomek balancing algorithms to dermatology, wine quality, and diabetes datasets. After balancing, the balanced dataset is classified with AdaBoost and random forest algorithms. On the three datasets, the outcomes show that the classification algorithm combined with the balancing technique improves classification performance for imbalanced datasets. Experiment results showed that the SMOTE ENN technique produces higher classification accuracy than the SMOTE and SMOTE Tomek techniques. The findings are analyzed with other factors like execution time and scalability. Though SMOTE Tomek produces an accuracy of 1.0 for a few datasets, its execution time is longer than SMOTE ENN. Therefore, SMOTE ENN with random forest classification produces 1.0 accuracy for all three datasets with less execution time. This experimental study provides the basis for creating a novel ensemble technique for balancing highly imbalanced data.
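One of the pipelines the abstract evaluates, SMOTE ENN followed by a random forest, can be sketched with imbalanced-learn and scikit-learn as follows; the dataset file and column names are assumptions.

```python
# Hedged sketch of one balancing-plus-classification pipeline from the abstract
# (SMOTE ENN followed by a random forest) on an assumed imbalanced CSV; the
# file and column names are illustrative.
import pandas as pd
from imblearn.combine import SMOTEENN
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("diabetes.csv")                     # assumed imbalanced dataset
X, y = df.drop(columns=["Outcome"]), df["Outcome"]   # assumed label column

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                           test_size=0.2, random_state=7)

# Resample only the training split, then fit the classifier on balanced data.
X_bal, y_bal = SMOTEENN(random_state=7).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_bal, y_bal)

pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print(classification_report(y_te, pred))
```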
{"title":"Data Analytics for Imbalanced Dataset","authors":"Madhura Prabha R, Sasikala S","doi":"10.3844/jcssp.2024.207.217","DOIUrl":"https://doi.org/10.3844/jcssp.2024.207.217","url":null,"abstract":": The primary issue in real-time big data classification is imbalanced datasets. Even though we have many balancing techniques to reduce imbalance ratio which is not suitable for big data that has scalability issues. This study is envisioned to explore different balancing techniques with experimental study. We tried comparing the effectiveness of various balancing strategies, including cutting-edge approaches for severely unbalanced data from online repositories. Here we apply SMOTE, SMOTE ENN and SMOTE Tomek balancing algorithms for dermatology, wine quality and diabetes datasets. After balancing the dataset, the balanced dataset is classified with AdaBoost and random forest algorithms. On three datasets, the outcomes show that the classification algorithm with the balancing technique improves the classification performance for imbalanced datasets. Experiment results showed that the SMOTE ENN technique produces higher classification with accuracy than the SMOTE and SMOTE Tomek techniques. The findings are analyzed with other factors like execution time and scalability. Though SMOTE Tomek produces 1.0 for a few datasets, its execution time is longer than SMOTE ENN. Therefore, SMOTE ENN with random forest classification produces 1.0 accuracy for all three datasets with less execution time. This experimental study analyses to create a novel ensemble technique for balancing highly imbalanced data.","PeriodicalId":40005,"journal":{"name":"Journal of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139884057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0