To develop rice varieties with better nutritional qualities, it is important to classify rice seeds accurately. Hyperspectral imaging can extract spectral information from rice seeds, which can then be used to classify them into varieties. Precise classification becomes harder when there are many classes and few training samples. In this paper, we present a novel method for high-precision Hyperspectral Image (HSI) classification of 90 classes of rice seeds using ensemble deep learning. Our method first employs band selection techniques to select the optimal hyperspectral bands for rice seed classification. Then, a deep neural network is trained on the selected hyperspectral and RGB data from rice seed images, yielding a separate model for each band. Finally, an ensemble of these deep learning models classifies rice seed images, improving classification accuracy. The proposed method achieves an overall precision of 92.73% to 96.17% despite the large number of classes, the low number of samples per class, and the use of only 15 selected hyperspectral bands. This precision is significantly higher than that of state-of-the-art classical machine learning methods such as random forest, confirming the effectiveness of the proposed method in classifying hyperspectral images of rice seeds.
Title: Ensemble deep learning for high-precision classification of 90 rice seed varieties from hyperspectral images. Authors: AmirMasoud Taheri, Hossein Ebrahimnezhad, Mohammadhossein Sedaaghi. DOI: 10.1007/s12652-024-04782-2. Journal of Ambient Intelligence and Humanized Computing. Published 2024-04-05.
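The ensemble step the abstract describes — combining per-band models into one classifier — can be sketched with simple soft voting over each band model's class probabilities. This is a minimal sketch, not the paper's implementation; the function name and the toy probability arrays are hypothetical.

```python
import numpy as np

def ensemble_predict(band_probs):
    """Soft voting: average per-band model probabilities, then pick the class.

    band_probs: list of (n_samples, n_classes) probability arrays,
    one array per band-specific model.
    """
    avg = np.mean(band_probs, axis=0)   # (n_samples, n_classes)
    return avg.argmax(axis=1)           # predicted class index per sample

# Toy example: 3 band models, 2 samples, 4 classes (values made up).
p1 = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]])
p2 = np.array([[0.6, 0.2, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]])
p3 = np.array([[0.5, 0.3, 0.1, 0.1], [0.3, 0.4, 0.2, 0.1]])
pred = ensemble_predict([p1, p2, p3])   # → array([0, 1])
```

With 15 selected bands, the real system would hold 15 such probability arrays, one per trained model.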
Pub Date: 2024-04-03 DOI: 10.1007/s12652-024-04778-y
Birendra Kumar Verma, Ajay Kumar Yadav
As software becomes more complex, diverse, and crucial to people's daily lives, exploitable software vulnerabilities pose a major security risk to computer systems. These vulnerabilities allow unauthorized access, which can cause losses in banking, energy, the military, healthcare, and other key infrastructure systems. Most vulnerability scoring methods employ Natural Language Processing to build models from textual descriptions alone, ignoring Impact scores, Exploitability scores, Attack Complexity, and other statistical features. Here, a feature vector for machine learning models is built from the description together with the impact score, exploitability score, attack complexity score, and related fields, so that vulnerabilities are scored precisely rather than merely categorized. The Decision Tree, Random Forest, AdaBoost, K-Nearest Neighbors, and Support Vector regressors are evaluated using explained variance, r-squared, mean absolute error, mean squared error, and root mean squared error, with tenfold cross-validation verifying the regressor test results. The research uses 193,463 Common Vulnerabilities and Exposures from the National Vulnerability Database. The Random Forest regressor performed best on four of the five criteria, and better still under tenfold cross-validation (0.9968 vs. 0.9958).
Title: Software security with natural language processing and vulnerability scoring using machine learning approach
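The evaluation protocol described above — tenfold cross-validation with error metrics — is independent of any particular regressor and can be sketched as follows. The helper names and the stand-in data are ours; a baseline "predict the training mean" regressor stands in for the paper's models.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, test

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Stand-in severity scores; a real run would use CVSS-derived targets.
y = np.arange(100, dtype=float)
fold_errors = []
for train, test in kfold_indices(len(y), k=10):
    pred = np.full(len(test), y[train].mean())   # baseline regressor
    fold_errors.append(mean_absolute_error(y[test], pred))
cv_mae = float(np.mean(fold_errors))             # averaged over the 10 folds
```

Swapping in any scikit-learn regressor at the `pred = ...` line reproduces the comparison setup without changing the splitting logic.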
Pub Date: 2024-04-03 DOI: 10.1007/s12652-024-04772-4
Vishal Gupta, Aanchal Gondhi
In this paper, we prove fixed point results for a pair of soft fuzzy maps in complete ordered soft metric spaces. We also give some useful corollaries to our main result, along with examples. Moreover, an application is presented to show the validity of the new results.
Title: Existence of fixed points in soft metric spaces with application to boundary value problem
Pub Date: 2024-04-01 DOI: 10.1007/s12652-024-04777-z
Video summarization is an emerging research field. In particular, static video summarization plays a major role in the abstraction and indexing of video repositories. It extracts the vital events in a video so that the summary covers the video's entire content. Frames containing those important events are called keyframes, which are eventually used in video indexing. A summary also gives an abstract view of the video content, so that internet users are aware of the events in a video before watching it completely. The proposed research work focuses on efficient static video summarization by extracting various visual features, namely color, texture, and shape features. These features are aggregated and clustered using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. To produce a good video summary from the clustering, the parameters of the DBSCAN algorithm are optimized using a meta-heuristic population-based optimization called the Artificial Algae Algorithm (AAA). Experimental results on two public datasets, VSUMM and OVP, show that the proposed Static Video Summarization with Multi-objective Constrained Optimization (SVS_MCO) achieves better results than existing methods.
Title: Static video summarization with multi-objective constrained optimization
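The clustering step above can be sketched with a minimal DBSCAN over frame feature vectors. This is a bare-bones sketch: `eps` and `min_pts` are hand-set here, whereas the paper tunes them with AAA, and the 2-D toy points stand in for real aggregated color/texture/shape features.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(points)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0

    def neighbors(i):
        d = np.linalg.norm(points - points[i], axis=1)
        return np.flatnonzero(d <= eps)

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nb = neighbors(i)
        if len(nb) < min_pts:
            continue                      # noise unless later claimed as border
        labels[i] = cluster
        queue = list(nb)
        while queue:                      # expand the cluster from core points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
            if not visited[j]:
                visited[j] = True
                nb_j = neighbors(j)
                if len(nb_j) >= min_pts:
                    queue.extend(nb_j)
        cluster += 1
    return labels

# Two tight blobs of "frame features" plus one outlier frame.
pts = np.array([[0, 0], [0.1, 0], [0, 0.1],
                [5, 5], [5.1, 5], [5, 5.1],
                [10, 10]], dtype=float)
labels = dbscan(pts, eps=0.5, min_pts=3)   # → [0, 0, 0, 1, 1, 1, -1]
```

In a summarizer, one representative frame per cluster (e.g., the one nearest the cluster centroid) would then be kept as a keyframe.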
Pub Date: 2024-03-30 DOI: 10.1007/s12652-024-04773-3
Mohsen Shahmohammadi, M. Fakhrzad, H. H. Nasab, S. F. Ghannadpour
Title: An intelligent auction-based capacity allocation algorithm in shared railways
Pub Date: 2024-03-28 DOI: 10.1007/s12652-024-04767-1
Yanle Li
The rapid development of multimedia information processing technology creates opportunities for digitization in sports. Motion capture, a recent achievement of this technology, has gradually gained scholarly attention and begun to be used for the visualization of sports movements. This paper therefore introduces a monocular video motion capture method and optimizes it against common reconstruction artifacts such as floating, ground penetration, and foot sliding, providing a technical path for applying motion capture in sports training and a technical basis for visualizing training movements. The method captures human motion trajectories from monocular videos, combining human pose estimation with physical constraints. It uses foot contact judgment to obtain foot contact events for each motion frame, then optimizes the overall trajectory of the body key points based on the detected contacts, making the generated motion visually closer to reality. The article proposes LiteHumanPose Net, with an inference speed of up to 22 FPS, and experimentally compares several popular pose estimation methods, such as SimpleBaseline, HRNet, and Hourglass Net, in terms of frame rate and average accuracy. LiteHumanPose Net outperforms Hourglass Net in both frame rate and accuracy, while HRNet achieves high accuracy from its many parameters but a low frame rate. The proposed LiteHumanPose network strikes a good balance between accuracy and frame rate and has clear advantages for practical deployment.
Title: Visualization of movements in sports training based on multimedia information processing technology
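The foot contact judgment described above can be sketched with a simple per-frame heuristic: a foot keypoint is "in contact" when it is near the ground and nearly stationary. The paper's exact criterion is not given here; the thresholds and the toy trajectory below are hypothetical.

```python
import numpy as np

def foot_contacts(foot_y, height_thresh=0.05, vel_thresh=0.02):
    """Flag frames where a foot keypoint is low and nearly stationary.

    foot_y: per-frame vertical position of a foot keypoint (larger = higher).
    """
    # Frame-to-frame vertical speed; prepend keeps the array length equal.
    vel = np.abs(np.diff(foot_y, prepend=foot_y[0]))
    return (foot_y < height_thresh) & (vel < vel_thresh)

# Toy trajectory: the foot descends, rests on the ground, then lifts off.
y = np.array([0.20, 0.10, 0.03, 0.03, 0.04, 0.15])
contacts = foot_contacts(y)   # → [False, False, False, True, True, False]
```

Frames flagged as contacts would then act as constraints when optimizing the key-point trajectories, suppressing floating and sliding during those frames.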
Pub Date: 2024-03-28 DOI: 10.1007/s12652-024-04775-1
Sakshi Jain, Pradeep Kumar Roy
Coronavirus belongs to the family Coronaviridae. It is responsible for COVID-19, a communicable disease that has affected 213 countries and territories worldwide. Researchers in computational fields have been active in proposing techniques to filter information and recommendations about this disease and to provide surveillance for controlling the outbreak. They have used chest X-ray images, abdominal Computed Tomography scans, and tweet datasets to build machine learning and deep learning models for COVID-19 prediction and forecasting. Accuracy, sensitivity, specificity, precision, and F1-measure are the five primary criteria researchers employ to evaluate the quality of their studies. This article summarises research on COVID-19 based on machine learning and deep learning models. The analysis of these works, along with their limitations and dataset sources, gives future research a quick start toward a defined direction.
Title: A study of learning models for COVID-19 disease prediction
Pub Date: 2024-03-27 DOI: 10.1007/s12652-024-04771-5
Anas M. Al-Oraiqat, Oleksandr Drieiev, Hanna Drieieva, Yelyzaveta Meleshko, Hazim AlRawashdeh, Karim A. Al-Oraiqat, Yassin M. Y. Hasan, Noor Maricar, Sheroz Khan
Crowds can lead to severe disastrous consequences, including fatalities. Videos obtained from public cameras or captured by drones flying overhead can be processed by artificial intelligence-based crowd analysis systems. In this active research area, the goal is not only to identify the presence of crowds but also to predict the probability of crowd formation, so that timely warnings and preventive measures can be issued, significantly reducing the probability of potential disasters. Developing effective systems is challenging, especially due to factors such as naturally occurring diverse conditions, variations in people or background pixel areas, noise, individual behaviors, the relative amounts, distributions, and directions of crowd movements, and the reasons crowds form. This paper proposes an infrared video processing system based on a U-Net convolutional neural network for crowd monitoring in infrared video frames, helping estimate crowds of people with normal or abnormal trends. The proposed U-Net architecture efficiently extracts crowd features and achieves sufficient people mark-up accuracy with competitive network configurations in terms of depth and number of filters, thereby minimizing the number of coefficients. For faster processing, savings in hardware resources and implementation area, and lower power, the optimized network coefficients are represented in Canonic Signed Digit form with a minimal number of nonzero (±1) digits, minimizing the underlying shift-add/subtract operations of all multipliers. The significantly reduced computational cost makes the proposed U-Net well suited to resource-constrained, low-power applications.
Title: Spatiotemporal crowds features extraction of infrared images using neural network
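The Canonic Signed Digit representation mentioned above is a standard recoding: each coefficient is written with digits in {-1, 0, +1} such that no two adjacent digits are nonzero, so every nonzero digit costs exactly one shift plus one add or subtract in hardware. A minimal sketch of the standard encoding (not the paper's implementation):

```python
def to_csd(n):
    """Encode a non-negative integer as canonic signed digits.

    Returns digits in {-1, 0, 1}, least-significant first; no two
    adjacent digits are nonzero, minimizing shift-add/subtract terms.
    """
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 if n ≡ 1 (mod 4), -1 if n ≡ 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def from_csd(digits):
    """Decode: sum of d_i * 2^i."""
    return sum(d << i for i, d in enumerate(digits))

# 7 = 8 - 1 needs 2 nonzero digits in CSD vs. 3 in plain binary (111).
csd7 = to_csd(7)    # → [-1, 0, 0, 1]
```

For a multiplier coefficient like 23 (binary 10111, four ones), CSD gives 32 - 8 - 1, i.e. three shift-add/subtract terms instead of four.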
Pub Date: 2024-03-26 DOI: 10.1007/s12652-024-04761-7
Pranati Rakshit, Sarbajeet Paul, Shruti Dey
Sign language recognition is an important social issue to address: it can benefit the deaf and hard of hearing community by providing easier and faster communication. Some previous studies on sign language recognition have used complex input modalities and feature extraction methods, limiting their practical applicability. This research compares two custom-made convolutional neural network (CNN) models for recognizing American Sign Language (ASL) letters from A to Z and determines which model performs better. The proposed models combine a CNN with a Softmax activation function, a powerful and widely used classification approach in computer vision. The study compares the performance of the two specially created CNN models on 26 distinct hand signals representing the 26 letters of the English alphabet. It found that Model_2 had better overall performance than Model_1, with an accuracy of 98.44% and an F1 score of 98.41%. However, the performance of each model varied by label, suggesting that the choice of model may depend on the specific use case and the labels of interest. This research contributes to the growing field of sign language recognition using deep learning techniques and highlights the importance of designing custom models.
Title: Sign language detection using convolutional neural network
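The CNN-plus-Softmax head described above reduces, at the output layer, to a softmax over 26 logits followed by an argmax. The sketch below shows just that head; the convolutional feature extractor is omitted, and the logit values are made up.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis (26 letter classes here)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # guard against overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# One sample's logits for 26 classes, as a CNN backbone might emit them.
logits = np.array([[2.0, 1.0, 0.1] + [0.0] * 23])
probs = softmax(logits)                 # shape (1, 26), rows sum to 1
pred = int(probs.argmax())              # winning class index
letter = chr(ord('A') + pred)           # map index 0..25 to 'A'..'Z'
```

Training such a head pairs the softmax with a cross-entropy loss over the 26 one-hot letter labels.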
Pub Date: 2024-03-25 DOI: 10.1007/s12652-024-04776-0
Arman Daliri, Roghaye Sadeghi, Neda Sedighian, Abbas Karimi, Javad Mohammadzadeh
There have been many connections between medical science and artificial intelligence in recent years, and many problems arise at their intersection. Cardiac arrhythmia, addressed here with artificial intelligence methods, is one of the most dangerous diseases from a prevention standpoint. This study considers the automatic selection of balancing and classification algorithms and presents metrics for machine learning algorithm selection. The first problem is choosing the best balancing algorithm for a data set, addressed with a measure introduced as the triangle rate (TR). The second is automatically selecting the best classification algorithm. The third is using a scoring algorithm to predict sinus and non-sinus arrhythmias. By combining these three kinds of algorithms, the heptagonal reinforcement learning (HRL) approach achieves results competitive with standard algorithms. The data used in this study is a 12-lead electrocardiogram (ECG) arrhythmia database covering 10,646 patients. The HRL algorithm improves on previous algorithms by 5%, achieving 86% accuracy in cardiac arrhythmia prediction.
Title: Heptagonal Reinforcement Learning (HRL): a novel algorithm for early prevention of non-sinus cardiac arrhythmia