DLMDish: Using Applied Deep Learning and Computer Vision to Automatically Classify Mauritian Dishes
Mohammud Shaad Ally Toofanee, Omar Boudraa, Karim Tamine
International Journal of Image and Graphics. Pub Date: 2023-11-18 | DOI: 10.1142/s0219467825500457

The benefits of using an automatic dietary assessment system to help diabetic patients and prediabetic persons control the risk factor often referred to as the obesity “pandemic” are now widely proven and accepted. However, there is no universal solution, as people's eating habits depend on context and culture. This project is a cornerstone for future work by researchers and health professionals in the automatic dietary assessment of Mauritian dishes. We propose a process for producing a food dataset of Mauritian dishes using a Generative Adversarial Network (GAN), together with a fine-tuned Convolutional Neural Network (CNN) model for identifying those dishes. The outputs and findings of this research can be used for automatic calorie calculation and food recommendation, primarily on ubiquitous devices such as mobile phones via mobile applications. Using the Adam optimizer with carefully fixed hyper-parameters, we achieved an accuracy of 95.66% and a loss of 3.5% on the recognition task.
A Novel Diabetes Prediction Model in Big Data Healthcare Systems Using DA-KNN Technique
N. P. Jayasri, R. Aruna
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500469

In the past decades, there has been a wide increase in the number of people affected by diabetes, a chronic illness. Early prediction of diabetes remains a challenging problem, as it requires clear and sound datasets for precise prediction. In this era of ubiquitous information technology, big data helps collect large amounts of information on healthcare systems, but due to the explosion in the generation of digital data, selecting appropriate data for analysis remains a complex task. Moreover, missing values and insignificantly labeled data restrict prediction accuracy. In this context, with the aim of improving dataset quality, missing values are handled in three major phases: (1) pre-processing, (2) feature extraction, and (3) classification. Pre-processing involves outlier rejection and filling in missing values. Feature extraction is done by principal component analysis (PCA), and finally, precise prediction of diabetes is accomplished by an effective distance adaptive-KNN (DA-KNN) classifier. Experiments were conducted on the Pima Indian Diabetes (PID) dataset, and the performance of the proposed model was compared with state-of-the-art models. The analysis shows that the proposed model outperforms conventional models such as NB, SVM, KNN, and RF in terms of accuracy and ROC.
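To make the distance-adaptive idea concrete, here is a minimal sketch of a distance-weighted KNN classifier, in which nearer neighbours get larger votes. The abstract does not specify the exact DA-KNN weighting scheme, so the inverse-distance weighting below is an assumption used purely for illustration.

```python
import numpy as np

def da_knn_predict(X_train, y_train, x, k=3, eps=1e-8):
    """Distance-weighted KNN: each of the k nearest neighbours votes for
    its label with weight 1/distance, so closer points count more.
    (Generic inverse-distance weighting, not the paper's exact scheme.)"""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each training point
    idx = np.argsort(d)[:k]                   # indices of the k nearest neighbours
    w = 1.0 / (d[idx] + eps)                  # inverse-distance vote weights
    votes = {}
    for label, weight in zip(y_train[idx], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)          # label with the largest total weight

# Toy example: two clusters (0 = non-diabetic, 1 = diabetic)
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(da_knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # → 1
```

Because the votes are distance-weighted, a query point deep inside one cluster is classified correctly even when the k-neighbourhood spans both clusters.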
Model Self-Adaptive Display for 2D–3D Registration
Peng Zhang, Yangyang Miao, Dongri Shan, Shuang Li
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500421

In the 2D–3D registration process, CAD models of different sizes may be too large to display in full or too small to show obvious features. Previous studies have addressed these problems by adjusting parameters manually; however, this is imprecise and frequently requires multiple adjustments. Thus, in this paper, we propose the model self-adaptive display of fixed-distance and maximization (MSDFM) algorithm. Because the uncertainty of the model display affects the storage cost of pose images, and pose images themselves occupy a large amount of storage space, we also propose the storage optimization based on the region of interest (SOBROI) method to reduce storage costs. The MSDFM algorithm retrieves the farthest point of the model, searches through that point for the pose image that maximizes the model display, and changes the projection angle until the pose image is maximized within the window. The pose images are then cropped by the SOBROI method: after labeling the connected domains in the binary pose image, the bounding rectangle of the largest connected domain is used to crop the pose image, which is saved in the lossless Portable Network Graphics (PNG) format. Experimental results demonstrate that the proposed MSDFM algorithm automatically adjusts models of different sizes, and that the proposed SOBROI method reduces the storage space of pose libraries by at least 89.66% and at most 99.86%.
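The SOBROI cropping step rests on a standard operation: find the largest connected domain in a binary image and take its bounding rectangle. A minimal sketch of that step (4-connected components via BFS, pure Python; the PNG encoding itself is omitted):

```python
from collections import deque

def largest_component_bbox(mask):
    """Return (top, left, bottom, right), inclusive, of the bounding
    rectangle of the largest 4-connected region of 1s in a binary mask
    (given as a list of lists)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best, best_size = None, 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                # BFS over this component, tracking its extent and size
                q = deque([(i, j)])
                seen[i][j] = True
                size, top, left, bot, right = 0, i, j, i, j
                while q:
                    r, c = q.popleft()
                    size += 1
                    top, left = min(top, r), min(left, c)
                    bot, right = max(bot, r), max(right, c)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            q.append((nr, nc))
                if size > best_size:
                    best_size, best = size, (top, left, bot, right)
    return best

mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0]]
print(largest_component_bbox(mask))  # → (1, 1, 2, 2)
```

Cropping the pose image to this rectangle before saving is what removes the empty background that otherwise dominates storage.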
Automatic Video Traffic Surveillance System with Number Plate Character Recognition Using Hybrid Optimization-Based YOLOv3 and Improved CNN
Manoj Krishna Bhosale, Shubhangi B. Patil, Babasaheb B Patil
Pub Date: 2023-11-03 | DOI: 10.1142/s021946782550041x

Recently, the growing number of surveillance cameras has increased the demand for more effective video coding. Modern video coding standards have appreciably enhanced coding efficiency, but they were developed for general videos rather than surveillance videos. Vehicle recognition techniques play a challenging and promising role in computer vision applications and intelligent transport systems. Most conventional techniques recognize vehicles with a bounding-box depiction but fail to provide precise vehicle locations, even though position details are vital for real-time applications such as estimating a vehicle's trajectory and motion on the road. Numerous advancements have been made in the traffic surveillance area over the years through the spread of intelligent traffic video surveillance techniques. The ultimate goal of this model is to design and enhance intelligent traffic video surveillance using deep learning. The model handles video traffic surveillance by measuring vehicle speeds and recognizing number plates. The initial step is data collection, in which the traffic video data is gathered. Vehicle detection is then performed by an Optimized YOLOv3 deep learning classifier whose parameters are tuned by the newly recommended Modified Coyote Spider Monkey Optimization (MCSMO), a combination of the Coyote Optimization Algorithm (COA) and Spider Monkey Optimization (SMO). The speed of each vehicle is measured from frame to frame. For high-speed vehicles, the same Optimized YOLOv3 is used to detect the number plates, and plate character recognition is then performed by an Improved Convolutional Neural Network (ICNN). Information about vehicles violating traffic rules can thus be conveyed to the vehicle owners and the Regional Transport Office (RTO) to take further action and avoid accidents. In experimental validation, the accuracy and precision of the designed method reach 97.53% and 96.83%, respectively. Experimental results show that the proposed method achieves enhanced performance compared to conventional models, helping to ensure the security of the transport system.
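Measuring speed from per-frame detections reduces to converting pixel displacement between frames into real-world distance per unit time. The abstract does not give the calibration details, so the metres-per-pixel factor below is an assumption; real systems would calibrate it from known road geometry.

```python
def vehicle_speed_kmh(p1, p2, fps, metres_per_pixel):
    """Estimate a vehicle's speed from its pixel centroid in two
    consecutive frames.

    p1, p2: (x, y) centroids in pixels; fps: camera frame rate;
    metres_per_pixel: calibration factor (assumed known here)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    pixels = (dx * dx + dy * dy) ** 0.5      # displacement in pixels per frame
    metres_per_second = pixels * metres_per_pixel * fps
    return metres_per_second * 3.6           # convert m/s to km/h

# 10 px moved between frames at 25 fps, with 0.05 m per pixel:
print(vehicle_speed_kmh((100, 200), (110, 200), fps=25, metres_per_pixel=0.05))  # → 45.0
```

Averaging this estimate over several frame pairs would smooth out detection jitter before deciding whether a vehicle exceeds the limit.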
Metaheuristic-Assisted Contextual Post-Filtering Method for Event Recommendation System
B. N. Nithya, D. Evangelin Geetha, Manish Kumar
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500433

In today's world, the web is a prominent communication channel. However, the variety of strategies available on event-based social networks (EBSNs) makes it difficult for users to choose the events most relevant to their interests. In EBSNs, searching for events that fit a user's preferences is necessary but complex and time consuming due to the large number of events available. Toward this end, a community-contributed data event recommender framework assists consumers in filtering daunting amounts of information and provides appropriate feedback, making EBSNs more appealing to them. This research work introduces a novel customized event recommendation system that ranks events using the multi-criteria decision-making (MCDM) approach. The proposed model computes categorical, geographical, temporal, and social factors, and orders the recommendation list with a contextual post-filtering scheme comprising Weight and Filter steps. A new probabilistic weight model is added to align the recommendation list. To be more constructive, the model incorporates metaheuristic reasoning to fine-tune the probabilistic threshold value with a new hybrid algorithm, the Beetle Swarm Hybridized Elephant Herding Algorithm (BSH-EHA), which combines Elephant Herding Optimization (EHO) and the Beetle Swarm Optimization (BSO) algorithm. Finally, the top recommendations are given to the users.
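A contextual post-filtering pass of the Weight-and-Filter kind can be sketched as: score each event as a weighted sum of its contextual factors, drop events below a threshold, and rank the rest. The factor weights and threshold below are illustrative assumptions; in the paper the threshold is tuned by the BSH-EHA metaheuristic rather than fixed by hand.

```python
def post_filter(events, weights, threshold):
    """Weight-and-Filter contextual post-filtering sketch.

    events: {event_id: {"categorical": s, "geographical": s,
                        "temporal": s, "social": s}}, scores in [0, 1].
    weights: importance of each contextual factor (assumed values).
    Returns (event_id, score) pairs above `threshold`, best first."""
    scored = {
        ev: sum(weights[f] * s for f, s in factors.items())  # Weight step
        for ev, factors in events.items()
    }
    kept = [(ev, s) for ev, s in scored.items() if s >= threshold]  # Filter step
    return sorted(kept, key=lambda t: t[1], reverse=True)           # final ranking

events = {
    "concert": {"categorical": 0.9, "geographical": 0.8, "temporal": 0.7, "social": 0.6},
    "meetup":  {"categorical": 0.2, "geographical": 0.9, "temporal": 0.4, "social": 0.3},
}
weights = {"categorical": 0.4, "geographical": 0.3, "temporal": 0.2, "social": 0.1}
print(post_filter(events, weights, threshold=0.5))  # only "concert" survives the filter
```

Tuning `threshold` (here by hand, in the paper by the hybrid metaheuristic) trades recommendation list length against relevance.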
A Systematic Literature Review on Multimodal Image Fusion Models With Challenges and Future Research Trends
Jampani Ravi, R. Narmadha
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500391

Imaging technology has undergone extensive development since 1985, with practical implications for both civilians and the military. Image fusion is an emerging tool in image processing that is adept at handling diverse image types, including remote sensing and medical images, upgrading information through the fusion of visible and infrared light based on analysis of the materials involved. At present, image fusion is mainly performed in the medical industry: given the constraints of diagnosing a disease from single-modality images, image fusion can meet the prerequisites, and it is therefore suggested to develop fusion models using different image modalities. The major intention of the fusion approach is to achieve higher contrast, enhanced image quality, and apparent knowledge. Fused images are validated by three factors: (i) the fused image should retain significant information from the source images, (ii) artifacts must not be present in the fused image, and (iii) the flaws of noise and misregistration must be avoided. Multimodal image fusion is a developing domain built on robust algorithms and standard transformation techniques. Thus, this work analyzes the contributions of various multimodal image fusion models that use intelligent methods. It provides an extensive literature survey of image fusion techniques and compares them with existing methods, covering state-of-the-art image fusion methods at their diverse levels along with their pros and cons. This review introduces the current fusion methods, modes of multimodal fusion, the datasets used, and performance metrics; finally, it discusses the challenges of multimodal image fusion methods and future research trends.
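To anchor the terminology, the two simplest pixel-level fusion rules are the weighted average and the per-pixel maximum. These are baseline rules shown only to make the idea concrete; the surveyed methods use far richer transforms (wavelets, pyramids, deep models), and the modality names below are illustrative.

```python
import numpy as np

def fuse_images(a, b, alpha=0.5):
    """Pixel-wise weighted-average fusion of two co-registered images
    given as same-shape float arrays in [0, 1]."""
    return alpha * a + (1.0 - alpha) * b

def max_fuse(a, b):
    """Per pixel, keep the stronger response of the two modalities."""
    return np.maximum(a, b)

# Toy 2x2 "modalities" (e.g. an MRI-like and a CT-like channel):
mri = np.array([[0.2, 0.8], [0.4, 0.6]])
ct  = np.array([[0.6, 0.2], [0.4, 1.0]])
print(fuse_images(mri, ct))  # → [[0.4 0.5] [0.4 0.8]]
print(max_fuse(mri, ct))     # → [[0.6 0.8] [0.4 1. ]]
```

The average rule preserves overall brightness but can wash out contrast, while the max rule preserves salient structures from either modality; most of the surveyed transform-domain methods are refinements of exactly this trade-off.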
Adversarial Detection and Fusion Method for Multispectral Palmprint Recognition
Yuze Zhou, Liwei Yan, Qi Zhu
Pub Date: 2023-11-01 | DOI: 10.1142/s0219467825500366

As a promising biometric technology, multispectral palmprint recognition has attracted increasing attention in security due to its high recognition accuracy and ease of use. It is worth noting that although multispectral palmprint data contains rich complementary information, multispectral palmprint recognition methods are still vulnerable to adversarial attacks: even if only the image of one spectrum is attacked, the impact on the recognition results can be catastrophic. Therefore, we propose a robustness-enhanced multispectral palmprint recognition method comprising a model interpretability-based adversarial detection module and a robust multispectral fusion module. Inspired by model interpretation techniques, we found a large difference between clean palmprints and adversarial examples after CAM visualization, and using the visualized images to build an adversarial detector leads to better detection results. Finally, the weights of clean images and adversarial examples in the fusion layer are dynamically adjusted to obtain correct recognition results. Experiments show that our method makes full use of the image features that are not attacked and effectively improves the robustness of the model.
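The fusion idea can be sketched independently of the detector: down-weight any spectrum the detector flags as adversarial when combining per-spectrum matching scores. The fixed down-weighting constant and spectrum names below are assumptions for illustration; the paper adjusts the fusion weights dynamically rather than with a constant.

```python
def fuse_scores(spectrum_scores, adversarial_flags, eps=0.05):
    """Weighted fusion of per-spectrum matching scores.

    spectrum_scores: {spectrum: score in [0, 1]}.
    adversarial_flags: {spectrum: True if the detector flagged it}.
    Flagged spectra keep only a small residual weight `eps` (assumed),
    so the unattacked spectra dominate the fused score."""
    weights = {s: (eps if adversarial_flags[s] else 1.0) for s in spectrum_scores}
    total = sum(weights.values())
    return sum(weights[s] * spectrum_scores[s] for s in spectrum_scores) / total

scores = {"red": 0.91, "green": 0.88, "blue": 0.15, "nir": 0.90}  # "blue" was attacked
flags = {"red": False, "green": False, "blue": True, "nir": False}
print(round(fuse_scores(scores, flags), 3))  # → 0.884
```

Without the detector (all weights equal) the attacked spectrum drags the fused score down to about 0.71; with the down-weighting it stays near the clean spectra's level, which is the robustness gain the fusion layer provides.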
Author Index (Volume 23)
Pub Date: 2023-11-01 | DOI: 10.1142/s0219467823990012
Pub Date : 2023-11-01DOI: 10.1142/s0219467825500378
Zhipeng Li, Jun Wang, Lijun Hua, Honghui Liu, Wenli Song
Automatic tracking of three-dimensional (3D) human motion pose has the potential to provide corresponding technical support in various fields. However, existing methods for tracking human motion pose suffer from significant errors, long tracking times and suboptimal tracking results. To address these issues, an automatic tracking method for 3D human motion pose using contrastive learning is proposed. By using the feature parameters of 3D human motion poses, threshold variation parameters of 3D human motion poses are computed. The golden section is introduced to transform the threshold variation parameters and extract the features of 3D human motion poses by comparing the feature parameters with the threshold of parameter variation. Under the supervision of contrastive learning, a constraint loss is added to the local–global deep supervision module of contrastive learning to extract local parameters of 3D human motion poses, combined with their local features. After normalizing the 3D human motion pose images, frame differences of the background image are calculated. By constructing an automatic tracking model for 3D human motion poses, automatic tracking of 3D human motion poses is achieved. Experimental results demonstrate that the highest tracking lag is 9%, there is no deviation in node tracking, the pixel contrast is maintained above 90% and only 6 sub-blocks have detail loss. This indicates that the proposed method effectively tracks 3D human motion poses, tracks all the nodes, achieves high accuracy in automatic tracking and produces good tracking results.
"Automatic Tracking Method for 3D Human Motion Pose Using Contrastive Learning" — Zhipeng Li, Jun Wang, Lijun Hua, Honghui Liu, Wenli Song. International Journal of Image and Graphics. Pub Date: 2023-11-01. DOI: 10.1142/s0219467825500378.
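The abstract above mentions a frame-differencing step against the background image after normalization. As an illustrative sketch only (not the authors' implementation; the function name and threshold are ours), a minimal NumPy version of frame differencing for motion detection might look like:

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Binary mask of pixels that changed between two grayscale frames.

    A basic frame-difference step, sketched under the assumption that
    frames are uint8 grayscale arrays of equal shape; the name and the
    threshold value are illustrative, not from the paper.
    """
    # Widen dtype before subtracting so uint8 arithmetic cannot wrap around.
    prev = prev_frame.astype(np.int16)
    curr = curr_frame.astype(np.int16)
    diff = np.abs(curr - prev)
    return (diff > threshold).astype(np.uint8)

# Tiny synthetic example: an object "appears" in the current frame.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[2:4, 2:4] = 200  # 2x2 bright block in the current frame only
mask = frame_difference_mask(prev, curr)
print(mask.sum())  # -> 4 changed pixels
```

In a tracking pipeline, such a mask would typically feed a connected-component or contour step to localize the moving subject before pose estimation.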
Pub Date: 2023-10-20. DOI: 10.1142/s0219467825500445
Tesfayee Meshu Welde, Lejian Liao
Visual Question Answering (VQA) is a language-based method for analyzing images that is highly helpful in assisting people with visual impairment. A VQA system requires holistic image understanding and performs basic reasoning about the image, in contrast to task-specific models that simply classify objects into categories. Thus, VQA systems contribute to the growth of Artificial Intelligence (AI) technology by answering open-ended, arbitrary questions about a given image. In addition, VQA is used to assess a system’s ability through the Visual Turing Test (VTT). However, because suitable datasets are difficult to construct and evaluation is hampered by flaws and bias, current VQA benchmarks cannot assess a system’s overall efficiency. This is a significant limitation of VQA, and it has slowed the performance progress observed in VQA algorithms. Current research on VQA increasingly addresses specific sub-problems, including counting. The counting sub-problem is a particularly sophisticated one, riddled with challenging questions, especially complex counting questions that demand object identification together with detection of object attributes and positional reasoning. The pooling operation commonly used to implement attention in VQA has been found to degrade counting performance, and a number of algorithms have been developed to address this issue. In this paper, we provide a comprehensive survey of counting techniques in VQA systems developed specifically for answering questions such as “How many?”. However, the performance achieved so far remains unsatisfactory, owing to bias introduced into datasets by the way questions are phrased and to weak evaluation metrics.
In the future, fully-fledged architectures, large datasets with complex counting questions and detailed category breakdowns, and strong evaluation metrics for assessing a system’s ability to answer complex counting questions, such as those requiring positional and comparative reasoning, should be developed.
"Counting in Visual Question Answering: Methods, Datasets, and Future Work" — Tesfayee Meshu Welde, Lejian Liao. International Journal of Image and Graphics. DOI: 10.1142/s0219467825500445.
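The abstract's claim that attention pooling degrades counting can be made concrete with a toy example: a softmax-weighted average of identical region features yields the same pooled vector whether a scene contains one object or three, so the count is averaged away before the answer classifier ever sees it. The sketch below illustrates that effect; the function name `soft_attention_pool` is ours, not from the paper.

```python
import numpy as np

def soft_attention_pool(features, logits):
    """Softmax-normalized weighted average of region features.

    Illustrative stand-in for the attention pooling commonly used in
    VQA models: features has shape (num_regions, dim), logits has
    shape (num_regions,).
    """
    weights = np.exp(logits - logits.max())  # stable softmax
    weights /= weights.sum()
    return weights @ features  # (dim,) pooled representation

# Two scenes built from the same object feature vector.
obj = np.array([1.0, 0.5])             # one detected object's features
one_object = np.stack([obj])           # scene with 1 object
three_objects = np.stack([obj] * 3)    # scene with 3 identical objects

pooled_one = soft_attention_pool(one_object, np.zeros(1))
pooled_three = soft_attention_pool(three_objects, np.zeros(3))
print(np.allclose(pooled_one, pooled_three))  # -> True: count information is lost
```

Counting-specific modules surveyed in such work typically sidestep this by reasoning over the set of detected regions (e.g. summing rather than averaging evidence) instead of a normalized pooled vector.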