A Novel Hybrid Attention-Based Dilated Network for Depression Classification Model from Multimodal Data Using Improved Heuristic Approach
Pub Date: 2024-07-10 | DOI: 10.1142/s0219467826500105
B. Manjulatha, Suresh Pabboju
Automatic depression classification from multimodal input data is a challenging task. Modern methods rely on paralinguistic information such as audio and video signals, while exploiting linguistic information such as speech and text remains difficult for deep learning models. A dependable depression classification system therefore requires strong audio and video features, text-based analysis of depression-related content, and, to further improve performance, the combination of audio, visual, and text descriptors. Accordingly, a deep learning-based depression classification model is developed to detect depressed subjects from multimodal data. EEG signals, speech signals, video, and text are gathered from standard databases, and feature extraction proceeds in four stages. In the first stage, the EEG signals are decomposed by empirical mode decomposition (EMD) and linear and nonlinear features are extracted from the decomposed components. In the second stage, Mel-frequency cepstral coefficient (MFCC) spectral features are extracted from the speech signals. In the third stage, facial texture features are extracted from the input video. In the fourth stage, the input text data are pre-processed and textual features are extracted from the pre-processed data using the Transformer Net. All four feature sets are optimally selected and combined with optimal weights into weighted fused features using the enhanced mountaineering team-based optimization algorithm (EMTOA). The weighted fused features are finally passed to the hybrid attention-based dilated network (HADN), which combines a temporal convolutional network (TCN) with bidirectional long short-term memory (Bi-LSTM). The parameters of the HADN are also optimized with the developed EMTOA. The HADN then produces the final depression classification, and its efficiency is validated by comparison with various traditional classification models.
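The HADN pairs dilated temporal convolutions (TCN) with a Bi-LSTM over the fused feature sequence. The PyTorch sketch below illustrates that pairing only in outline; the layer widths, the single attention-pooling step, and the `FusedFeatureClassifier` name are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class FusedFeatureClassifier(nn.Module):
    """Hypothetical TCN + Bi-LSTM classifier over weighted fused multimodal features."""
    def __init__(self, in_dim=128, hidden=64, n_classes=2):
        super().__init__()
        # Dilated 1D convolutions approximate a small TCN stack.
        self.tcn = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # simple additive attention pooling
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                           # x: (batch, time, in_dim)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)   # (batch, time, hidden)
        h, _ = self.bilstm(h)                        # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)       # attention weights over time steps
        pooled = (w * h).sum(dim=1)                  # weighted temporal pooling
        return self.head(pooled)                     # depression class logits

# Example: a batch of 4 fused EEG/speech/video/text sequences, 50 frames, 128 dims each.
logits = FusedFeatureClassifier()(torch.randn(4, 50, 128))
```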
{"title":"A Novel Hybrid Attention-Based Dilated Network for Depression Classification Model from Multimodal Data Using Improved Heuristic Approach","authors":"B. Manjulatha, Suresh Pabboju","doi":"10.1142/s0219467826500105","DOIUrl":"https://doi.org/10.1142/s0219467826500105","url":null,"abstract":"Automatic depression classification from multimodal input data is a challenging task. Modern methods use paralinguistic information such as audio and video signals. Using linguistic information such as speech signals and text data for depression classification is a complicated task in deep learning models. Best audio and video features are built to produce a dependable depression classification system. Textual signals related to depression classification are analyzed using text-based content data. Moreover, to increase the achievements of the depression classification system, audio, visual, and text descriptors are used. So, a deep learning-based depression classification model is developed to detect the person with depression from multimodal data. The EEG signal, Speech signal, video, and text are gathered from standard databases. Four stages of feature extraction take place. In the first stage, the features from the decomposed EEG signals are attained by the empirical mode decomposition (EMD) method, and features are extracted by means of linear and nonlinear feature extraction. In the second stage, the spectral features of the speech signals from the Mel-frequency cepstral coefficients (MFCC) are extracted. In the third stage, the facial texture features from the input video are extracted. In the fourth stage of feature extraction, the input text data are pre-processed, and from the pre-processed data, the textual features are extracted by using the Transformer Net. All four sets of features are optimally selected and combined with the optimal weights to get the weighted fused features using the enhanced mountaineering team-based optimization algorithm (EMTOA). The optimal weighted fused features are finally given to the hybrid attention-based dilated network (HADN). The HDAN is developed by combining temporal convolutional network (TCN) with bidirectional long short-term memory (Bi-LSTM). The parameters in the HDAN are optimized with the assistance of the developed EMTOA algorithm. At last, the classified output of depression is obtained from the HDAN. The efficiency of the developed deep learning HDAN is validated by comparing it with various traditional classification models.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141662764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modified Whale Algorithm and Morley PSO-ML-Based Hyperparameter Optimization for Intrusion Detection
Pub Date: 2024-07-10 | DOI: 10.1142/s0219467826500099
H. H. Razzaq, Laith F. M. H. Al-Rammahi, Ahmed Mounaf Mahdi
Intrusion detection protects a network from probable intrusions by inspecting network traffic to ensure its integrity, availability, and confidentiality. Although an IDS aims to eliminate malicious traffic, intruders keep adopting new approaches to mount attacks, so effective intrusion detection is vital. Concurrently, with the evolution of machine learning (ML), attacks can be identified by evaluating traffic patterns and learning from them. Conventional works have attempted intrusion detection along these lines but suffer from a high false alarm rate (FAR) and low accuracy due to inefficient feature selection. To resolve these pitfalls, this research proposes a modified whale algorithm (MWA) based on nonlinear information gain to select significant and relevant features. The algorithm ensures a broad initialization to improve local search ability, since the agents' positions are usually near the optimal solution, and it performs an adaptive search for an optimal combination of features. Following this, the research proposes Morlet particle swarm optimization hyperparameter optimization (MPSO-HO) to improve the convergence rate of the algorithm, allowing it to escape local optima by improving its search capability. Standard metrics are used to assess the proposed system and confirm its performance. The outcomes demonstrate the effectiveness of the proposed system in intrusion detection.
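The MWA scores candidate feature subsets with nonlinear information gain before the adaptive search refines them. A rough Python sketch of the information-gain side is given below, using scikit-learn's mutual-information estimator as a stand-in for the paper's scoring function and a simple greedy bit-flip loop in place of the whale search; the data, the mask encoding, and the loop are all illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def score_subset(X, y, mask):
    """Score a binary feature mask by summed mutual information (nonlinear information-gain proxy)."""
    if mask.sum() == 0:
        return -np.inf
    return mutual_info_classif(X[:, mask.astype(bool)], y, random_state=0).sum()

# Toy traffic-feature matrix: 200 flows, 10 features, binary attack label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(int)     # label depends on features 3 and 7

# Greedy stand-in for the whale search: flip one feature bit at a time, keep improvements.
mask = rng.integers(0, 2, size=10)
best = score_subset(X, y, mask)
for j in range(10):
    trial = mask.copy()
    trial[j] ^= 1
    s = score_subset(X, y, trial)
    if s > best:
        mask, best = trial, s
print("selected feature indices:", np.where(mask == 1)[0])
```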
{"title":"Modified Whale Algorithm and Morley PSO-ML-Based Hyperparameter Optimization for Intrusion Detection","authors":"H. H. Razzaq, Laith F. M. H. Al-Rammahi, Ahmed Mounaf Mahdi","doi":"10.1142/s0219467826500099","DOIUrl":"https://doi.org/10.1142/s0219467826500099","url":null,"abstract":"Intrusion detection averts a network from probable intrusions by inspecting network traffic to ensure its integrity, availability, and confidentiality. Though IDS seems to eliminate malicious traffic, intruders have endeavored to use different approaches for undertaking attacks. Hence, effective intrusion detection is vital to detect attacks. Concurrently, the evolvement of machine learning (ML), attacks could be identified by evaluating the patterns and learning from them. Considering this, conventional works have attempted to perform intrusion detection. Nevertheless, they lacked about high false alarm rate (FAR) and low accuracy rate due to inefficient feature selection. To resolve these existing pitfalls, this research proposed a modified whale algorithm (MWA) based on nonlinear information gain to select significant and relevant features. This algorithm assures huge initialization to improve local search ability as the agent’s positions are usually near the optimal solution. It is also utilized for an adaptive search for an optimal combination of features. Following this, the research proposes Morlet particle swarm optimization hyperparameter optimization (MPSO-HO) to improve the convergence rate of the algorithm by consenting it to produce from the local optimization by improving its capability. Standard metrics assess the proposed system to confirm the optimal performance of the proposed system. Outcomes explore the effective ability of the proposed system in intrusion detection.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141661134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Extensive Review on Lung Cancer Detection Models
Pub Date: 2024-07-09 | DOI: 10.1142/s0219467825500317
Rajesh Singh
Recent advances in deep learning (DL) have made the categorization and identification of lung disorders in medical images easier, and various studies using DL to identify lung illnesses have been developed as a result. This study analyzes the publications that have contributed to lung cancer recognition. The literature review examines the many methods for detecting lung cancer, analyzes the segmentation models that have been used, and reviews different research papers. It examines several feature extraction methods, such as those using texture-based and other features. The investigation then concentrates on several cancer detection strategies, including DL models and machine learning (ML) models, and the reported performance metrics are examined and analyzed. Finally, research gaps are presented to encourage further investigation of lung cancer detection models.
{"title":"An Extensive Review on Lung Cancer Detection Models","authors":"Rajesh Singh","doi":"10.1142/s0219467825500317","DOIUrl":"https://doi.org/10.1142/s0219467825500317","url":null,"abstract":"The categorization and identification of lung disorders in medical imageries are made easier by recent advances in deep learning (DL). As a result, various studies using DL to identify lung illnesses were developed. This study aims to analyze different publications that have been contributed to in order to recognize lung cancer. This literature review examines the many methods for detecting lung cancer. It analyzes several segmentation models that have been used and reviews different research papers. It examines several feature extraction methods, such as those using texture-based and other features. The investigation then concentrates on several cancer detection strategies, including “DL models” and machine learning (ML) models. It is possible to examine and analyze the performance metrics. Finally, research gaps are presented to encourage additional investigation of lung detection models.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141664379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CMVT: ConVit Transformer Network Recombined with Convolutional Layer
Pub Date: 2024-05-06 | DOI: 10.1142/s0219467824500608
Chunxia Mao, Jun Li, Tao Hu, Xu Zhao
Vision transformers are deep neural networks for image classification based on a self-attention mechanism and can process data in parallel. To address the loss of structural information in vision transformers, this paper combines ConViT with a convolutional neural network (CNN) and proposes a new model, Convolution Meet Vision Transformers (CMVT). The model adds a convolution module to the ConViT network to compensate for the transformer's loss of structural information, and the added hierarchical data representation improves its ability to gradually extract richer image classification features. Comparative experiments on multiple datasets show gains on all of them, improving the efficiency and performance of the model.
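CMVT's core idea, a convolutional module feeding a ViT-style self-attention encoder, can be sketched in PyTorch as below. The stem configuration, encoder depth, and mean-pooling head are assumptions for illustration, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class ConvTransformerSketch(nn.Module):
    """Hypothetical conv stem + transformer encoder, in the spirit of conv-augmented ViTs."""
    def __init__(self, n_classes=10, dim=64):
        super().__init__()
        # Convolutional stem supplies local structure before self-attention.
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                      # x: (batch, 3, 32, 32)
        f = self.stem(x)                       # (batch, dim, 8, 8) local feature map
        tokens = f.flatten(2).transpose(1, 2)  # (batch, 64 tokens, dim)
        enc = self.encoder(tokens)             # self-attention over the conv tokens
        return self.head(enc.mean(dim=1))      # mean-pool tokens, then classify

logits = ConvTransformerSketch()(torch.randn(2, 3, 32, 32))
```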
{"title":"CMVT: ConVit Transformer Network Recombined with Convolutional Layer","authors":"Chunxia Mao, Jun Li, Tao Hu, Xu Zhao","doi":"10.1142/s0219467824500608","DOIUrl":"https://doi.org/10.1142/s0219467824500608","url":null,"abstract":"Vision transformers are deep neural networks applied to image classification based on a self-attention mechanism and can process data in parallel. Aiming at the structural loss of Vision transformers, this paper combines ConViT and Convolutional Neural Network (CNN) and proposes a new model Convolution Meet Vision Transformers (CMVT). This model adds a convolution module to the ConViT network to solve the structural loss of the transformer. By adding hierarchical data representation, the ability to gradually extract more image classification features is improved. We have conducted comparative experiments on multiple dataset, and all of them have been enhanced to improve the efficiency and performance of the model.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141006138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two-Phase Speckle Noise Removal in US Images: Speckle Reducing Improved Anisotropic Diffusion and Optimal Bayes Threshold
Pub Date: 2024-04-24 | DOI: 10.1142/s0219467825500718
S. L. Shabana Sulthana, M. Sucharitha
Medical images are contaminated by multiplicative speckle noise, which dramatically degrades ultrasound images and has a detrimental impact on a variety of image interpretation tasks. To overcome this issue, this paper presents a Two-Phase Speckle Reduction approach with Improved Anisotropic Diffusion and Optimal Bayes Threshold, termed TPSR-IADOT, which includes an image enhancement phase and a two-level decomposition phase. Initially, the noisy image undergoes enhancement, where a Speckle Reducing Improved Anisotropic Diffusion (SRAID) filtering process removes speckle. Afterwards, a two-level decomposition using the Discrete Wavelet Transform (DWT) removes the residual noise; since speckle noise is mostly present in the high-frequency band, an Improved Bayes Threshold is applied to the high-frequency subbands. Finally, to provide the best outcomes, an optimization algorithm termed the Self-Improved Pelican Optimization Algorithm (SI-POA) is used to choose the optimal threshold value. The efficiency of the proposed method has been validated on an ultrasound image database using Simulink in terms of PSNR, SSIM, SDME and MAPE. The analysis shows that the proposed TPSR-IADOT attains a PSNR of 40.074, whereas POA attains 38.572, COOT 38.572, BES 37.003, PRO 30.419, WOA 33.218, RFU-LA 29.935 and SSI-COA 39.256, for a noise variance of 0.1.
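The second phase above (DWT decomposition plus a Bayes-style threshold on the high-frequency subbands) can be approximated with PyWavelets as sketched below. The standard BayesShrink threshold sigma_n^2 / sigma_x is used as a stand-in for the paper's Improved Bayes Threshold, and the 'db2' wavelet, two decomposition levels, and toy phantom are assumptions.

```python
import numpy as np
import pywt

def bayes_soft_denoise(img, wavelet="db2", level=2):
    """Soft-threshold high-frequency DWT subbands with a BayesShrink-style threshold."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    # Noise std estimated from the finest diagonal subband (robust median estimator).
    sigma_n = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    new_coeffs = [coeffs[0]]                        # keep the low-frequency approximation
    for (cH, cV, cD) in coeffs[1:]:
        out = []
        for sub in (cH, cV, cD):
            sigma_x = np.sqrt(max(sub.var() - sigma_n**2, 1e-12))
            thr = sigma_n**2 / sigma_x              # BayesShrink threshold for this subband
            out.append(pywt.threshold(sub, thr, mode="soft"))
        new_coeffs.append(tuple(out))
    return pywt.waverec2(new_coeffs, wavelet)

# Toy example: a smooth phantom corrupted by additive noise standing in for residual speckle.
rng = np.random.default_rng(1)
clean = np.outer(np.hanning(128), np.hanning(128))
noisy = clean + 0.05 * rng.standard_normal((128, 128))
denoised = bayes_soft_denoise(noisy)
```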
{"title":"Two-Phase Speckle Noise Removal in US Images: Speckle Reducing Improved Anisotropic Diffusion and Optimal Bayes Threshold","authors":"S. L. Shabana Sulthana, M. Sucharitha","doi":"10.1142/s0219467825500718","DOIUrl":"https://doi.org/10.1142/s0219467825500718","url":null,"abstract":"Medial images are contaminated by multiplicative speckle noise, which dramatically reduces ultrasound images and has a detrimental impact on a variety of image interpretation tasks. Hence, to overcome this issue, this paper presented a Two-Phase Speckle Reduction approach with Improved Anisotropic Diffusion and Optimal Bayes Threshold termed TPSR-IADOT, which includes the phases like image enhancement and two-level decomposition processes. Initially, the speckle noise is subjected to an image enhancement process where the Speckle Reducing Improved Anisotropic Diffusion (SRAID) filtering process is carried out for the speckle removal process. Afterwards, two-level decomposition takes place which utilizes Discrete Wavelet Transform (DWT) to remove the residual noise. As the speckle noise is mostly present in the high-frequency band, Improved Bayes Threshold will be applied to the high- frequency subbands. Finally, to provide the best outcomes, an optimization algorithm termed Self Improved Pelican Optimization Algorithm (SI-POA) in this work via choosing the optimal threshold value. The efficiency of the proposed method has been validated on an ultrasound image database using Simulink in terms of PSNR, SSIM, SDME and MAPE. Accordingly, from the analysis, it is proved that the proposed TPSR-IADOT attains the PSNR of 40.074, whereas the POA is 38.572, COOT is 38.572, BES is 37.003, PRO is 30.419, WOA is 33.218, RFU-LA is 29.935 and SSI-COA is 39.256, for noise variance[Formula: see text]0.1.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140664241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Double Attention Res-U-Net-based Deep Neural Network Model for Automatic Detection of Tuberculosis in Human Lungs
Pub Date: 2024-04-18 | DOI: 10.1142/s0219467825500731
M. Balamurugan, R. Balamurugan
Tuberculosis (TB) remains a leading cause of death and a significant threat to humanity in the contemporary world. Early detection of TB is crucial for precise identification and treatment, and chest X-rays (CXR) serve as a valuable tool in this regard. Computer-Aided Diagnosis (CAD) systems play a vital role in easing the classification of active and latent TB. This paper uses an approach called the Double Attention Res-U-Net-based Deep Neural Network (DARUNDNN) to enhance TB detection in the lungs. The detection process involves pre-processing, noise removal, image-level balancing, application of the DARUNDNN model, and the Whale Optimization Algorithm (WOA) for improved accuracy. Experimental validation on the Montgomery County (MC), Shenzhen China (SC), and NIH CXR datasets compares the results with U-Net, AlexNet, GoogleNet, and convolutional neural network (CNN) models. The findings, particularly on the SC dataset, demonstrate the efficiency of the proposed DARUNDNN model with an accuracy of 98.6%, specificity of 96.24%, and sensitivity of 97.66%, outperforming benchmarked deep learning models. Validation on the MC dataset likewise yields an accuracy of 98%, specificity of 97.56%, and sensitivity of 98.52%.
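The pre-processing, noise removal, and image-level balancing steps mentioned above are sketched below with common OpenCV operations (median filtering and CLAHE); whether the paper uses exactly these operations is not stated, so treat the function and the file name as assumed stand-ins.

```python
import cv2
import numpy as np

def preprocess_cxr(path, size=(256, 256)):
    """Assumed CXR pre-processing: resize, denoise, and contrast-balance a grayscale image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, size)
    img = cv2.medianBlur(img, 3)                              # light noise removal
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)                                    # local contrast (image-level) balancing
    return img.astype(np.float32) / 255.0                     # scale to [0, 1] for the network

# x = preprocess_cxr("MC_CXR_0001.png")   # hypothetical file name from the MC dataset
```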
{"title":"Double attention Res-U-Net-based Deep Neural Network Model for Automatic Detection of Tuberculosis in Human Lungs","authors":"M. Balamurugan, R. Balamurugan","doi":"10.1142/s0219467825500731","DOIUrl":"https://doi.org/10.1142/s0219467825500731","url":null,"abstract":"Tuberculosis (TB) stands as the leading cause of death and a significant threat to humanity in the contemporary world. Early detection of TB is crucial for precise identification and treatment, and Chest X-Rays (CXR) serve as a valuable tool in this regard. Computer-Aided Diagnosis (CAD) systems play a vital role in easing the classification process of active and latent TB. This paper uses an approach called the Double Attention Res-U-Net-based Deep Neural Network (DARUNDNN) to enhance TB detection in the lungs. The detection process involves pre-processing, noise removal, image level balancing, the application of the DARUNDNN model and using the Whale Optimization Algorithm (WOA) for improved accuracy. Experimental validation using Montgomery Country (MC), Shenzhen China (SC), and NIH CXR Datasets compares the results with U-Net, AlexNet, GoogleNet, and convolutional neural network (CNN) models. The findings, particularly from the SC dataset, demonstrate the efficiency of the proposed DARUNDNN model with an accuracy of 98.6%, specificity of 96.24%, and sensitivity of 97.66%, outperforming benchmarked deep learning models. Additionally, validation with the MC dataset reveals an excellent accuracy of 98%, specificity of 97.56%, and sensitivity of 98.52%.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140686229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Method for Analyzing the Operating Data of Electric Energy Meters Based on Data Mining Analysis
Chencheng Wang, Lijuan Pu, Zhihui Zhao, Zhang Jiefu
Pub Date: 2024-04-12 | DOI: 10.1142/s0219467826500014
Aiming at the problem of error estimation for smart meters in the distribution network, a smart meter error estimation method based on a particle swarm optimization convolutional neural network is proposed. The method establishes a smart energy meter error estimation model through data collection, data prediction, and preprocessing. To address the convergence issue in training, the interlayer distribution of weights is adjusted to improve training quality. The method fully utilizes template calibration information to transform indicator detection under complex conditions into simple and effective isometric segmentation, turning label recognition from a complex text detection and recognition task into a simple and efficient binary detection task with better robustness. The effectiveness and high robustness of the proposed method have been demonstrated through experimental verification.
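The particle-swarm side of the PSO-CNN pairing can be sketched with a plain NumPy loop, shown below. The two searched hyperparameters (learning rate and channel width), their bounds, and the surrogate objective standing in for CNN validation error are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def validation_error(params):
    """Placeholder for training the error-estimation CNN and returning its validation error."""
    lr, width = params
    return (np.log10(lr) + 2.5) ** 2 + (width - 48) ** 2 / 1000.0  # toy surrogate objective

# Particle swarm over (learning rate, channel width); bounds and coefficients are assumptions.
low, high = np.array([1e-4, 8]), np.array([1e-1, 128])
pos = rng.uniform(low, high, size=(20, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([validation_error(p) for p in pos])
gbest = pbest[pbest_val.argmin()]

for _ in range(50):
    r1, r2 = rng.random((20, 1)), rng.random((20, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, low, high)               # keep particles inside the bounds
    vals = np.array([validation_error(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()]

print("best (learning rate, channel width):", gbest)
```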
{"title":"A Method for Analyzing the Operating Data of Electric Energy Meters Based on Data Mining Analysis","authors":"Chencheng Wang, Lijuan Pu, Zhihui Zhao, Zhang Jiefu","doi":"10.1142/s0219467826500014","DOIUrl":"https://doi.org/10.1142/s0219467826500014","url":null,"abstract":"Aiming at the problem of error estimation of smart meters in distribution network, a method of error estimation of smart meters based on particle swarm optimization convolutional neural network is proposed. This method establishes an intelligent energy meter error estimation model through data collection, data prediction, and preprocessing. To address the convergence issue in training, the interlayer distribution of weights is adjusted to improve training quality. This method fully utilizes template calibration information to transform indicator detection under complex conditions into simple and effective isometric segmentation, transforming label recognition from complex text detection and recognition tasks to simple and efficient binary detection tasks, with better robustness. The effectiveness and high robustness of the proposed method have been demonstrated through experimental verification.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140709856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PECT Composite Defect Detection Algorithm Based on DualGAN
Pub Date: 2024-04-09 | DOI: 10.1142/s0219467825500706
Ming Gao, Zhiyan Zhou, Jinjie Huang, Kewei Ding
To address the problems of insufficient accuracy and slow reconstruction speed in Planar Electrical Capacitance Tomography (PECT) detection of damaged specimens, a Dual Generative Adversarial Networks (DualGAN)-based PECT image defect detection method is proposed in this paper. An improved particle swarm algorithm with adaptive particle number and L2-norm normalization is used to optimize the sensitivity field, combined with the parallel Landweber algorithm to solve the PECT inverse problem and obtain the dielectric constant distribution map. In the DualGAN network, the U-Net generator utilizes an Adam-based local attention mechanism to adjust module weights, facilitating feature extraction and the generation of high-quality transformation images of the Landweber dielectric constant distribution. A PatchGAN discriminator is employed to distinguish between transformation images and real images, using the generated transformation images as target images. Experimental results demonstrate that the sensitivity field, enhanced by the improved particle swarm algorithm and L2-norm normalization, achieves better balance. Furthermore, adding the Adam-based local attention weight mechanism to the DualGAN network reduces artifacts in the reconstructed images, resulting in more accurate PECT reconstructions. The PECT image defect detection method, integrating DualGAN, an improved particle swarm optimization algorithm, and a local attention mechanism, makes significant strides in addressing challenges related to image reconstruction accuracy and speed. This advancement enhances the precision and efficiency of defect detection in carbon fiber composite materials, thereby fostering the broader utilization of planar capacitance tomography technology in industrial damage detection and material defect analysis.
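The inverse-problem step described above is the classical Landweber iteration, g_{k+1} = g_k + alpha * S^T (c - S g_k), applied with the sensitivity matrix S and the measured capacitance vector c. The sketch below uses random stand-in data; the step-size rule and the clamping of the permittivity map to [0, 1] are common choices rather than the paper's exact settings.

```python
import numpy as np

def landweber(S, c, iters=200):
    """Landweber iteration for the linearized ECT inverse problem c = S @ g."""
    # A step size below 2 / ||S||^2 (spectral norm) guarantees convergence of the iteration.
    alpha = 1.0 / np.linalg.norm(S, 2) ** 2
    g = np.zeros(S.shape[1])
    for _ in range(iters):
        g = g + alpha * S.T @ (c - S @ g)
        g = np.clip(g, 0.0, 1.0)          # keep the normalized permittivity map in [0, 1]
    return g

# Stand-in sensitivity matrix (66 electrode pairs x 1024 pixels) and simulated measurements.
rng = np.random.default_rng(2)
S = rng.random((66, 1024))
g_true = (rng.random(1024) > 0.9).astype(float)   # sparse "defect" map
g_hat = landweber(S, S @ g_true)
```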
{"title":"PECT Composite Defect Detection Algorithm Based on DualGAN","authors":"Ming Gao, Zhiyan Zhou, Jinjie Huang, Kewei Ding","doi":"10.1142/s0219467825500706","DOIUrl":"https://doi.org/10.1142/s0219467825500706","url":null,"abstract":"To address the problems of insufficient accuracy and slow reconstruction speed of Planar Electrical Capacitance Tomography (PECT) detection of damaged specimens, a Dual Generative Adversarial Networks (DualGAN)-based PECT image defect detection method is proposed in this paper. The improved particle swarm algorithm with adaptive particle number and L2-norm is used to optimize the sensitivity field, combined with the parallel Landweber algorithm to solve the PECT inverse problem to obtain the dielectric constant distribution map. In the DualGAN network, the Unet generator utilizes an Adam-based local attention mechanism to adjust module weights, facilitating feature extraction and the generation of high-quality transformation images of the Landweber dielectric constant distribution. A PatchGAN discriminator is employed to distinguish between transformation images and real images, using the generated transformation images as target images. Experimental results demonstrate that the sensitivity field, enhanced by the improved particle swarm algorithm and L2-norm normalization, achieves better balance. Furthermore, the addition of a network transformation using the Adam-based local attention weight mechanism on the DualGAN network reduces artifacts in the reconstructed images, resulting in more accurate PECT reconstructions. The PECT image defect detection method, integrating DualGAN, an improved particle swarm optimization algorithm, and a local attention mechanism, has made significant strides in addressing challenges related to image reconstruction accuracy and speed. This technological advancement has enhanced the precision and efficiency of defect detection in carbon fiber composite materials, thereby fostering the broader utilization of planar capacitance tomography technology in industrial damage detection and material defect analysis.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140726200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taylor Shepherd Golden Optimization-Enabled ResUNet for Forest Change Detection Using Satellite Images
Pub Date: 2024-04-09 | DOI: 10.1142/s0219467825500688
K. R. Gite, Praveen Gupta
Change detection (CD) is a pivotal task in remote sensing image (RSI) processing that aims to accurately detect changes in land cover from multi-temporal images. With the advent of deep learning, the technology has delivered remarkable results in recent years in detecting variations in forest land cover. However, some conventional CD techniques are weak, highly susceptible to errors, and can even produce inaccurate outcomes, making them undesirable for real-time CD applications. To bridge this gap, this research introduces a forest CD method that uses the proposed Taylor Shepherd Golden Optimization_ResUNet (TSGO_ResUNet) and a Fuzzy Neural Network (Fuzzy NN) for segment mapping. Here, the segmentation is performed with ResUNet to determine the exact boundary or shape of each object for every pixel in the image. TSGO is obtained by consolidating Taylor Shuffled Shepherd Optimization (TSSO) with Golden Search Optimization (GSO). The devised TSGO_ResUNet + Fuzzy NN attains a maximum accuracy and kappa coefficient of 0.952 and 0.785, and a minimum error rate of 0.051.
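The accuracy, kappa coefficient, and error rate quoted above are standard agreement metrics between a predicted and a reference change mask. The short sketch below shows how such metrics are typically computed with scikit-learn on synthetic masks (not the paper's data).

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(3)
reference = rng.integers(0, 2, size=(256, 256))        # 1 = forest change, 0 = no change
predicted = reference.copy()
flip = rng.random(reference.shape) < 0.05              # corrupt 5% of pixels to mimic errors
predicted[flip] = 1 - predicted[flip]

acc = accuracy_score(reference.ravel(), predicted.ravel())
kappa = cohen_kappa_score(reference.ravel(), predicted.ravel())
print(f"accuracy={acc:.3f}  kappa={kappa:.3f}  error_rate={1 - acc:.3f}")
```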
{"title":"Taylor Shepherd Golden Optimization-Enabled ResUNet for Forest Change Detection Using Satellite Images","authors":"K. R. Gite, Praveen Gupta","doi":"10.1142/s0219467825500688","DOIUrl":"https://doi.org/10.1142/s0219467825500688","url":null,"abstract":"The pivotal task of remote sensing image (RSI) processing change detection (CD) highly aims to accurately detect changes in land cover based on multi-temporal images. With the advent of deep learning, technology has delivered remarkable results in the last years in the detection of variations in forest land cover data. Some of the conventional CD techniques are weak and are highly susceptible to errors and can result even in inaccurate outcomes. Thus, certain techniques are not desirable for real-time CD applications. To abridge this gap, this research introduces an innovative work for forest CD utilizing the proposed Taylor Shepherd Golden Optimization_ResUNet (TSGO_ResUNet) and Fuzzy Neural network (Fuzzy NN) for segment mapping. Here, the segmentation process is accomplished using ResUNet to determine the exact boundary or shape of each object for every pixel in the image. Furthermore, TSGO is achieved by consolidating Taylor Shuffled Shepherd Optimization (TSSO) with Golden Search Optimization (GSO). In addition, the devised TSGO_ResUNet + Fuzzy NN has gained maximum accuracy and kappa coefficient of 0.952 and 0.785, and minimum error rate of 0.051.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140722280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stacked U-Net with Time–Frequency Attention and Deep Connection Net for Single Channel Speech Enhancement
Pub Date: 2024-04-09 | DOI: 10.1142/s0219467825500676
Veeraswamy Parisae, S. Nagakishore Bhavanam
Deep neural networks have significantly promoted the progress of speech enhancement technology. However, many speech enhancement approaches are unable to fully utilize context information from various scales, hindering further performance gains. To tackle this issue, we introduce a method called TFADCSU-Net (Stacked U-Net with Time-Frequency Attention (TFA) and Deep Connection Layer (DCL)) for enhancing noisy speech in the time-frequency domain. TFADCSU-Net adopts an encoder-decoder structure with skip connections. Within TFADCSU-Net, a multiscale feature extraction layer (MSFEL) is proposed to effectively capture contextual data from various scales, allowing us to leverage both global and local speech features to enhance the reconstruction of speech signals. Moreover, we incorporate the deep connection layer and TFA mechanisms into the network to further improve feature extraction and aggregate utterance-level context. The deep connection layer captures rich and precise features by establishing direct connections from the initial layer to all subsequent layers, rather than relying only on connections between consecutive layers. This approach not only enhances the information flow within the network but also avoids a significant rise in computational complexity as the number of network layers increases. The TFA module consists of two attention branches operating concurrently: one directed towards the temporal dimension and the other towards the frequency dimension. These branches generate distinct forms of attention, one for identifying relevant time frames and another for selecting frequency-wise channels, and they assist the model in discerning "where" and "what" to prioritize. Subsequently, the temporal attention (TA) and frequency attention (FA) branches are combined to produce a comprehensive two-dimensional attention map. This map assigns specific attention weights to individual spectral components in the time-frequency representation, enabling the network to proficiently capture speech characteristics in the T-F representation. The results confirm that the proposed method outperforms other models in terms of both objective speech quality and intelligibility.
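The two-branch TFA module can be sketched roughly as follows: average over frequency to score time frames, average over time to score frequency bands, and multiply the two gated scores into a 2D mask over the T-F representation. The kernel sizes and the sigmoid gating are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class TimeFrequencyAttention(nn.Module):
    """Rough two-branch time-frequency attention over a (batch, channels, freq, time) spectrogram."""
    def __init__(self, channels=32):
        super().__init__()
        self.time_branch = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.freq_branch = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):                                           # x: (B, C, F, T)
        t_score = torch.sigmoid(self.time_branch(x.mean(dim=2)))    # (B, C, T): which frames matter
        f_score = torch.sigmoid(self.freq_branch(x.mean(dim=3)))    # (B, C, F): which bands matter
        attn = f_score.unsqueeze(-1) * t_score.unsqueeze(2)         # (B, C, F, T) combined 2D map
        return x * attn                                             # re-weight the T-F features

out = TimeFrequencyAttention()(torch.randn(2, 32, 129, 100))
```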
{"title":"Stacked U-Net with Time–Frequency Attention and Deep Connection Net for Single Channel Speech Enhancement","authors":"Veeraswamy Parisae, S. Nagakishore Bhavanam","doi":"10.1142/s0219467825500676","DOIUrl":"https://doi.org/10.1142/s0219467825500676","url":null,"abstract":"Deep neural networks have significantly promoted the progress of speech enhancement technology. However, a great number of speech enhancement approaches are unable to fully utilize context information from various scales, hindering performance enhancement. To tackle this issue, we introduce a method called TFADCSU-Net (Stacked U-Net with Time-Frequency Attention (TFA) and Deep Connection Layer (DCL)) for enhancing noisy speech in the time–frequency domain. TFADCSU-Net adopts an encoder-decoder structure with skip links. Within TFADCSU-Net, a multiscale feature extraction layer (MSFEL) is proposed to effectively capture contextual data from various scales. This allows us to leverage both global and local speech features to enhance the reconstruction of speech signals. Moreover, we incorporate deep connection layer and TFA mechanisms into the network to further improve feature extraction and aggregate utterance level context. The deep connection layer effectively captures rich and precise features by establishing direct connections starting from the initial layer to all subsequent layers, rather than relying on connections from earlier layers to subsequent layers. This approach not only enhances the information flow within the network but also avoids a significant rise in computational complexity as the number of network layers increases. The TFA module consists of two attention branches operating concurrently: one directed towards the temporal dimension and the other towards the frequency dimension. These branches generate distinct forms of attention — one for identifying relevant time frames and another for selecting frequency wise channels. These attention mechanisms assist the models in discerning “where” and “what” to prioritize. Subsequently, the TA and FA branches are combined to produce a comprehensive attention map in two dimensions. This map assigns specific attention weights to individual spectral components in the time–frequency representation, enabling the networks to proficiently capture the speech characteristics in the T-F representation. The results confirm that the proposed method outperforms other models in terms of objective speech quality as well as intelligibility.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140726805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}