In the exploration of cross-media retrieval encompassing images and text, an advanced method incorporating two-level similarity and collaborative representation (TLSCR) is presented. Initially, two sub-networks were designed to handle both global and local features, facilitating enhanced semantic associations between images and textual content. Whole images, along with regional image sectors, served as representations for images, while textual content was depicted both through complete sentences and select keywords. An innovative two-level alignment approach was introduced to segregate and then amalgamate the global and local depictions of paired images and texts. Subsequently, employing collaborative representation (CR) technology, each experimental image was collaboratively reconstructed by utilising the entirety of the training images, and every experimental text by incorporating all the training texts. The collaborative coefficients derived were subsequently employed as congruent dimensional representations for both images and texts. Upon completion of these operations, cross-media retrieval between the two modalities was conducted. Experimental outcomes on datasets like Wikipedia and Pascal Sentence confirm the superior precision of the proposed method, surpassing conventional cross-media retrieval methodologies.
{"title":"Cross-Media Retrieval Based on Two-Level Similarity and Collaborative Representation","authors":"Jiahua Zhang","doi":"10.18280/ts.400533","DOIUrl":"https://doi.org/10.18280/ts.400533","url":null,"abstract":"In the exploration of cross-media retrieval encompassing images and text, an advanced method incorporating two-level similarity and collaborative representation (TLSCR) is presented. Initially, two sub-networks were designed to handle both global and local features, facilitating enhanced semantic associations between images and textual content. Whole images, along with regional image sectors, served as representations for images, while textual content was depicted both through complete sentences and select keywords. An innovative two-level alignment approach was introduced to segregate and then amalgamate the global and local depictions of paired images and texts. Subsequently, employing collaborative representation (CR) technology, each experimental image was collaboratively reconstructed by utilising the entirety of the training images, and every experimental text by incorporating all the training texts. The collaborative coefficients derived were subsequently employed as congruent dimensional representations for both images and texts. Upon completion of these operations, cross-media retrieval between the two modalities was conducted. Experimental outcomes on datasets like Wikipedia and Pascal Sentence confirm the superior precision of the proposed method, surpassing conventional cross-media retrieval methodologies.","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":"36 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136104120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
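The collaborative representation step in the TLSCR abstract can be illustrated with a minimal sketch. It assumes the common ridge-regularised formulation, in which a query vector is reconstructed from all training vectors and the coefficient vector is alpha = (X^T X + lambda I)^-1 X^T y. The abstract does not state the exact objective, so the regulariser, the parameter `lam`, and the function names `solve_linear` and `collaborative_coefficients` are assumptions made for illustration only.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def collaborative_coefficients(train, query, lam=0.01):
    """Return one coefficient per training vector (assumed ridge formulation).

    train: list of n training feature vectors, each of length d
    query: feature vector of length d to be reconstructed
    lam:   hypothetical regularisation weight, not taken from the paper
    """
    n = len(train)
    # Gram matrix X^T X plus lam on the diagonal.
    gram = [
        [sum(p * q for p, q in zip(train[i], train[j])) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    # Right-hand side X^T y.
    rhs = [sum(p * q for p, q in zip(train[i], query)) for i in range(n)]
    return solve_linear(gram, rhs)
```

Because the coefficient vector has one entry per training sample, an image coded against n training images and a text coded against the n paired training texts both end up with n-dimensional codes, which is one way to read the abstract's "congruent dimensional representations".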
{"title":"A Comprehensive Review on Machine Learning Approaches for Enhancing Human Speech Recognition","authors":"Maha Adnan Shanshool, Husam Ali Abdulmohsin","doi":"10.18280/ts.400529","DOIUrl":"https://doi.org/10.18280/ts.400529","url":null,"abstract":"ABSTRACT","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":"30 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136104298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the realm of waste management, the accurate identification of biodegradable and non-biodegradable items remains a critical challenge. An advanced real-time object detection method, termed “MobileYOLO”, was proposed, leveraging the strengths of the YOLOv4 framework. The MobileNetV2 network was integrated, and part of the standard convolution computation in the PANet and head networks was replaced with depth-wise separable convolutions. To enhance feature expressiveness during feature fusion, a refined lightweight channel attention mechanism, known as Efficient Channel Attention (ECA), was introduced. The Improved Single Stage Headless (ISSH) context module was incorporated into the micro-object identification branch to broaden the receptive field. Evaluations conducted on the KITTI dataset indicated an impressive accuracy of 95.7%. Remarkably, when compared to the standard YOLOv4, the MobileYOLO model exhibited a reduction in model parameters by 53.12M, a decrease in connectivity size by one-fifth, and an augmentation in detection speed by 85%.
{"title":"Efficient Differentiation of Biodegradable and Non-Biodegradable Municipal Waste Using a Novel MobileYOLO Algorithm","authors":"Menaka Suman, Gayathri Arulanantham","doi":"10.18280/ts.400505","DOIUrl":"https://doi.org/10.18280/ts.400505","url":null,"abstract":"In the realm of waste management, the accurate identification of biodegradable and non-biodegradable items remains a critical challenge. An advanced real-time object detection method, termed “MobileYOLO”, was proposed, leveraging the strengths of the YOLOv4 framework. The MobileNetV2 network was integrated, and part of the standard convolution computation in the PANet and head networks was replaced with depth-wise separable convolutions. To enhance feature expressiveness during feature fusion, a refined lightweight channel attention mechanism, known as Efficient Channel Attention (ECA), was introduced. The Improved Single Stage Headless (ISSH) context module was incorporated into the micro-object identification branch to broaden the receptive field. Evaluations conducted on the KITTI dataset indicated an impressive accuracy of 95.7%. Remarkably, when compared to the standard YOLOv4, the MobileYOLO model exhibited a reduction in model parameters by 53.12M, a decrease in connectivity size by one-fifth, and an augmentation in detection speed by 85%.","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":"40 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136067991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
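The parameter savings that the MobileYOLO abstract attributes to depth-wise separable convolutions can be made concrete with a small weight count. The sketch below compares a standard k x k convolution with its depth-wise separable counterpart (a per-channel k x k filter followed by a 1 x 1 pointwise convolution), ignoring biases. The layer shape used in the example (3 x 3 kernel, 256 input channels, 512 output channels) is a hypothetical choice for illustration and is not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depth-wise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only to illustrate the ratio.
std = standard_conv_params(3, 256, 512)
sep = depthwise_separable_params(3, 256, 512)
ratio = sep / std  # equals 1/c_out + 1/kernel**2
print(std, sep, round(ratio, 4))
```

For this hypothetical layer the count drops from 1,179,648 to 133,376 weights, about 11% of the original. The overall 53.12M reduction reported in the abstract depends on which layers were actually replaced, which the abstract does not list.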
This review offers an exhaustive examination of phoneme recognition, an essential sub-word acoustic unit in speech processing. Phoneme-based systems find widespread utility in diverse applications including speech recognition, speaker identification
{"title":"A Comprehensive Examination of Phoneme Recognition in Automatic Speech Recognition Systems","authors":"Shobha Bhatt, Shweta Bansal, Ankit Kumar, Saroj Kumar Pandey, Manoj Kumar Ojha, Kamred Udham Singh, Sanjay Chakraborty, Teekam Singh, Chetan Swarup","doi":"10.18280/ts.400518","DOIUrl":"https://doi.org/10.18280/ts.400518","url":null,"abstract":"This review offers an exhaustive examination of phoneme recognition, an essential sub-word acoustic unit in speech processing. Phoneme-based systems find widespread utility in diverse applications including speech recognition, speaker identification","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":"310 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136102670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tamara A. Dawood, Ashwaq T. Hashim, Ahmed R. Nasser
The timely diagnosis of brain tumors plays a critical role in enhancing patient prognosis and survival rates. Despite its superior accuracy, manual tumor segmentation is known to be a labor-intensive process. Over the years, a collection of automated tumor segmentation methodologies has been devised and investigated. However, a universally applicable resolution that consistently delivers reliable outcomes across diverse datasets continues to be elusive. Additionally, skull stripping remains a crucial prerequisite to the tumor segmentation procedure. This paper introduces an integrated 3D Attention Residual U-Net (3D_Att_Res_U-Net) model that seamlessly merges attention mechanisms and residual units within the U-Net architecture to augment the performance of brain tumor segmentation and skull stripping in Magnetic Resonance Imaging (MRI). An initial preprocessing stage is implemented, incorporating bias field correction and intensity normalization to optimize performance. The proposed model is trained using the Brain Tumor Segmentation (BraTS) 2020 dataset, along with the Neurofeedback Skull Stripping (NFBS) dataset. The proposed methodology achieved Dice Similarity Coefficients (DSC) of 0.9961 for skull stripping, and 0.9985, 0.9982, and 0.9980 for whole tumor, enhanced tumor, and tumor core segmentation, respectively. Experimental results underscore the applicability and superiority of the proposed approach compared to existing methods in this research domain.
{"title":"Advances in Brain Tumor Segmentation and Skull Stripping: A 3D Residual Attention U-Net Approach","authors":"Tamara A. Dawood, Ashwaq T. Hashim, Ahmed R. Nasser","doi":"10.18280/ts.400510","DOIUrl":"https://doi.org/10.18280/ts.400510","url":null,"abstract":"The timely diagnosis of brain tumors plays a critical role in enhancing patient prognosis and survival rates. Despite its superior accuracy, manual tumor segmentation is known to be a labor-intensive process. Over the years, a collection of automated tumor segmentation methodologies has been devised and investigated. However, a universally applicable resolution that consistently delivers reliable outcomes across diverse datasets continues to be elusive. Additionally, skull stripping remains a crucial prerequisite to the tumor segmentation procedure. This paper introduces an integrated 3D Attention Residual U-Net (3D_Att_Res_U-Net) model that seamlessly merges attention mechanisms and residual units within the U-Net architecture to augment the performance of brain tumor segmentation and skull stripping in Magnetic Resonance Imaging (MRI). An initial preprocessing stage is implemented, incorporating bias field correction and intensity normalization to optimize performance. The proposed model is trained using the Brain Tumor Segmentation (BraTS) 2020 dataset, along with the Neurofeedback Skull Stripping (NFBS) dataset. The proposed methodology achieved Dice Similarity Coefficients (DSC) of 0.9961 for skull stripping, and 0.9985, 0.9982, and 0.9980 for whole tumor, enhanced tumor, and tumor core segmentation, respectively. Experimental results underscore the applicability and superiority of the proposed approach compared to existing methods in this research domain.","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":"13 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136104393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
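The Dice Similarity Coefficient used to report the segmentation results above is defined as DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch follows, assuming the masks are flattened into sequences of 0/1 values. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are illustrative choices, not details from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    pred, target: equal-length sequences of 0/1 (or False/True) voxel labels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# Toy example: two 4-voxel masks that share one foreground voxel.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

In the toy example each mask has two foreground voxels and they overlap in one, so the score is 2 * 1 / 4 = 0.5; identical masks score 1.0.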
The rapid progression of diseases in the elderly, such as Alzheimer's Dementia (AD), necessitates effective early detection mechanisms to ensure appropriate healthcare provision. Given the consistently increasing prevalence of AD, the potential for emerging socio-economic challenges is significant. This underlines the importance of developing early detection strategies to mitigate the progression of this disease. Electroencephalograms (EEG) present a promising avenue for the early diagnosis of AD. EEG signals harbor crucial information pertaining to neuronal death triggered by amyloid plaque accumulation, a characteristic feature of AD. Spectral analysis reveals a deceleration in signal activity in AD patients when compared to healthy elderly individuals. However, this method is frequently compromised by low-frequency noise, necessitating the exploration of alternative approaches for analyzing EEG signal features for early AD detection. Considering the complex nature of EEG signals, it is hypothesized that pathological conditions, such as AD, may induce alterations in signal complexity. In this study, an early detection model for AD was simulated utilizing an approach that focused on EEG signal complexity. Complexity analysis, incorporating Spectral Entropy (SpecEn) and fractal dimensions, was calculated across 19 EEG channels from a total of 34 subjects (16 normal and 18 with Mild Cognitive Impairment (MCI)). Performance validation of the proposed method was achieved through Linear Discriminant Analysis (LDA), yielding an accuracy of 82.4%, specificity of 77.8%, and sensitivity of 87.5%. The findings from this study suggest that EEG analysis can serve as a reliable tool for the early detection of AD.
{"title":"Entropy and Fractal Analysis of EEG Signals for Early Detection of Alzheimer's Dementia","authors":"S. Hadiyoso, I. Wijayanto, A. Humairani","doi":"10.18280/ts.400435","DOIUrl":"https://doi.org/10.18280/ts.400435","url":null,"abstract":"The rapid progression of diseases in the elderly, such as Alzheimer's Dementia (AD), necessitates effective early detection mechanisms to ensure appropriate healthcare provision. Given the consistently increasing prevalence of AD, the potential for emerging socio-economic challenges is significant. This underlines the importance of developing early detection strategies to mitigate the progression of this disease. Electroencephalograms (EEG) present a promising avenue for the early diagnosis of AD. EEG signals harbor crucial information pertaining to neuronal death triggered by amyloid plaque accumulation, a characteristic feature of AD. Spectral analysis reveals a deceleration in signal activity in AD patients when compared to healthy elderly individuals. However, this method is frequently compromised by low-frequency noise, necessitating the exploration of alternative approaches for analyzing EEG signal features for early AD detection. Considering the complex nature of EEG signals, it is hypothesized that pathological conditions, such as AD, may induce alterations in signal complexity. In this study, an early detection model for AD was simulated utilizing an approach that focused on EEG signal complexity. Complexity analysis, incorporating Spectral Entropy (SpecEn) and fractal dimensions, was calculated across 19 EEG channels from a total of 34 subjects (16 normal and 18 with Mild Cognitive Impairment (MCI)). Performance validation of the proposed method was achieved through Linear Discriminant Analysis (LDA), yielding an accuracy of 82.4%, specificity of 77.8%, and sensitivity of 87.5%. The findings from this study suggest that EEG analysis can serve as a reliable tool for the early detection of AD.","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46174395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
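The accuracy, specificity, and sensitivity reported in the EEG abstract can be cross-checked against the stated group sizes with a short calculation. The abstract gives only the percentages and the 16/18 split, so the confusion-matrix counts below are inferred, not quoted: a group of 16 with 14 classified correctly and a group of 18 with 14 classified correctly reproduces all three figures. Which group the study treated as the positive class is not stated, so the assignment of counts to `tp`, `fn`, `tn`, and `fp` is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # correctly detected share of the positive group
    specificity = tn / (tn + fp)  # correctly rejected share of the negative group
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts (assumption): 14 of 16 in one group and 14 of 18 in the other.
sens, spec, acc = classification_metrics(tp=14, fn=2, tn=14, fp=4)
percentages = [round(100 * value, 1) for value in (sens, spec, acc)]
print(percentages)
```

With these inferred counts the rounded percentages come out as 87.5, 77.8, and 82.4, matching the abstract, which suggests the three figures are internally consistent for 34 subjects.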
The COVID-19 pandemic has precipitated an unprecedented surge in the proliferation of online E-learning platforms, designed to cater to a wide array of subjects across all age groups. However, a paucity of these platforms adopts a learner-centric approach or validates user learning, underscoring the need for effective E-learning validation and personalized learning recommendations. This paper addresses these challenges by implementing an innovative approach that leverages real-time electroencephalogram (EEG) signals collected from learners, who don neuro headsets while partaking in online courses. These EEG signals are subsequently classified using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) deep learning models, with the intent of discerning the efficacy of the E-learning process. The proposed models have yielded promising classification accuracies of 68% and 97% for the CNN and LSTM models, respectively, demonstrating their rapidity and precision in classifying E-learning EEG signals. Thus, these models hold substantial potential for application in similar E-learning validation scenarios. Furthermore, this study introduces an automated framework designed to track the learning curve of users and furnish valuable recommendations for E-learning materials. The presented approach, therefore, not only validates the E-learning process but also aids in optimizing the learning experiences on E-learning platforms.
{"title":"Neural Correlate-Based E-Learning Validation and Classification Using Convolutional and Long Short-Term Memory Networks","authors":"Dharmendra Pathak, R. Kashyap","doi":"10.18280/ts.400414","DOIUrl":"https://doi.org/10.18280/ts.400414","url":null,"abstract":"The COVID-19 pandemic has precipitated an unprecedented surge in the proliferation of online E-learning platforms, designed to cater to a wide array of subjects across all age groups. However, a paucity of these platforms adopts a learner-centric approach or validates user learning, underscoring the need for effective E-learning validation and personalized learning recommendations. This paper addresses these challenges by implementing an innovative approach that leverages real-time electroencephalogram (EEG) signals collected from learners, who don neuro headsets while partaking in online courses. These EEG signals are subsequently classified using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) deep learning models, with the intent of discerning the efficacy of the E-learning process. The proposed models have yielded promising classification accuracies of 68% and 97% for the CNN and LSTM models, respectively, demonstrating their rapidity and precision in classifying E-learning EEG signals. Thus, these models hold substantial potential for application in similar E-learning validation scenarios. Furthermore, this study introduces an automated framework designed to track the learning curve of users and furnish valuable recommendations for E-learning materials. The presented approach, therefore, not only validates the E-learning process but also aids in optimizing the learning experiences on E-learning platforms.","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46308157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ATRLeNet: A Deep Learning Model for Enhanced Classification of Oryza Sativa Pathologies","authors":"Naga Venkata RajaReddy Goluguri, S. K, Sandhya Devi Gogula, Gurpreet Singh Chhabra","doi":"10.18280/ts.400422","DOIUrl":"https://doi.org/10.18280/ts.400422","url":null,"abstract":"ABSTRACT","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46796080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SVM-CNN Hybrid Classification for Waste Image Using Morphology and HSV Color Model Image Processing","authors":"Sunardi, A. Yudhana, Miftahuddin Fahmi","doi":"10.18280/ts.400446","DOIUrl":"https://doi.org/10.18280/ts.400446","url":null,"abstract":"ABSTRACT","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49423248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic channel modelling allows communication interfaces to integrate continuous learning operations for incremental BER reductions. These models scan temporal BER patterns, and then tune internal-channel parameters in order to improve communication efficiency under real-time traffic scenarios. However, these models exhibit high complexity and thus cannot be scaled to large-scale network deployments. Moreover, these models are not flexible, and do not support denser channel models, which restricts their applicability under real-time scenarios. To overcome these issues, this text proposes the design of a novel dynamic learning method for improved channel modelling in phased-array-antenna mmWave radios via temporal breakpoint analysis. The model initially collects information about channel BER and uses a Grey Wolf Optimization (GWO) technique to improve its internal model parameters. These parameters are further tuned via a novel breakpoint model, which enables continuous and lightweight tuning of channel modelling parameters. This allows the model to incrementally reduce BER even under denser noise levels. The model is further cascaded with a Q-Learning based optimization process, which assists in improving channel modelling efficiency for large-scale networks. Due to these integrations, the model is capable of reducing Bit Error Rate (BER) by 8.3% when compared with standard channel modelling techniques that use Convolutional Neural Networks (CNNs), Sparse Bayesian Learning, etc. These methods were selected for comparison due to their higher efficiency and scalability when applied to real-time communication scenarios. The model also showcased 6.5% lower computational delay due to linear processing operations. It was able to achieve 10.4% better channel coverage, 8.5% higher throughput, and 4.9% higher channel estimation accuracy, which makes it useful for a wide
{"title":"Design of a Genetic Algorithm Based Dynamic Learning Method for Improved Channel Modelling in mmWave Radios via Temporal Breakpoint Analysis","authors":"A. Bhoi, V. Hendre","doi":"10.18280/ts.400420","DOIUrl":"https://doi.org/10.18280/ts.400420","url":null,"abstract":"Dynamic channel modelling allows communication interfaces to integrate continuous learning operations for incremental BER reductions. These models scan temporal BER patterns, and then tune internal-channel parameters in order to improve communication efficiency under real-time traffic scenarios. However, these models exhibit high complexity and thus cannot be scaled to large-scale network deployments. Moreover, these models are not flexible, and do not support denser channel models, which restricts their applicability under real-time scenarios. To overcome these issues, this text proposes the design of a novel dynamic learning method for improved channel modelling in phased-array-antenna mmWave radios via temporal breakpoint analysis. The model initially collects information about channel BER and uses a Grey Wolf Optimization (GWO) technique to improve its internal model parameters. These parameters are further tuned via a novel breakpoint model, which enables continuous and lightweight tuning of channel modelling parameters. This allows the model to incrementally reduce BER even under denser noise levels. The model is further cascaded with a Q-Learning based optimization process, which assists in improving channel modelling efficiency for large-scale networks. Due to these integrations, the model is capable of reducing Bit Error Rate (BER) by 8.3% when compared with standard channel modelling techniques that use Convolutional Neural Networks (CNNs), Sparse Bayesian Learning, etc. These methods were selected for comparison due to their higher efficiency and scalability when applied to real-time communication scenarios. The model also showcased 6.5% lower computational delay due to linear processing operations. It was able to achieve 10.4% better channel coverage, 8.5% higher throughput, and 4.9% higher channel estimation accuracy, which makes it useful for a wide","PeriodicalId":49430,"journal":{"name":"Traitement Du Signal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44818974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}