This paper presents DSR-YOLO, a pedestrian detection network that addresses critical challenges such as scale variations and complex backgrounds. Built on the lightweight YOLOv8n architecture, it incorporates DCNv4 modules to enhance detection rates and reduce missed detections by effectively learning key pedestrian features. A new head component enables detection across various scales, while RFB modules improve accuracy for smaller or occluded objects. Additionally, we enhance the initial C2f layers with a modified block that integrates SimAM and DCNv4, minimizing background noise and sharpening the focus on relevant features. A second version of the C2f block, using SimAM and standard convolutions, ensures robust feature extraction in deeper layers with optimized computational efficiency. The WIoUv3 loss function is used to reduce the bounding-box regression loss, further boosting performance. Evaluated on the CityPersons dataset, DSR-YOLO outperformed YOLOv8n with a 14.9 % increase in mAP@50 and a 6.3 % increase in mAP@50:95, while maintaining competitive FLOPs, parameter counts, and inference speed.
{"title":"DSR-YOLO: A lightweight and efficient YOLOv8 model for enhanced pedestrian detection","authors":"Mustapha Oussouaddi , Omar Bouazizi , Aimad El mourabit , Zine el Abidine Alaoui Ismaili , Yassine Attaoui , Mohamed Chentouf","doi":"10.1016/j.cogr.2025.04.001","DOIUrl":"10.1016/j.cogr.2025.04.001","url":null,"abstract":"<div><div>This paper presents DSR-YOLO, a pedestrian detection network that addresses critical challenges, such as scale variations and complex backgrounds. Built on the lightweight YOLOv8n architecture, it incorporates DCNv4 modules to enhance the detection rates and reduce missed detections by effectively learning key pedestrian features. A new head component enables detection across various scales, whereas RFB modules improve accuracy for smaller or occluded objects. Additionally, we enhance the initial C2f layers with a modified block that integrates SimAM and DCNv4, minimizing the background noise and sharpening the focus on the relevant features. A second version of the C2f block using SimAM and standard convolutions ensures robust feature extraction in deeper layers with optimized computational efficiency. The WIoUv3 loss function was utilized to reduce the regression loss associated with bounding boxes, further boosting the performance. Evaluated on the CityPersons dataset, DSR-YOLO outperformed YOLOv8n with a 14.9 % increase in mAP@50 and 6.3 % increase in mAP@50:95, while maintaining competitive FLOPS, parameter counts, and inference speed.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 152-165"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143844145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.01.001
Xujie Wan, Siyu Xu, Guangwei Gao
We propose a deep learning-based Attention-Assisted Dual-Branch Interactive Network (ADBINet) to improve facial super-resolution by addressing key challenges like inadequate feature extraction and poor multi-scale information handling. ADBINet features a multi-scale encoder-decoder architecture that captures and integrates features across scales, enhancing detail and reconstruction quality. The key to our approach is the Transformer and CNN Interaction Module (TCIM), which includes a Dual Attention Collaboration Module (DACM) for improved local and spatial feature extraction. The Channel Attention Guidance Module (CAGM) refines CNN and Transformer fusion, ensuring precise facial detail restoration. Additionally, the Attention Feature Fusion Unit (AFFM) optimizes multi-scale feature integration. Experimental results demonstrate that ADBINet outperforms existing methods in both quantitative and qualitative facial super-resolution metrics.
{"title":"Attention-assisted dual-branch interactive face super-resolution network","authors":"Xujie Wan , Siyu Xu , Guangwei Gao","doi":"10.1016/j.cogr.2025.01.001","DOIUrl":"10.1016/j.cogr.2025.01.001","url":null,"abstract":"<div><div>We propose a deep learning-based Attention-Assisted Dual-Branch Interactive Network (ADBINet) to improve facial super-resolution by addressing key challenges like inadequate feature extraction and poor multi-scale information handling. ADBINet features a multi-scale encoder-decoder architecture that captures and integrates features across scales, enhancing detail and reconstruction quality. The key to our approach is the Transformer and CNN Interaction Module (TCIM), which includes a Dual Attention Collaboration Module (DACM) for improved local and spatial feature extraction. The Channel Attention Guidance Module (CAGM) refines CNN and Transformer fusion, ensuring precise facial detail restoration. Additionally, the Attention Feature Fusion Unit (AFFM) optimizes multi-scale feature integration. Experimental results demonstrate that ADBINet outperforms existing methods in both quantitative and qualitative facial super-resolution metrics.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 77-85"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143143955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.06.003
Prabhakar Saxena, Gayatri M. Phade
Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) perform crucial functions in many applications, such as military operations, disaster management, hazardous operations, and surveillance. Efficient bidirectional communication between UAVs and UGVs is necessary for effective coordination and successful task completion. Traditional routing protocols facilitate communication either between UAVs or between UGVs, but not efficiently across both platforms. Moreover, traditional routing protocols often fail to adapt dynamically to varying network conditions such as mobility, interference, and congestion. To overcome these challenges, this paper presents the design, implementation, and optimization of an adaptive routing protocol engineered for the specific requirements of a coordinated UAV-UGV network. The protocol integrates Greedy Perimeter Stateless Routing (GPSR) with Deep Reinforcement Learning (DRL) to optimize packet routing based on real-time network states, ensuring obstacle avoidance, enhanced throughput, minimal latency, and reduced packet loss. Simulations conducted in Python evaluate the performance of the proposed protocol. The results show that the DRL-based routing protocol enables communication between UAVs and UGVs through the shortest and most efficient path. This research contributes to the advancement of AI-enabled communication architectures for coordinated UAV-UGV networks, supporting robust and efficient mission-critical operations.
{"title":"Deep reinforcement learning-based routing framework for bidirectional communication in UAV-UGV networks","authors":"Prabhakar Saxena , Gayatri M. Phade","doi":"10.1016/j.cogr.2025.06.003","DOIUrl":"10.1016/j.cogr.2025.06.003","url":null,"abstract":"<div><div>Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) performs crucial function in many applications like military operations, disaster management, hazardous operations and surveillance. Efficient bidirectional communication between UAVs and UGVs is necessary for effective coordination and successful task completion. Traditional routing protocols facilitate communication either between UAVs or between UGVs, but not efficiently across both platforms. Moreover traditional routing protocol often fail to adapt dynamically to varying network conditions, such as mobility, interference, and congestion. To overcome these challenges, this paper presents a design, implementation, and optimization of adaptive routing protocol engineered for specific requirements of coordinated network consisting of UAV and UGV. This novel protocol design integrates the Greedy Perimeter Stateless Routing (GPSR) and Deep Reinforcement Learning (DRL) to optimize packet routing based on real-time network states and ensuring obstacle avoidance, enhanced throughput, minimal latency and reduced packet loss. Simulations are conducted in python to evaluate the performance of the proposed protocol. The results shows that the DRL-based routing protocol enables communication between UAVs and UGVs through the shortest and most efficient path. This research contributes to the advancement of AI enabled communication architecture for co-ordinated UAV-UGV networks, for robust and efficient mission-critical operations.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 249-259"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144611613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a proof-of-concept for a lower-extremity rehabilitation device, called Rehab-bot, intended to help patients with lower-limb impairments continue their rehabilitation at the required intensity at home after inpatient care. This research focuses on developing the patient's muscle training feature using admittance control to generate resistance for isotonic exercise, with particular emphasis on the potential for progressive resistance training. The mechanical structure of the Rehab-bot was inspired by a continuous passive motion machine, which can be optimized into a light and compact device suitable for home-based use. Systems design, development, and experimental evaluation are presented. Experiments were performed with one healthy subject by monitoring two parameters: the forces exerted by the leg muscles, measured through a force sensor, and the resulting position of the robot-actuated foot support. The results show that Rehab-bot can deliver lower-limb isotonic exercise by generating a virtual load that can be progressively increased.
{"title":"Rehab-Bot: A home-based lower-extremity rehabilitation robot for muscle recovery","authors":"Sandro Mihradi , Edgar Buwana Sutawika , Vani Virdyawan , Rachmat Zulkarnain Goesasi , Masahiro Todoh","doi":"10.1016/j.cogr.2025.02.001","DOIUrl":"10.1016/j.cogr.2025.02.001","url":null,"abstract":"<div><div>This paper presents a proof-of-concept for a lower-extremity rehabilitation device, called Rehab-bot, that would aid patients with lower-limb impairments in continuing their rehabilitation in its required intensity at home after inpatient care. This research focuses on developing the patient‘s muscle training feature using admittance control to generate resistance for isotonic exercise, particularly emphasizing the potential for progressive resistance training. The mechanical structure of the Rehab-bot was inspired by a continuous passive motion machine that can be optimized to be a light and compact device suitable for home-based use. Systems design, development, and experimental evaluation are presented. Experiments were performed with one healthy subject by monitoring two parameters: the forces exerted by leg muscles through a force sensor and the resulting position of the foot support that is actuated by the robot. Results have shown that Rehab-bot can demonstrate lower-limb isotonic exercise by generating a virtual load that can be progressively increased.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 114-125"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143601104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2024.11.004
Yuntao Wei, Xiujia Wang, Chunjuan Bo, Zhan Shi
The increasing use and militarization of UAV technology present significant challenges to nations and societies. Notably, there is a deficit in anti-UAV technologies for civilian use, particularly in complex urban environments at low altitudes. This paper proposes the ESMS-YOLOv7 algorithm, which is specifically engineered to detect small target UAVs in such challenging urban landscapes. The algorithm focuses on extracting features of small target UAVs in urban contexts. Enhancements to YOLOv7 include the integration of the ELAN-C module, the SimSPPFCSPC-R module, and the MP-CBAM module, which collectively improve the network's ability to extract features and focus on small target UAVs. Additionally, the SIoU loss function is employed to increase the model's robustness. The effectiveness of the ESMS-YOLOv7 algorithm is validated on the DUT Anti-UAV dataset, where it exhibits superior performance relative to other leading algorithms.
{"title":"Small target drone algorithm in low-altitude complex urban scenarios based on ESMS-YOLOv7","authors":"Yuntao Wei, Xiujia Wang, Chunjuan Bo, Zhan Shi","doi":"10.1016/j.cogr.2024.11.004","DOIUrl":"10.1016/j.cogr.2024.11.004","url":null,"abstract":"<div><div>The increasing use and militarization of UAV technology presents significant challenges to nations and societies. Notably, there is a deficit in anti- UAV technologies for civilian use, particularly in complex urban environments at low altitudes. This paper proposes the ESMS-YOLOv7 algorithm, which is specifically engineered to detect small target UAVs in such challenging urban landscapes. The algorithm focuses on the extraction of features from small target UAVs in urban contexts. Enhancements to YOLOv7 include the integration of the ELAN-C module, the SimSPPFCSPC-R module, and the MP-CBAM module, which collectively improve the network's ability to extract features and focus on small target UAVs. Additionally, the SIOU loss function is employed to increase the model's robustness. The effectiveness of the ESMS-YOLOv7 algorithm is validated through its performance on the DUT Anti-UAV dataset, where it exhibits superior capabilities relative to other leading algorithms.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 14-25"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143143536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.07.001
Xuan Jin, Wen Zhou, Qinyou Zhu, Weijie Wang, Guoteng Xu
This paper employs text mining techniques, specifically the Latent Dirichlet Allocation (LDA) and BERTopic topic models, to conduct an in-depth investigation of the supply and demand structure of regional scientific and technological achievements. The objective is to identify imbalances in supply and demand, thereby providing novel insights for enhancing the efficiency of technology transfer. The findings indicate that the LDA model outperforms the BERTopic model in this study. Taking Guizhou Province, China, as a case study, the LDA analysis categorizes the demand side into 16 domains and the supply side into 18 domains, both exhibiting a "long-tail distribution" characteristic. Further analysis reveals a structural imbalance in the supply and demand of scientific and technological achievements in Guizhou Province. For instance, demand is high in areas such as mineral extraction and utilization and digital and intelligent applications, accounting for 20.3 % and 14.3 % respectively, yet supply is insufficient, at only 5.1 % and 3.1 % respectively. Conversely, areas such as mechanical processing and bridge and building construction experience an oversupply, with supply accounting for 17.9 % and 13.8 % respectively. To address this structural imbalance, the study proposes development recommendations from three perspectives: policy and management systems, regional collaboration, and ecological construction. The aim is to optimize the supply and demand structure of scientific and technological achievements in Guizhou Province and promote the deep integration of technology and the economy.
{"title":"Research on the analysis and application of technological supply and demand structure based on LDA and BERTopic models","authors":"Xuan Jin , Wen Zhou , Qinyou Zhu , Weijie Wang , Guoteng Xu","doi":"10.1016/j.cogr.2025.07.001","DOIUrl":"10.1016/j.cogr.2025.07.001","url":null,"abstract":"<div><div>This paper employs text mining techniques, specifically Latent Dirichlet Allocation (LDA) and BERTopic topic models, to conduct an in-depth investigation of the supply and demand structure of regional scientific and technological achievements. The objective is to identify imbalances in supply and demand, thereby providing novel insights for enhancing the efficiency of technology transfer. The research findings indicate that the LDA model outperforms the BERTopic model in this study. Taking Guizhou Province, China, as a case study, the LDA model analysis categorizes the demand side into 16 domains and the supply side into 18 domains, both exhibiting a \"long-tail distribution\" characteristic. Further analysis reveals a structural imbalance in the supply and demand of scientific and technological achievements in Guizhou Province. For instance, there is a high demand in areas such as mineral extraction and utilization, as well as digital and intelligent applications, accounting for 20.3 % and 14.3 % respectively, yet the supply is insufficient, with only 5.1 % and 3.1 % respectively. Conversely, areas like mechanical processing, and bridge and building construction experience an oversupply, with the supply accounting for 17.9 % and 13.8 % respectively. Addressing the structural imbalance in the supply and demand of scientific and technological achievements, this study proposes development recommendations from three perspectives: policy and management systems, regional collaboration, and ecological construction. The aim is to optimize the supply and demand structure of scientific and technological achievements in Guizhou Province and promote the deep integration of technology and the economy.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 260-275"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144694867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.06.001
Xiang Liu, Shuntian Xie
This research explores underwater robot applications in marine cable inspection and maintenance with solutions to accuracy, reliability, and efficiency challenges. Current methods using human divers and remotely operated vehicles (ROVs) are expensive, time-consuming, and involve safety hazards. The suggested AI-based robotic system incorporates sensor technology, predictive maintenance, and statistical validation to maximize marine cable inspections. A quantitative research method was employed, surveying data from 400 Marine Engineers and Underwater Robotics Specialists. Statistical analysis, such as reliability analysis, regression modeling, and hypothesis testing, determined the influence of technology adoption, environmental aspects, and predictive maintenance on inspection accuracy and cost savings. Model fit was confirmed through CFI = 0.94, RMSEA = 0.047, and SRMR = 0.052. Results show that Maintenance Strategy & Cost Reduction (β = 0.55, p < 0.01) is the most influential factor. The research assures that AI-enhanced underwater robots provide a cost-efficient, reliable substitute for conventional approaches, promoting efficiency, safety, and long-term sustainability in marine cable operations.
{"title":"Innovative strategy and practice of using underwater robot for marine cable inspection and operation and maintenance","authors":"Xiang Liu, Shuntian Xie","doi":"10.1016/j.cogr.2025.06.001","DOIUrl":"10.1016/j.cogr.2025.06.001","url":null,"abstract":"<div><div>This research explores underwater robot applications in marine cable inspection and maintenance with solutions to accuracy, reliability, and efficiency challenges. Current methods using human divers and remotely operated vehicles (ROVs) are expensive, time-consuming, and involve safety hazards. The suggested AI-based robotic system incorporates sensor technology, predictive maintenance, and statistical validation to maximize marine cable inspections. A quantitative research method was employed, surveying data from 400 Marine Engineers and Underwater Robotics Specialists. Statistical analysis, such as reliability analysis, regression model, and hypothesis testing, determined the influence of technology adoption, environmental aspects, and predictive maintenance on inspection accuracy and cost savings. Model fit was confirmed through CFI <span><math><mrow><mo>=</mo><mn>0.94</mn></mrow></math></span>, RMSEA <span><math><mrow><mo>=</mo><mn>0.047</mn></mrow></math></span>, and <span><math><mrow><mi>SRMR</mi><mo>=</mo><mn>0.052</mn></mrow></math></span>. Results show that Maintenance Strategy & Cost Reduction <span><math><mrow><mo>(</mo><mi>β</mi><mo>=</mo><mn>0.55</mn><mo>,</mo><mrow><mi>p</mi></mrow><mo><</mo><mn>0.01</mn><mo>)</mo></mrow></math></span> is most influential. The research assures that AI-enhanced underwater robots provide a cost-efficient, guaranteed substitute to conventional approaches, promoting efficiency, safety, and long-term sustainability in marine cable operations.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 226-239"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144570295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.06.002
Yuhang Liu, Chunjuan Bo, Chong Feng
The significance of fire detection lies in protecting public safety and safeguarding people's lives and property. However, traditional fire detection algorithms suffer from problems such as low accuracy, high miss rates, and poor detection of small targets. To effectively address these issues, this paper introduces a fire detection algorithm based on YOLOv8s, called FB-YOLOv8s. First, the lightweight FasterNet network is introduced into the YOLOv8s network, merging its FasterNet Block structure with the original C2f modules to reduce the number of model parameters. Second, the Bi-directional Feature Pyramid Network (BiFPN) replaces the Path Aggregation Network (PANet) in the neck network to enhance the model's feature fusion capability. Finally, we adopt the WIoUv3 loss function to optimize the training process and improve detection accuracy. The experimental results demonstrate that, compared to the original algorithm, the mAP@0.5 of FB-YOLOv8s increases by 2.0 % and the number of parameters decreases by 25.23 %. The method therefore offers better detection performance for fire targets.
{"title":"FB-YOLOv8s: A fire detection algorithm based on YOLOv8s","authors":"Yuhang Liu, Chunjuan Bo, Chong Feng","doi":"10.1016/j.cogr.2025.06.002","DOIUrl":"10.1016/j.cogr.2025.06.002","url":null,"abstract":"<div><div>The significance of fire detection lies in protecting public safety and safeguarding the lives and property of people. However, there exist some problems in traditional detection algorithms of fire, such as low accuracy, high miss rate, and low detection rate of small targets. To effectively solve these issues, a fire detection algorithm based on YOLOv8s is introduced in this paper, called FB-YOLOv8s. First, the FasterNet lightweight network is introduced into the YOLOv8s network, merging the FasterNet Block structure of FasterNet with the original C2f modules to reduce the number of model parameters. Second, the Bi-directional Feature Pyramid Network (BiFPN) is incorporated to replace the Path Aggregation Network (PANet) in the neck network to enhance the model’s feature fusion capability. Finally, we adopt the WIoUv3 loss function to optimize the training process and improve detection accuracy. The experimental results demonstrate that compared to the original algorithm, the mAP<span><math><msub><mrow></mrow><mrow><mn>0.5</mn></mrow></msub></math></span> of FB-YOLOv8s increases by 2.0 %, and the number of parameters decreases by 25.23 %. This method has better detection performance for fire targets.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 240-248"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144588764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.08.001
Hui Chen, Runming Jiang, Fang Hu, Min Chen, Yin Zhang
In the context of natural scenes, traditional text recognition methods exhibit limitations when confronted with the substantial differences in characters and context among diverse languages. To address this challenge, we propose LFEN, an approach for text recognition and correction in natural scenes. By directly embedding language features into the text recognition model, we effectively improve the accuracy of scene text recognition while reducing the risk of error accumulation inherent in traditional serial pipelines that chain language identification and text recognition. Through a detailed analysis of global and local language features, the method more accurately differentiates between languages with similar characters, thereby enhancing text recognition accuracy. Furthermore, by incorporating the intrinsic semantic relationships of the text content, this paper employs a sequence-to-sequence (Seq2Seq) model based on convolutional neural networks for text correction. Through the integration of language information, different feature embeddings, and global residual connections, the paper provides a robust solution for text correction in scene text recognition. The experimental results demonstrate that LFEN outperforms the baselines on most evaluation metrics; in particular, LFEN improves recall by around 2 % over BERT. This research contributes substantial support to the advancement of natural scene text recognition and correction.
{"title":"LFEN: A language feature enhanced network for scene text recognition","authors":"Hui Chen , Runming Jiang , Fang Hu , Min Chen , Yin Zhang","doi":"10.1016/j.cogr.2025.08.001","DOIUrl":"10.1016/j.cogr.2025.08.001","url":null,"abstract":"<div><div>In the context of natural scenes, traditional text recognition methods exhibit limitations when confronted with the substantial differences in characters and context among diverse languages. To address this challenge, we propose an approach LFEN for text recognition and correction in natural scenes. By directly embedding language features into the text recognition model, we effectively address the issue of accuracy in scene text recognition, reducing the potential risk of error accumulation compared to traditional language recognition-text recognition serial connections. Through a detailed analysis of global and local language features, this paper successfully achieves more accurate differentiation between languages with similar characters, thereby enhancing text recognition accuracy. Furthermore, by incorporating the intrinsic semantic relationships of text content, this paper employs a sequence-to-sequence (Seq2Seq) model based on convolutional neural networks for text correction. Through the integration of language information, different feature embeddings, and global residual connections, the paper provides a robust solution for text correction in scene text recognition. Compared to the baselines, the experimental results demonstrate that LFEN achieves superior performance in most evaluation metrics. Specifically, LFEN has around 2% in recall improved to BERT. This research contributes substantial support to the advancement of natural scene text recognition and correction.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 276-285"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144880314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.05.002
Teng Wang, Fenglian Li, Jia Yang, Wenhui Jia, Fengyun Hu
Stroke classification is crucial for timely diagnosis and treatment, as it helps differentiate between hemorrhagic and ischemic strokes, which require distinct clinical interventions. This paper proposes a stroke classification method using multi-channel electroencephalography (EEG) data. Unlike single-channel data or simple multi-channel concatenation, our method processes EEG data as a channel matrix, significantly improving classification performance. We employ two complementary feature extraction techniques: discrete wavelet transform (DWT) and empirical mode decomposition (EMD). DWT extracts multi-scale wavelet coefficients from stroke-related frequency bands, while EMD decomposes EEG signals into intrinsic mode functions (IMFs), representing narrowband oscillation components. To enhance feature quality, we propose a hybrid selection method that integrates four metrics—information entropy, power spectral density (PSD) distance, statistical significance, and maximum information coefficient (MIC)—to comprehensively evaluate IMFs. This method accounts for both the intrinsic information content of EEG signals and the inter-class differences between hemorrhagic and ischemic stroke subjects. Furthermore, this paper designs a pyramid cascade convolutional neural network (PCCNN) model with multi-branch independent learning and hierarchical fusion. Each DWT and EMD feature is processed by an independent one-dimensional convolutional neural network (1D-CNN) branch for targeted extraction. A pyramid fusion mechanism integrates the branch outputs into a fused feature vector, enabling feature interaction through a top-level fusion CNN. Experimental results demonstrate that the proposed method, which integrates channel matrix processing, high-quality DWT and EMD feature selection, and multi-branch feature fusion, significantly outperforms single-feature methods. The fused feature achieves a classification accuracy of 99.48 %, effectively distinguishing EEG data of hemorrhagic and ischemic stroke.
{"title":"PCCNN: A CNN classification model integrating EEG time-frequency features for stroke classification","authors":"Teng Wang , Fenglian Li , Jia Yang , Wenhui Jia , Fengyun Hu","doi":"10.1016/j.cogr.2025.05.002","DOIUrl":"10.1016/j.cogr.2025.05.002","url":null,"abstract":"<div><div>Stroke classification is crucial for timely diagnosis and treatment, as it helps differentiate between hemorrhagic and ischemic strokes, which require distinct clinical interventions. This paper proposes a stroke classification method using multi-channel electroencephalography (EEG) data. Unlike single-channel data or simple multi-channel concatenation, our method processes EEG data as a channel matrix, significantly improving classification performance. We employ two complementary feature extraction techniques: discrete wavelet transform (DWT) and empirical mode decomposition (EMD). DWT extracts multi-scale wavelet coefficients from stroke-related frequency bands, while EMD decomposes EEG signals into intrinsic mode functions (IMFs), representing narrowband oscillation components. To enhance feature quality, we propose a hybrid selection method that integrates four metrics—information entropy, power spectral density (PSD) distance, statistical significance, and maximum information coefficient (MIC)—to comprehensively evaluate IMFs. This method accounts for both the intrinsic information content of EEG signals and the inter-class differences between hemorrhagic and ischemic stroke subjects. Furthermore, this paper designs a pyramid cascade convolutional neural network (PCCNN) model with multi-branch independent learning and hierarchical fusion. Each DWT and EMD feature is processed by an independent one-dimensional convolutional neural networks (1D-CNN) branch for targeted extraction. A pyramid fusion mechanism integrates branch outputs into a fused feature vector, enabling the feature interaction through a top-level fusion CNN. Experimental results demonstrate that the proposed method, which integrates channel matrix processing, high-quality DWT and EMD feature selection, and multi-branch feature fusion, significantly outperforms single-feature methods. The fusion feature achieves a classification accuracy of 99.48 %, effectively distinguishing EEG data of hemorrhagic and ischemic stroke.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"5 ","pages":"Pages 211-225"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144189322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}