DCASAM: advancing aspect-based sentiment analysis through a deep context-aware sentiment analysis model
Pub Date : 2024-08-10 DOI: 10.1007/s40747-024-01570-5
Xiangkui Jiang, Binglong Ren, Qing Wu, Wuwei Wang, Hong Li
Aspect-level sentiment analysis plays a pivotal role in fine-grained sentiment categorization, especially given the rapid expansion of online information. Traditional methods often struggle to determine sentiment polarity accurately when faced with implicit or ambiguous data, leading to limited accuracy and context-awareness. To address these challenges, we propose the Deep Context-Aware Sentiment Analysis Model (DCASAM). This model integrates the capabilities of a Deep Bidirectional Long Short-Term Memory Network (DBiLSTM) and a Densely Connected Graph Convolutional Network (DGCN), enhancing the ability to capture long-distance dependencies and subtle contextual variations. The DBiLSTM component effectively captures sequential dependencies, while the DGCN component leverages densely connected structures to model intricate relationships within the data. This combination allows DCASAM to maintain a high level of contextual understanding and sentiment detection accuracy. Experimental evaluations on well-known public datasets, including Restaurant14, Laptop14, and Twitter, demonstrate the superior performance of DCASAM over existing models. Our model achieves an average improvement of 1.07% in accuracy and 1.68% in F1 score, showcasing its robustness and efficacy in handling complex sentiment analysis tasks. These results highlight the potential of DCASAM for real-world applications and offer a solid foundation for future research in aspect-level sentiment analysis. By providing a more nuanced understanding of sentiment, our model contributes significantly to the advancement of fine-grained sentiment analysis techniques.
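The abstract gives no implementation details; as a rough illustration of pairing a deep BiLSTM encoder with a densely connected GCN over a dependency adjacency matrix, the following PyTorch sketch shows one plausible arrangement. The module names, layer sizes, and the way the two components are fused are assumptions made here for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class DenseGCN(nn.Module):
    """Densely connected GCN: each layer sees the concatenation of all previous outputs."""
    def __init__(self, in_dim, hidden_dim, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(in_dim + i * hidden_dim, hidden_dim) for i in range(num_layers)]
        )

    def forward(self, x, adj):
        # x: (batch, seq_len, in_dim); adj: (batch, seq_len, seq_len) normalized adjacency
        features = [x]
        for layer in self.layers:
            h = torch.cat(features, dim=-1)
            h = torch.relu(layer(torch.bmm(adj, h)))  # aggregate neighbors, then transform
            features.append(h)
        return features[-1]

class DCASAMSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.gcn = DenseGCN(2 * hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens, adj, aspect_mask):
        # tokens: (batch, seq_len); aspect_mask: float (batch, seq_len), 1.0 on aspect tokens
        h, _ = self.bilstm(self.embed(tokens))          # sequential context
        g = self.gcn(h, adj)                            # graph/syntactic context
        aspect = (g * aspect_mask.unsqueeze(-1)).sum(1) / aspect_mask.sum(1, keepdim=True)
        return self.classifier(aspect)                  # sentiment polarity logits
```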
{"title":"DCASAM: advancing aspect-based sentiment analysis through a deep context-aware sentiment analysis model","authors":"Xiangkui Jiang, Binglong Ren, Qing Wu, Wuwei Wang, Hong Li","doi":"10.1007/s40747-024-01570-5","DOIUrl":"https://doi.org/10.1007/s40747-024-01570-5","url":null,"abstract":"<p>Aspect-level sentiment analysis plays a pivotal role in fine-grained sentiment categorization, especially given the rapid expansion of online information. Traditional methods often struggle with accurately determining sentiment polarity when faced with implicit or ambiguous data, leading to limited accuracy and context-awareness. To address these challenges, we propose the Deep Context-Aware Sentiment Analysis Model (DCASAM). This model integrates the capabilities of Deep Bidirectional Long Short-Term Memory Network (DBiLSTM) and Densely Connected Graph Convolutional Network (DGCN), enhancing the ability to capture long-distance dependencies and subtle contextual variations.The DBiLSTM component effectively captures sequential dependencies, while the DGCN component leverages densely connected structures to model intricate relationships within the data. This combination allows DCASAM to maintain a high level of contextual understanding and sentiment detection accuracy.Experimental evaluations on well-known public datasets, including Restaurant14, Laptop14, and Twitter, demonstrate the superior performance of DCASAM over existing models. Our model achieves an average improvement in accuracy by 1.07% and F1 score by 1.68%, showcasing its robustness and efficacy in handling complex sentiment analysis tasks.These results highlight the potential of DCASAM for real-world applications, offering a solid foundation for future research in aspect-level sentiment analysis. By providing a more nuanced understanding of sentiment, our model contributes significantly to the advancement of fine-grained sentiment analysis techniques.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"191 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integration of attention mechanism and CNN-BiGRU for TDOA/FDOA collaborative mobile underwater multi-scene localization algorithm
Pub Date : 2024-08-10 DOI: 10.1007/s40747-024-01583-0
Duo Peng, Ming Shuo Liu, Kun Xie
This study addresses the accuracy of TDOA/FDOA measurements in complex underwater environments, where multipath effects and variations in water sound velocity degrade localisation performance. To this end, a novel cooperative localisation algorithm has been developed, integrating an attention mechanism and a convolutional neural network-bidirectional gated recurrent unit (CNN-BiGRU) with TDOA/FDOA and two-step weighted least squares (ImTSWLS). This algorithm is designed to enhance the accuracy of TDOA/FDOA measurements in complex underwater environments. The algorithm initially makes use of the considerable capacity of a convolutional neural network (CNN) to extract deep spatial and frequency domain characteristics from multimodal data. These features are of paramount importance for the characterisation of underwater signal propagation, particularly in complex environments. Subsequently, through the use of a bidirectional gated recurrent unit (BiGRU), the algorithm is able to effectively capture long-term dependencies in time series data. This enables a more comprehensive analysis and understanding of the changing pattern of signals over time. Furthermore, the incorporation of an attention mechanism within the algorithm enables the model to focus more on the signal features that have a significant impact on localisation, while simultaneously suppressing the interference of extraneous information. This further enhances the efficiency of identifying and utilising the key signal features. ImTSWLS is employed to resolve the position and velocity data following the acquisition of the predicted TDOA/FDOA, thereby enabling the accurate estimation of the position and velocity of the mobile radiation source. The algorithm was subjected to a series of tests in a variety of simulated underwater environments, including different sea states, target motion speeds and base station configurations. The experimental results demonstrate that the algorithm exhibits a deviation of only 2.88 m/s in velocity estimation and 2.58 m in position estimation when the noise level is 20 dB. The algorithm presented in this paper demonstrates superior performance in both position and velocity estimation compared to other algorithms.
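As a hedged illustration of the kind of CNN-BiGRU-attention regressor the abstract describes, the sketch below stacks a 1-D CNN front end, a bidirectional GRU, and additive attention pooling to regress a TDOA/FDOA pair. Input shape, layer sizes, and names are assumptions; the predicted measurements would then be handed to the ImTSWLS solver mentioned above.

```python
import torch
import torch.nn as nn

class CNNBiGRUAttention(nn.Module):
    """CNN front end -> BiGRU -> attention pooling -> TDOA/FDOA regression head."""
    def __init__(self, in_channels=2, cnn_dim=64, gru_dim=128, out_dim=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, cnn_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(cnn_dim, cnn_dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.bigru = nn.GRU(cnn_dim, gru_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * gru_dim, 1)        # additive attention scores
        self.head = nn.Linear(2 * gru_dim, out_dim)  # predicts [TDOA, FDOA]

    def forward(self, x):
        # x: (batch, channels, time), e.g. cross-correlation / spectrogram features
        h = self.cnn(x).transpose(1, 2)           # (batch, time, cnn_dim)
        h, _ = self.bigru(h)                      # (batch, time, 2*gru_dim)
        w = torch.softmax(self.attn(h), dim=1)    # focus on informative time steps
        context = (w * h).sum(dim=1)              # attention-weighted summary
        return self.head(context)
```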
{"title":"Integration of attention mechanism and CNN-BiGRU for TDOA/FDOA collaborative mobile underwater multi-scene localization algorithm","authors":"Duo Peng, Ming Shuo Liu, Kun Xie","doi":"10.1007/s40747-024-01583-0","DOIUrl":"https://doi.org/10.1007/s40747-024-01583-0","url":null,"abstract":"<p>The aim of this study is to address the issue of TDOA/FDOA measurement accuracy in complex underwater environments, which is affected by multipath effects and variations in water sound velocity induced by the challenging nature of the underwater environment. To this end, a novel cooperative localisation algorithm has been developed, integrating the attention mechanism and convolutional neural network-bidirectional gated recurrent unit (CNN-BiGRU) with TDOA/FDOA and two-step weighted least squares (ImTSWLS). This algorithm is designed to enhance the accuracy of TDOA/FDOA measurements in complex underwater environments. The algorithm initially makes use of the considerable capacity of a convolutional neural network (CNN) to extract profound spatial and frequency domain characteristics from multimodal data. These features are of paramount importance for the characterisation of underwater signal propagation, particularly in complex environments. Subsequently, through the use of a bidirectional gated recurrent unit (BiGRU), the algorithm is able to effectively capture long-term dependencies in time series data. This enables a more comprehensive analysis and understanding of the changing pattern of signals over time. Furthermore, the incorporation of an attention mechanism within the algorithm enables the model to focus more on the signal features that have a significant impact on localisation, while simultaneously suppressing the interference of extraneous information. This further enhances the efficiency of identifying and utilising the key signal features. ImTSWLS is employed to resolve the position and velocity data following the acquisition of the predicted TDOA/FDOA, thereby enabling the accurate estimation of the position and velocity of the mobile radiation source. The algorithm was subjected to a series of tests in a variety of simulated underwater environments, including different sea states, target motion speeds and base station configurations. The experimental results demonstrate that the algorithm exhibits a deviation of only 2.88 m/s in velocity estimation and 2.58 m in position estimation when the noise level is 20 dB. The algorithm presented in this paper demonstrates superior performance in both position and velocity estimation compared to other algorithms.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"14 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing long-short term memory neural networks for electroencephalogram anomaly detection using variable neighborhood search with dynamic strategy change
Pub Date : 2024-08-10 DOI: 10.1007/s40747-024-01592-z
Branislav Radomirovic, Nebojsa Bacanin, Luka Jovanovic, Vladimir Simic, Angelinu Njegus, Dragan Pamucar, Mario Köppen, Miodrag Zivkovic
Electroencephalography (EEG) serves as a crucial neurodiagnostic tool by recording electrical brain activity via electrodes attached to the patient’s head. While artificial intelligence (AI) has exhibited considerable promise in medical diagnostics, its potential in the realm of neurodiagnostics remains underexplored. This research addresses this gap by proposing an innovative approach employing time-series classification of EEG data, leveraging long short-term memory (LSTM) neural networks for the identification of abnormal brain activity, particularly seizures. To enhance the performance of the proposed model, metaheuristic algorithms were employed to optimize the hyperparameter collection. Additionally, a modification of the variable neighborhood search (VNS), tailored specifically to this neurodiagnostic application, is introduced. The effectiveness of this methodology is evaluated using a carefully curated dataset comprising real-world EEG recordings from both healthy individuals and those affected by epilepsy. This software-based approach demonstrates noteworthy results, showcasing its efficacy in anomaly and seizure detection even when working with relatively modest sample sizes. This research contributes to the field by illuminating the potential of AI in neurodiagnostics, presenting a methodology that enhances accuracy in identifying abnormal brain activities, with implications for improved patient care and diagnostic precision.
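The abstract does not specify the VNS variant; the sketch below is a plain variable neighborhood search skeleton over an LSTM hyperparameter dictionary with placeholder neighborhood moves, intended only to show where the "shake, then move or enlarge the neighborhood" loop sits. The evaluate callback (training an LSTM and returning a validation error) and the neighborhood functions are assumptions, not the paper's dynamic-strategy-change modification.

```python
import random

def vns_tune(evaluate, init, neighborhoods, max_iters=30, seed=0):
    """Basic variable neighborhood search over a hyperparameter dict.

    evaluate(params) -> validation error to minimize (e.g. train an LSTM and
    return 1 - F1); neighborhoods is a list of perturbation functions ordered
    from small to large moves.
    """
    rng = random.Random(seed)
    best, best_score = dict(init), evaluate(init)
    for _ in range(max_iters):
        k = 0
        while k < len(neighborhoods):
            candidate = neighborhoods[k](dict(best), rng)  # shake in neighborhood k
            score = evaluate(candidate)
            if score < best_score:                         # improvement: restart from k = 0
                best, best_score, k = candidate, score, 0
            else:                                          # otherwise try a larger neighborhood
                k += 1
    return best, best_score

# Example neighborhood structures (assumed, for illustration only).
def tweak_units(p, rng):
    p["hidden_units"] = max(8, p["hidden_units"] + rng.choice([-16, 16]))
    return p

def tweak_lr(p, rng):
    p["learning_rate"] *= rng.choice([0.5, 2.0])
    return p
```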
{"title":"Optimizing long-short term memory neural networks for electroencephalogram anomaly detection using variable neighborhood search with dynamic strategy change","authors":"Branislav Radomirovic, Nebojsa Bacanin, Luka Jovanovic, Vladimir Simic, Angelinu Njegus, Dragan Pamucar, Mario Köppen, Miodrag Zivkovic","doi":"10.1007/s40747-024-01592-z","DOIUrl":"https://doi.org/10.1007/s40747-024-01592-z","url":null,"abstract":"<p>Electroencephalography (EEG) serves as a crucial neurodiagnostic tool by recording the electrical brain activity via attached electrodes on the patient’s head. While artificial intelligence (AI) exhibited considerable promise in medical diagnostics, its potential in the realm of neurodiagnostics remains underexplored. This research addresses this gap by proposing an innovative approach employing time-series classification of EEG data, leveraging long-short-term memory (LSTM) neural networks for the identification of abnormal brain activity, particularly seizures. To enhance the performance of the proposed model, metaheuristic algorithms were employed for optimizing hyperparameter collection. Additionally, a tailored modification of the variable neighborhood search (VNS) is introduced, specifically tailored for this neurodiagnostic application. The effectiveness of this methodology is evaluated using a carefully curated dataset comprising real-world EEG recordings from both healthy individuals and those affected by epilepsy. This software-based approach demonstrates noteworthy results, showcasing its efficacy in anomaly and seizure detection, even when working with relatively modest sample sizes. This research contributes to the field by illuminating the potential of AI in neurodiagnostics, presenting a methodology that enhances accuracy in identifying abnormal brain activities, with implications for improved patient care and diagnostic precision.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"34 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An improved cross-domain sequential recommendation model based on intra-domain and inter-domain contrastive learning
Pub Date : 2024-08-09 DOI: 10.1007/s40747-024-01590-1
Jianjun Ni, Tong Shen, Yonghao Zhao, Guangyi Tang, Yang Gu
Cross-domain recommendation aims to integrate data from multiple domains and introduce information from source domains, thereby achieving good recommendations on the target domain. Recently, contrastive learning has been introduced into cross-domain recommendation and has achieved promising results. However, most cross-domain recommendation algorithms based on contrastive learning suffer from the bias problem. In addition, the correlation between a user’s single-domain and cross-domain preferences is not considered. To address these problems, a new recommendation model based on intra-domain and inter-domain contrastive learning is proposed for cross-domain scenarios, which aims to obtain unbiased user preferences in cross-domain scenarios and improve the recommendation performance of both domains. First, a network enhancement module is proposed to capture users’ complete preferences by applying a graph convolution and an attentional aggregator. This module reduces the limitations of considering user preferences in a single domain only. Then, a cross-domain infomax objective with noise contrast is presented to ensure that users’ single-domain and cross-domain preferences are closely correlated in sequential interactions. Finally, a joint training strategy is designed to improve the recommendation performance of both domains, which can achieve unbiased cross-domain recommendation results. Extensive experiments are conducted on two real-world cross-domain scenarios. The experimental results show that the proposed model achieves the best recommendation results in comparison with existing models.
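One common way to realize an infomax-style objective between single-domain and cross-domain preferences is an InfoNCE loss with in-batch negatives; the sketch below shows that generic formulation. The temperature and the symmetric pairing are assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def cross_domain_infonce(single_pref, cross_pref, temperature=0.1):
    """InfoNCE-style loss pulling each user's single-domain embedding toward the
    same user's cross-domain embedding, using other users in the batch as negatives.

    single_pref, cross_pref: (batch, dim) user preference embeddings.
    """
    a = F.normalize(single_pref, dim=-1)
    b = F.normalize(cross_pref, dim=-1)
    logits = a @ b.t() / temperature                       # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)     # positives on the diagonal
    # symmetric form: match single -> cross and cross -> single
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```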
{"title":"An improved cross-domain sequential recommendation model based on intra-domain and inter-domain contrastive learning","authors":"Jianjun Ni, Tong Shen, Yonghao Zhao, Guangyi Tang, Yang Gu","doi":"10.1007/s40747-024-01590-1","DOIUrl":"https://doi.org/10.1007/s40747-024-01590-1","url":null,"abstract":"<p>Cross-domain recommendation aims to integrate data from multiple domains and introduce information from source domains, thereby achieving good recommendations on the target domain. Recently, contrastive learning has been introduced into the cross-domain recommendations and has obtained some better results. However, most cross-domain recommendation algorithms based on contrastive learning suffer from the bias problem. In addition, the correlation between the user’s single-domain and cross-domain preferences is not considered. To address these problems, a new recommendation model is proposed for cross-domain scenarios based on intra-domain and inter-domain contrastive learning, which aims to obtain unbiased user preferences in cross-domain scenarios and improve the recommendation performance of both domains. Firstly, a network enhancement module is proposed to capture users’ complete preference by applying a graphical convolution and attentional aggregator. This module can reduce the limitations of only considering user preferences in a single domain. Then, a cross-domain infomax objective with noise contrast is presented to ensure that users’ single-domain and cross-domain preferences are correlated closely in sequential interactions. Finally, a joint training strategy is designed to improve the recommendation performances of two domains, which can achieve unbiased cross-domain recommendation results. At last, extensive experiments are conducted on two real-world cross-domain scenarios. The experimental results show that the proposed model in this paper achieves the best recommendation results in comparison with existing models.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"152 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel iteration scheme with conjugate gradient for faster pruning on transformer models
Pub Date : 2024-08-07 DOI: 10.1007/s40747-024-01595-w
Jun Li, Yuchen Zhu, Kexue Sun
Pre-trained models based on the Transformer architecture have significantly advanced research within the domain of Natural Language Processing (NLP) due to their superior performance and extensive applicability across multiple technological sectors. Despite these advantages, optimizing these models for more efficient deployment remains a significant challenge. Concretely, existing post-training pruning frameworks for transformer models suffer from inefficiencies in the crucial stage of pruning accuracy recovery, which impacts the overall pruning efficiency. To address this issue, this paper introduces a novel and efficient iteration scheme with conjugate gradient in the pruning recovery stage. By constructing a series of conjugate iterative directions, this approach ensures each optimization step is orthogonal to the previous ones, which effectively reduces redundant exploration of the search space. Consequently, each iteration progresses effectively towards the global optimum, thereby significantly enhancing search efficiency. The conjugate gradient-based faster-pruner reduces the time expenditure of the pruning process while maintaining accuracy, demonstrating a high degree of solution stability and exceptional model acceleration effects. In pruning experiments conducted on the BERT-base and DistilBERT models, the faster-pruner exhibited outstanding performance on the GLUE benchmark dataset, achieving a reduction of up to 36.27% in pruning time and a speed increase of up to 1.45× on an RTX 3090 GPU.
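As a minimal illustration of conjugate-direction search (not the authors' faster-pruner), the sketch below applies a Fletcher-Reeves nonlinear conjugate gradient update to a flat weight vector during an accuracy-recovery phase. The grad_fn and loss_fn callbacks, the fixed step size, and the absence of a line search are all simplifying assumptions.

```python
import numpy as np

def conjugate_gradient_recovery(w, grad_fn, loss_fn, steps=50, lr=1e-2):
    """Nonlinear conjugate gradient (Fletcher-Reeves) sketch for refining the
    remaining weights after pruning.

    w: flat numpy array of unpruned weights; grad_fn(w) returns dL/dw;
    loss_fn(w) is used only for monitoring the final loss.
    """
    g = grad_fn(w)
    d = -g                                          # first direction: steepest descent
    for _ in range(steps):
        w = w + lr * d                              # fixed step in place of a line search
        g_new = grad_fn(w)
        beta = (g_new @ g_new) / (g @ g + 1e-12)    # Fletcher-Reeves coefficient
        d = -g_new + beta * d                       # new direction, conjugate to the previous ones
        g = g_new
    return w, loss_fn(w)
```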
{"title":"A novel iteration scheme with conjugate gradient for faster pruning on transformer models","authors":"Jun Li, Yuchen Zhu, Kexue Sun","doi":"10.1007/s40747-024-01595-w","DOIUrl":"https://doi.org/10.1007/s40747-024-01595-w","url":null,"abstract":"<p>Pre-trained models based on the Transformer architecture have significantly advanced research within the domain of Natural Language Processing (NLP) due to their superior performance and extensive applicability across multiple technological sectors. Despite these advantages, there is a significant challenge in optimizing these models for more efficient deployment. To be concrete, the existing post-training pruning frameworks of transformer models suffer from inefficiencies in the crucial stage of pruning accuracy recovery, which impacts the overall pruning efficiency. To address this issue, this paper introduces a novel and efficient iteration scheme with conjugate gradient in the pruning recovery stage. By constructing a series of conjugate iterative directions, this approach ensures each optimization step is orthogonal to the previous ones, which effectively reduces redundant explorations of the search space. Consequently, each iteration progresses effectively towards the global optimum, thereby significantly enhancing search efficiency. The conjugate gradient-based faster-pruner reduces the time expenditure of the pruning process while maintaining accuracy, demonstrating a high degree of solution stability and exceptional model acceleration effects. In pruning experiments conducted on the BERT<sub>BASE</sub> and DistilBERT models, the faster-pruner exhibited outstanding performance on the GLUE benchmark dataset, achieving a reduction of up to 36.27% in pruning time and a speed increase of up to 1.45× on an RTX 3090 GPU.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"55 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141899854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A learning-based model predictive control scheme for injection speed tracking in injection molding process
Pub Date : 2024-08-05 DOI: 10.1007/s40747-024-01588-9
Zhigang Ren, Jianpu Cai, Bo Zhang, Zongze Wu
Injection molding is a pivotal industrial process renowned for its high production speed, efficiency, and automation. The motion speed of injection molding machines is a crucial factor influencing the production process, directly affecting product quality and efficiency. This paper aims to tackle the challenge of achieving optimal tracking control of injection speed in a standard class of injection molding machines (IMMs) characterized by nonlinear dynamics. To achieve this goal, we propose a learning-based model predictive control (LMPC) scheme that incorporates Gaussian process regression (GPR) to predict and model uncertainty in the injection molding process (IMP). Specifically, the scheme formulates a nonlinear tracking control problem for injection speed, utilizing a GPR-based learning residual model to capture uncertainty and provide accurate predictions. It learns the dynamics model and historical data of the IMM, automatically adjusting the injection speed according to target requirements for optimal production control. Additionally, the optimization problem is efficiently solved using a control-constrained differential dynamic programming approach. Finally, we conduct comprehensive numerical experiments to demonstrate the effectiveness and efficiency of the proposed LMPC scheme for controlling injection speed in IMP.
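A minimal sketch of the GPR residual idea, assuming a placeholder nominal dynamics function and synthetic data: a Gaussian process is fit to the mismatch between measured next states and the nominal model, and the corrected one-step predictor returns both a mean and an uncertainty that a predictive controller could consume. The function names and dynamics are illustrative, not the paper's IMM model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def nominal_model(x, u):
    """Placeholder nominal injection-speed dynamics x_{k+1} = f(x_k, u_k); assumed."""
    return 0.9 * x + 0.1 * u

# Historical data: states x, inputs u, and measured next states (synthetic, assumed).
x = np.random.rand(200, 1)
u = np.random.rand(200, 1)
x_next = nominal_model(x, u) + 0.05 * np.sin(5 * x) + 0.01 * np.random.randn(200, 1)

# Learn the residual between measurement and nominal prediction.
residual = (x_next - nominal_model(x, u)).ravel()
features = np.hstack([x, u])
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               normalize_y=True).fit(features, residual)

def predict_next(x_k, u_k):
    """Nominal model corrected by the GP residual, with predictive uncertainty."""
    f = np.array([[x_k, u_k]])
    mean, std = gpr.predict(f, return_std=True)
    return nominal_model(x_k, u_k) + mean[0], std[0]
```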
{"title":"A learning-based model predictive control scheme for injection speed tracking in injection molding process","authors":"Zhigang Ren, Jianpu Cai, Bo Zhang, Zongze Wu","doi":"10.1007/s40747-024-01588-9","DOIUrl":"https://doi.org/10.1007/s40747-024-01588-9","url":null,"abstract":"<p>Injection molding is a pivotal industrial process renowned for its high production speed, efficiency, and automation. Controlling the motion speed of injection molding machines is a crucial factor that influences production processes, directly affecting product quality and efficiency. This paper aims to tackle the challenge of achieving optimal tracking control of injection speed in a standard class of injection molding machines (IMMs) characterized by nonlinear dynamics. To achieve this goal, we propose a learning-based model predictive control (LMPC) scheme that incorporates Gaussian process regression (GPR) to predict and model uncertainty in the injection molding process (IMP). Specifically, the scheme formulates a nonlinear tracking control problem for injection speed, utilizing a GPR-based learning residual model to capture uncertainty and provide accurate predictions. It learns the dynamics model and historical data of the IMM, automatically adjusting the injection speed according to target requirements for optimal production control. Additionally, the optimization problem is efficiently solved using a control-constrained differential dynamic programming approach. Finally, we conduct comprehensive numerical experiments to demonstrate the effectiveness and efficiency of the proposed LMPC scheme for controlling injection speed in IMP.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"33 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HSC: a multi-hierarchy descriptor for loop closure detection in overhead occlusion scenes
Pub Date : 2024-08-05 DOI: 10.1007/s40747-024-01581-2
Weilong Lv, Wei Zhou, Gang Wang
Loop closure detection is a key technology for robotic navigation. Existing research primarily focuses on feature extraction from global scenes but often neglects local overhead occlusion scenes. In these local scenes, objects such as vehicles, trees, and buildings vary in height, creating a complex multi-layered structure with vertical occlusions. Current methods predominantly employ a single-level extraction strategy to construct descriptors, which fails to capture the characteristics of occluded objects. This limitation results in descriptors with restricted descriptive capabilities. This paper introduces a descriptor named Hierarchy Scan Context (HSC) to address this shortfall. HSC effectively extracts height feature information of objects at different levels in overhead occlusion scenes through hierarchical division, demonstrating enhanced descriptive capabilities. Additionally, a time series enhancement strategy is proposed to reduce the algorithm's missed detections. In the experiments, the proposed method is validated using a self-collected dataset and the public KITTI and NCLT datasets, demonstrating superior performance compared to competitive methods. Furthermore, the proposed method also achieves an average maximum F1 score of 0.92 in experiments conducted on nine selected road segments with overhead occlusion.
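The abstract describes hierarchical division by height; one plausible way to realize a scan-context-style descriptor per height band is sketched below: points are binned by ring and sector, the maximum height per bin is recorded, and one matrix is produced per vertical band. Bin counts, range, and band boundaries are assumptions, not the paper's exact HSC construction.

```python
import numpy as np

def hierarchy_scan_context(points, num_rings=20, num_sectors=60, max_range=80.0,
                           height_bands=((-1.0, 1.0), (1.0, 3.0), (3.0, 10.0))):
    """Build one scan-context-like matrix per height band from an (N, 3) point cloud.

    Each matrix cell holds the maximum point height falling into that
    (ring, sector) bin, computed separately for every vertical band.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    ring = np.minimum((r / max_range * num_rings).astype(int), num_rings - 1)
    sector = np.minimum((theta / (2 * np.pi) * num_sectors).astype(int), num_sectors - 1)

    descriptor = np.zeros((len(height_bands), num_rings, num_sectors))
    in_range = r < max_range
    for b, (lo, hi) in enumerate(height_bands):
        mask = in_range & (z >= lo) & (z < hi)
        # keep the highest point seen in each (ring, sector) bin of this band
        np.maximum.at(descriptor[b], (ring[mask], sector[mask]), z[mask])
    return descriptor
```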
{"title":"HSC: a multi-hierarchy descriptor for loop closure detection in overhead occlusion scenes","authors":"Weilong Lv, Wei Zhou, Gang Wang","doi":"10.1007/s40747-024-01581-2","DOIUrl":"https://doi.org/10.1007/s40747-024-01581-2","url":null,"abstract":"<p>Loop closure detection is a key technology for robotic navigation. Existing research primarily focuses on feature extraction from global scenes but often neglects local overhead occlusion scenes. In these local scenes, objects such as vehicles, trees, and buildings vary in height, creating a complex multi-layered structure with vertical occlusions. Current methods predominantly employ a single-level extraction strategy to construct descriptors, which fails to capture the characteristics of occluded objects. This limitation results in descriptors with restricted descriptive capabilities. This paper introduces a descriptor named Hierarchy Scan Context (HSC) to address this shortfall. HSC effectively extracts height feature information of objects at different levels in overhead occlusion scenes through hierarchical division, demonstrating enhanced descriptive capabilities. Additionally, a time series enhancement strategy is proposed to reduce the number of algorithmic missed detections. In the experiments, the proposed method is validated using a self-collected dataset and the public KITTI and NCLT datasets, demonstrating superior performance compared to competitive methods. Furthermore, the proposed method also achieves an average maximum F1 score of 0.92 in experiments conducted on nine selected road segments with overhead occlusion.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"100 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HRDLNet: a semantic segmentation network with high resolution representation for urban street view images
Pub Date : 2024-08-05 DOI: 10.1007/s40747-024-01582-1
Wenyi Chen, Zongcheng Miao, Yang Qu, Guokai Shi
Semantic segmentation of urban street scenes has attracted much attention in the field of autonomous driving: it not only helps vehicles perceive the environment in real time, but also significantly improves the decision-making ability of autonomous driving systems. However, most current methods based on Convolutional Neural Networks (CNNs) first encode the input image to a low resolution and then try to recover the high resolution, which leads to problems such as loss of spatial information, accumulation of errors, and difficulty in dealing with large-scale changes. To address these problems, in this paper we propose a new semantic segmentation network (HRDLNet) for urban street scene images that improves segmentation accuracy by always maintaining a high-resolution representation of the image. Specifically, we propose a feature extraction module (FHR) with high-resolution representation, which handles multi-scale targets and high-resolution image information by efficiently fusing high-resolution information and multi-scale features. Secondly, we design a multi-scale feature extraction enhancement (MFE) module, which significantly expands the receptive field of the network, thus enhancing the ability to capture correlations between image details and global contextual information. In addition, we introduce a dual-attention mechanism module (CSD), which dynamically adjusts the network to capture subtle features and rich semantic information in images more accurately. We trained and evaluated HRDLNet on the Cityscapes Dataset and the PASCAL VOC 2012 Augmented Dataset, and verified the model’s excellent performance in urban streetscape image segmentation. The unique advantages of our proposed HRDLNet in semantic segmentation of urban streetscapes are also verified by comparison with state-of-the-art methods.
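As an illustration of two ingredients the abstract names, a larger receptive field via multi-scale (dilated) convolutions and channel-plus-spatial dual attention, the PyTorch sketch below shows generic versions of both blocks. Dilation rates, reduction ratio, and module names are assumptions rather than the actual MFE and CSD designs.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel dilated convolutions to enlarge the receptive field (MFE-style idea)."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))

class ChannelSpatialAttention(nn.Module):
    """Channel attention (squeeze-and-excitation style) followed by spatial attention."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                                   # reweight channels
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)  # mean/max maps
        return x * self.spatial(pooled)                           # reweight locations
```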
{"title":"HRDLNet: a semantic segmentation network with high resolution representation for urban street view images","authors":"Wenyi Chen, Zongcheng Miao, Yang Qu, Guokai Shi","doi":"10.1007/s40747-024-01582-1","DOIUrl":"https://doi.org/10.1007/s40747-024-01582-1","url":null,"abstract":"<p>Semantic segmentation of urban street scenes has attracted much attention in the field of autonomous driving, which not only helps vehicles perceive the environment in real time, but also significantly improves the decision-making ability of autonomous driving systems. However, most of the current methods based on Convolutional Neural Network (CNN) mainly use coding the input image to a low resolution and then try to recover the high resolution, which leads to problems such as loss of spatial information, accumulation of errors, and difficulty in dealing with large-scale changes. To address these problems, in this paper, we propose a new semantic segmentation network (HRDLNet) for urban street scene images with high-resolution representation, which improves the accuracy of segmentation by always maintaining a high-resolution representation of the image. Specifically, we propose a feature extraction module (FHR) with high-resolution representation, which efficiently handles multi-scale targets and high-resolution image information by efficiently fusing high-resolution information and multi-scale features. Secondly, we design a multi-scale feature extraction enhancement (MFE) module, which significantly expands the sensory field of the network, thus enhancing the ability to capture correlations between image details and global contextual information. In addition, we introduce a dual-attention mechanism module (CSD), which dynamically adjusts the network to more accurately capture subtle features and rich semantic information in images. We trained and evaluated HRDLNet on the Cityscapes Dataset and the PASCAL VOC 2012 Augmented Dataset, and verified the model’s excellent performance in the field of urban streetscape image segmentation. The unique advantages of our proposed HRDLNet in the field of semantic segmentation of urban streetscapes are also verified by comparing it with the state-of-the-art methods.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"44 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-selective receptive field network for person re-identification
Pub Date : 2024-08-05 DOI: 10.1007/s40747-024-01565-2
Shaoqi Hou, Xueting Liu, Chenyu Wu, Guangqiang Yin, Xinzhong Wang, Zhiguo Wang
Person Re-identification (Re-ID) technology aims to solve the problem of matching the same pedestrian at different times and places, which has important application value in the field of public safety. At present, most scholars focus on designing complex models to improve the accuracy of Re-ID, but the high complexity of these models restricts the practical application of Re-ID algorithms. To solve this problem, this paper designs a lightweight Self-selective Receptive Field (SRF) block instead of directly designing complex models. The module is plug-and-play on general backbone networks, significantly improving Re-ID performance while effectively controlling its own parameter count and computation: (1) the SRF block encodes pedestrian targets and image contexts at different scales by constructing a pyramidal convolution group, and allows the module to independently select the size of the receptive field through training by means of self-adaptive weighting; (2) to reduce the complexity of the SRF block, we introduce a "channel scaling factor" and design a "grouped convolution operation" by constraining the channels of the feature map and changing the structure of the convolution kernel, respectively. Experiments on multiple datasets show that the SRF Network (SRFNet) for Re-ID achieves a good balance between performance and complexity, which fully demonstrates the effectiveness of the SRF block.
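A minimal PyTorch sketch of the self-selective receptive field idea, assuming that branch selection is done by softmax-weighted fusion of parallel grouped convolutions with different kernel sizes and that the "channel scaling factor" acts as a bottleneck ratio; these choices are illustrative, not the published SRF block.

```python
import torch
import torch.nn as nn

class SRFBlockSketch(nn.Module):
    """Self-selective receptive field idea: parallel grouped convolutions with
    different kernel sizes, fused by learned (softmax) branch weights."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7), groups=8, scale=0.5):
        super().__init__()
        mid = max(int(channels * scale), groups)   # "channel scaling factor" bottleneck
        mid -= mid % groups                        # keep divisible by the group count
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, mid, 1),
                nn.Conv2d(mid, mid, k, padding=k // 2, groups=groups),  # grouped conv
                nn.Conv2d(mid, channels, 1),
            )
            for k in kernel_sizes
        ])
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes), 1),   # one weight per branch
        )

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)   # (B, K, C, H, W)
        weights = torch.softmax(self.gate(x), dim=1)                # (B, K, 1, 1)
        return x + (feats * weights.unsqueeze(2)).sum(dim=1)        # residual fusion
```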
{"title":"Self-selective receptive field network for person re-identification","authors":"Shaoqi Hou, Xueting liu, Chenyu Wu, Guangqiang Yin, Xinzhong Wang, Zhiguo Wang","doi":"10.1007/s40747-024-01565-2","DOIUrl":"https://doi.org/10.1007/s40747-024-01565-2","url":null,"abstract":"<p>Person Re-identification (Re-ID) technology aims to solve the matching problem of the same pedestrians at different times and places, which has important application value in the field of public safety. At present, most scholars focus on designing complex models to improve the accuracy of Re-ID, but the high complexity of the model further restricts the practical application of Re-ID algorithm. To solve the above problems, this paper designs a lightweight Self-selective Receptive Field (SRF) block instead of directly designing complex models. Specifically, the module can be plug-and-play on the general backbone network, so as to significantly improve the performance of Re-ID while effectively controlling the amount of its own parameter and calculation: (1) the SRF block encodes pedestrian targets and image contexts at different scales by constructing pyramidal convolution group and allows the module to independently select the size of the receptive field through training by means of self-adaptive weighting; (2) in order to reduce the complexity of SRF block, we introduce a \"channel scaling factor\" and design a \"grouped convolution operation\" by constraining the channels of the feature map and changing the structure of the convolution kernel respectively. Experiments on multiple datasets show that SRF Network (SRFNet) for Re-ID can achieve a good balance between performance and complexity, which fully demonstrates the effectiveness of SRF block.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"22 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Latent-SDE: guiding stochastic differential equations in latent space for unpaired image-to-image translation
Pub Date : 2024-08-01 DOI: 10.1007/s40747-024-01566-1
Xianjie Zhang, Min Li, Yujie He, Yao Gou, Yusen Zhang
Score-based diffusion models have shown promising results in unpaired image-to-image translation (I2I). However, existing methods only perform unpaired I2I in pixel space, which incurs high computation costs. To this end, we propose guiding stochastic differential equations in latent space (Latent-SDE), which extracts domain-specific and domain-independent features of the image in the latent space to calculate the loss and guides the inference process of a pretrained SDE in the latent space for unpaired I2I. To refine the image in the latent space, we propose a latent time-travel strategy that increases the sampling timestep. Empirically, we compare Latent-SDE to the baseline score-based diffusion model on three widely adopted unpaired I2I tasks under two metrics. Latent-SDE achieves state-of-the-art results on Cat → Dog and is competitive on the other two tasks. Our code will be freely available for public use upon acceptance at https://github.com/zhangXJ147/Latent-SDE.
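The "latent time-travel" refinement can be read as periodically re-noising a partially denoised latent and denoising it again, giving the sampler extra iterations on the same region of diffusion time. The sketch below shows that control flow for a generic reverse-diffusion sampler; denoise_step and forward_noise are placeholder callables and the stride/jump sizes are assumptions, not the authors' implementation (which is in the linked repository).

```python
def time_travel_sampling(z_T, denoise_step, forward_noise, num_steps=50,
                         travel_stride=10, travel_back=5):
    """Reverse-diffusion sampling in latent space with periodic 'time travel'.

    denoise_step(z, t) -> latent at step t-1 (one reverse step of a guided SDE);
    forward_noise(z, t_from, t_to) -> latent re-noised from t_from up to t_to.
    """
    z = z_T
    t = num_steps
    traveled = set()
    while t > 0:
        z = denoise_step(z, t)
        t -= 1
        if t > 0 and t % travel_stride == 0 and t not in traveled:
            traveled.add(t)                       # travel from each checkpoint only once
            t_back = min(t + travel_back, num_steps)
            z = forward_noise(z, t, t_back)       # jump back in diffusion time
            t = t_back
    return z
```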
{"title":"Latent-SDE: guiding stochastic differential equations in latent space for unpaired image-to-image translation","authors":"Xianjie Zhang, Min Li, Yujie He, Yao Gou, Yusen Zhang","doi":"10.1007/s40747-024-01566-1","DOIUrl":"https://doi.org/10.1007/s40747-024-01566-1","url":null,"abstract":"<p>Score-based diffusion models have shown promising results in unpaired image-to-image translation (I2I). However, the existing methods only perform unpaired I2I in pixel space, which requires high computation costs. To this end, we propose guiding stochastic differential equations in latent space (Latent-SDE) that extracts domain-specific and domain-independent features of the image in the latent space to calculate the loss and guides the inference process of a pretrained SDE in the latent space for unpaired I2I. To refine the image in the latent space, we propose a latent time-travel strategy that increases the sampling timestep. Empirically, we compare Latent-SDE to the baseline of the score-based diffusion model on three widely adopted unpaired I2I tasks under two metrics. Latent-SDE achieves state-of-the-art on Cat <span>(rightarrow )</span> Dog and is competitive on the other two tasks. Our code will be freely available for public use upon acceptance at https://github.com/zhangXJ147/Latent-SDE.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"356 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141862408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}