Pub Date: 2024-07-31 DOI: 10.1007/s40747-024-01564-3
Wenwen Ye, Jia Cai, Shengping Li
Target search using a swarm of robots is a classic research topic, and multi-target search in unknown environments is particularly challenging. Key challenges include the high communication cost among robots, the unknown positions of obstacles, and the presence of multiple targets. To address them, we propose a novel Robotic Flow Direction Algorithm (RFDA), which modifies the Flow Direction Algorithm (FDA) to suit the characteristics of robot motion. RFDA reduces the communication cost, navigates around unknown obstacles, and accounts for scenarios involving isolated robots. The pipeline of RFDA is as follows. (1) Learning strategy: a learning strategy based on neighborhood information is adopted to enhance the FDA’s position update formula, allowing the swarm robots to locate the target (the lowest height) step by step. (2) Adaptive inertia weighting: an adaptive inertia weighting mechanism maintains diversity among robots during the search and avoids premature convergence. (3) Sink-filling process: the algorithm simulates the sink-filling process and moves to the aspect slope to escape from local optima. (4) Isolated robot scenario: the case of an isolated robot (a robot without neighbors) is considered. Global optimal information is required only when a robot is isolated or undergoing the sink-filling process, which reduces communication costs. We demonstrate the probabilistic completeness of RFDA and validate its effectiveness by comparing it with six competing algorithms in a simulated environment. Experiments vary the number of targets, the population size, and the environment size. Our findings indicate that RFDA outperforms the other methods in terms of the number of required iterations and the full success rate. Friedman and Wilcoxon tests further support the superiority of RFDA.
Title: A FDA-based multi-robot cooperation algorithm for multi-target searching in unknown environments. Journal: Complex & Intelligent Systems (impact factor 5.8).
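The communication rule stated in the abstract (global optimal information is requested only when a robot is isolated or in the sink-filling process) can be sketched as below. This is a minimal illustration, not the RFDA update itself: the function name `select_guide`, the callback `query_global_best`, and the use of the lowest neighbor height as the local guide are assumptions made for the example.

```python
def select_guide(neighbor_heights, in_sink_filling, query_global_best):
    """Pick the height a robot should move toward.

    neighbor_heights: heights reported by robots within sensing range (may be empty).
    in_sink_filling:  True if the robot is currently escaping a local optimum.
    query_global_best: hypothetical callback that costs one swarm-wide message.
    """
    if in_sink_filling or not neighbor_heights:
        # Isolated robot or sink-filling: the only cases that pay for global info.
        return "global", query_global_best()
    # Otherwise rely on local neighborhood information only (lower height is better).
    return "neighbor", min(neighbor_heights)
```

Counting how often `query_global_best` is called over a run gives a rough proxy for the communication cost that the abstract says is reduced.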
Pub Date: 2024-07-30 DOI: 10.1007/s40747-024-01569-y
Yongchuan Tang, Rongfei Li, He Guan, Deyun Zhou, Yubo Huang
Negation provides a novel perspective for the representation of information. However, current research seldom addresses negation within random permutation set theory. Based on the concept of belief reassignment, this paper proposes a method for obtaining the negation of a permutation mass function in random permutation set theory. The convergence of the proposed negation is verified, and the trends of uncertainty and dissimilarity after each negation operation are investigated. Furthermore, this paper introduces a negation-based uncertainty measure and designs a multi-source information fusion approach based on it. Numerical examples are used to verify the rationality of the proposed method.
Title: Negation of permutation mass function in random permutation sets theory for uncertain information modeling
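As background for the idea of negation and its convergence, the sketch below implements the simpler, well-known negation of an ordinary probability distribution, in which each outcome receives the mass not assigned to it, renormalized: (1 - p_i) / (n - 1). It is not the permutation mass function negation proposed in the paper, whose definition is not given in the abstract; the function names are chosen for this example only.

```python
import math

def negate(p):
    """Negation of a probability distribution over n >= 2 outcomes."""
    n = len(p)
    return [(1.0 - pi) / (n - 1) for pi in p]

def shannon_entropy(p):
    """Shannon entropy in bits, used here as the uncertainty measure."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def iterate_negation(p, steps):
    """Apply the negation repeatedly and return every intermediate distribution."""
    history = [list(p)]
    for _ in range(steps):
        p = negate(p)
        history.append(p)
    return history

history = iterate_negation([0.7, 0.2, 0.1], 20)
for step in (0, 1, 2, 20):
    dist = [round(x, 4) for x in history[step]]
    print(step, dist, round(shannon_entropy(history[step]), 4))
```

For three or more outcomes the deviation from the uniform distribution shrinks by a factor of 1/(n - 1) per step, so the iterates approach the uniform distribution, the point of maximum entropy.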
Pub Date: 2024-07-27 DOI: 10.1007/s40747-024-01562-5
Chaoran Li, Xiyin Wu, Pai Peng, Zhuhong Zhang, Xiaohuan Lu
Recent advances in multi-view multi-label learning are often hampered by the prevalent challenges of incomplete views and missing labels, common in real-world data due to uncertainties in data collection and manual annotation. These challenges restrict the capacity of the model to fully utilize the diverse semantic information of each sample, posing significant barriers to effective learning. Despite substantial scholarly efforts, many existing methods inadequately capture the depth of semantic information, focusing primarily on shallow feature extractions that fail to maintain semantic consistency. To address these shortcomings, we propose a novel Deep semantic structure-preserving (SSP) model that effectively tackles both incomplete views and missing labels. SSP innovatively incorporates a graph constraint learning (GCL) scheme to ensure the preservation of semantic structure throughout the feature extraction process across different views. Additionally, the SSP integrates a pseudo-labeling self-paced learning (PSL) strategy to address the often-overlooked issue of missing labels, enhancing the classification accuracy while preserving the distribution structure of data. The SSP model creates a unified framework that synergistically employs GCL and PSL to maintain the integrity of semantic structural information during both feature extraction and classification phases. Extensive evaluations across five real datasets demonstrate that the SSP method outperforms existing approaches, including lrMMC, MVL-IV, MvEL, iMSF, iMvWL, NAIML, and DD-IMvMLC-net. It effectively mitigates the impacts of data incompleteness and enhances semantic representation fidelity.
Title: Incomplete multi-view partial multi-label classification via deep semantic structure preservation
Pub Date: 2024-07-27 DOI: 10.1007/s40747-024-01553-6
Ning Liu, Shangkun Liu, Wei-Min Zheng
The security of wireless sensor networks is an active research topic. Game theory can provide the optimal selection strategy for attackers and defenders in an attack-defense confrontation. To address the poor generality of previous game models, we propose a generalized Bayesian game model to analyze the intrusion detection of nodes in wireless sensor networks. Because the Nash equilibrium of the Bayesian game is difficult to solve by traditional methods, a parallel particle swarm optimization is proposed to solve it and to analyze the optimal action of the defender. Simulation results show the superiority of the parallel particle swarm optimization over other heuristic algorithms, and the algorithm proves effective in finding the optimal defense strategy. The influence of the detection rate and false alarm rate of nodes on the profit of the defender is analyzed by simulation experiments, which show that the defender's profit decreases as the false alarm rate increases and as the detection rate decreases. Using a heuristic algorithm to solve the Nash equilibrium of a Bayesian game provides a new method for research on attack-defense confrontation. Predicting the actions of the attacker and the defender through the game model can help the defender adopt active defense.
Title: PPSO and Bayesian game for intrusion detection in WSN from a macro perspective
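The reported trend (defender profit falls as the false alarm rate rises and as the detection rate falls) can be illustrated with a toy expected payoff for a monitoring defender. The payoff form, the parameter names, and the default values below are assumptions for illustration; the paper's generalized Bayesian game model is not specified in the abstract.

```python
def defender_payoff(mu, alpha, beta, asset=10.0, false_alarm_loss=10.0, monitor_cost=1.0):
    """Expected payoff of a defender that monitors a node (hypothetical model).

    mu:    prior probability that the node is malicious (and attacks).
    alpha: detection rate, so a detected attack gains `asset` and a missed one loses it.
    beta:  false alarm rate on normal nodes, each false alarm costing `false_alarm_loss`.
    """
    attack_term = mu * (2.0 * alpha - 1.0) * asset         # alpha*asset - (1-alpha)*asset
    false_alarm_term = (1.0 - mu) * beta * false_alarm_loss
    return attack_term - false_alarm_term - monitor_cost

for alpha, beta in [(0.9, 0.1), (0.9, 0.3), (0.7, 0.1)]:
    print(alpha, beta, defender_payoff(mu=0.5, alpha=alpha, beta=beta))
```

In this sketch the payoff is linear in both rates, so the monotone trend holds for any prior strictly between 0 and 1.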
Pub Date: 2024-07-27 DOI: 10.1007/s40747-024-01567-0
Peicheng Shi, Zhiqiang Liu, Xinlong Dong, Aixi Yang
In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera’s perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods.
Title: CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
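The step of converting a LiDAR point cloud into BEV space can be pictured with a hand-rolled grid binning. This is a minimal sketch under stated assumptions: the function name, the per-cell features (point count and maximum height), and the grid parameters are illustrative, whereas the method in the abstract produces learned BEV spatial features from voxelized points.

```python
def points_to_bev(points, x_range, y_range, cell_size):
    """Bin (x, y, z) points into a top-down grid.

    Returns two row-major grids: the number of points per cell and the
    maximum height per cell (None where the cell is empty).
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = int(round((x_max - x_min) / cell_size))
    n_rows = int(round((y_max - y_min) / cell_size))
    count = [[0] * n_cols for _ in range(n_rows)]
    max_z = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # drop points outside the grid
        col = min(int((x - x_min) / cell_size), n_cols - 1)
        row = min(int((y - y_min) / cell_size), n_rows - 1)
        count[row][col] += 1
        if max_z[row][col] is None or z > max_z[row][col]:
            max_z[row][col] = z
    return count, max_z

cloud = [(0.2, 0.3, 1.0), (0.7, 0.1, 2.0), (3.9, 3.9, 0.5), (5.0, 0.0, 1.0)]
count, max_z = points_to_bev(cloud, x_range=(0.0, 4.0), y_range=(0.0, 4.0), cell_size=1.0)
print(count[0][0], max_z[0][0])
```

Once both sensors are expressed on the same grid, camera and LiDAR features for a given cell refer to the same ground location, which is what makes cell-wise fusion possible.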
In recent years, recommendation explanation methods have received widespread attention due to their potential to enhance user experience and streamline transactions. In scenarios where auxiliary information such as text and attributes is lacking, counterfactual explanation has emerged as a crucial technique for explaining recommendations. However, existing counterfactual explanation methods encounter two primary challenges. First, a substantial bias exists in the calculation of the group impact function, leading to inaccurate predictions as the counterfactual explanation group expands. Second, the importance of collaborative filtering as a counterfactual explanation is overlooked, which results in lengthy, narrow, and inaccurate explanations. To address these issues, we propose a counterfactual explanation method based on a Modified Group Influence Function for recommendation. In particular, via a rigorous derivation, we demonstrate that a simple summation of individual influence functions cannot reflect the group impact in recommendations. Then, building upon the improved influence function, we construct the counterfactual groups by iteratively incorporating the training samples that have the greatest influence on the recommended results, continuously adjusting the parameters to ensure accuracy. Finally, we expand the scope of the search for counterfactual groups by incorporating collaborative filtering information from different users. To evaluate the effectiveness of our method, we use it to explain the recommendations generated by two common recommendation models, Matrix Factorization and Neural Collaborative Filtering, on two publicly available datasets. The evaluation shows its superior performance in providing counterfactual explanations. In the most significant case, our method achieves a 17% lead in counterfactual precision over the best baseline explanation method.
Title: A counterfactual explanation method based on modified group influence function for recommendation. Authors: Yupu Guo, Fei Cai, Zhiqiang Pan, Taihua Shao, Honghui Chen, Xin Zhang. Pub Date: 2024-07-27. DOI: 10.1007/s40747-024-01547-4
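Why summing individual influence terms misjudges a group's effect can be seen in a deliberately tiny model: fitting a single mean under squared loss. This toy is an assumption chosen for transparency; it is neither a recommendation model nor the paper's Modified Group Influence Function. For n samples, the first-order influence of removing sample i on the fitted mean is -(x_i - mean)/n, while the exact change after removing a group of size k is -(sum of deviations)/(n - k), so the summed estimate is too small by the factor (n - k)/n.

```python
def mean(xs):
    return sum(xs) / len(xs)

def summed_individual_influence(xs, group):
    """First-order estimate of the change in the fitted mean when the samples
    at indices `group` are removed, built by adding per-sample influence terms."""
    n = len(xs)
    theta = mean(xs)
    return sum(-(xs[i] - theta) / n for i in group)

def actual_change(xs, group):
    """Exact change in the fitted mean after refitting without `group`."""
    removed = set(group)
    kept = [x for i, x in enumerate(xs) if i not in removed]
    return mean(kept) - mean(xs)

data = [1.0, 2.0, 3.0, 4.0, 10.0]
for group in ([4], [3, 4]):
    est = summed_individual_influence(data, group)
    act = actual_change(data, group)
    print(group, round(est, 4), round(act, 4), round(est / act, 4))
```

The ratio of estimate to actual change drops from 0.8 to 0.6 as the group grows from one to two samples, mirroring the abstract's point that the bias worsens as the counterfactual group expands.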
Pub Date: 2024-07-25 DOI: 10.1007/s40747-024-01543-8
Qingyang Chen, Zhengping Qiang, Yue Zhao, Hong Lin, Libo He, Fei Dai
The majority of existing face inpainting methods primarily focus on generating a single result that visually resembles the original image. The generation of diverse and plausible results has emerged as a new branch in image restoration, often referred to as “Pluralistic Image Completion”. However, most diversity methods simply use random latent vectors to generate multiple results, leading to uncontrollable outcomes. To overcome these limitations, we introduce a novel architecture known as the Reference-Guided Directional Diverse Face Inpainting Network. Instead of using a background image as the reference, as is typical in image restoration, we use a face image as a reference face style; this face can differ from the original image in many characteristics, including but not limited to gender and age. Our network first infers the semantic information of the masked face, i.e., the face parsing map, from the partial image and its mask; this map then guides and constrains the directional diverse generator network. The network learns the distribution of face images from different domains in a low-dimensional manifold space. To validate our method, we conducted extensive experiments on the CelebAMask-HQ dataset. Our method not only produces high-quality directional diverse results but also complements the images with the style of the reference face image. Additionally, our diverse results maintain correct facial feature distribution and sizes, rather than being random. At the time of writing, our network achieves state-of-the-art (SOTA) results in diverse face inpainting. Code is available at https://github.com/nothingwithyou/RDFINet.
Title: Rdfinet: reference-guided directional diverse face inpainting network
Pub Date: 2024-07-24 DOI: 10.1007/s40747-024-01546-5
Xinpan Yuan, Shaojun Xie, Zhigao Zeng, Changyun Li, Luda Wang
Humans excel at learning and recognizing objects, swiftly adapting to new concepts with just a few samples. However, current studies in computer vision on few-shot learning have not yet achieved human performance in integrating prior knowledge during the learning process. Humans utilize a hierarchical structure of object categories based on past experiences to facilitate learning and classification. Therefore, we propose a method named n-Hierarchy SEmantic Guided Attention (nHi-SEGA) that acquires abstract superclasses. This allows the model to associate with and pay attention to different levels of objects utilizing semantics and visual features embedded in the class hierarchy (e.g., house finch-bird-animal, goldfish-fish-animal, rose-flower-plant), resembling human cognition. We constructed an nHi-Tree using WordNet and Glove tools and devised two methods to extract hierarchical semantic features, which were then fused with visual features to improve sample feature prototypes.
Title: nHi-SEGA: n-Hierarchy SEmantic Guided Attention for few-shot learning
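The class hierarchy examples in the abstract (house finch-bird-animal, goldfish-fish-animal, rose-flower-plant) can be written out as a small tree to show how superclass chains relate classes at different levels. The hard-coded `PARENT` map and both function names are hypothetical; the paper builds its nHi-Tree with WordNet and GloVe, which this sketch does not use.

```python
# Toy parent map copied from the examples in the abstract.
PARENT = {
    "house finch": "bird", "bird": "animal",
    "goldfish": "fish", "fish": "animal",
    "rose": "flower", "flower": "plant",
}

def superclass_chain(label):
    """Return the label followed by its increasingly abstract superclasses."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_superclass(a, b):
    """Return the most specific class shared by both chains, or None."""
    ancestors_a = superclass_chain(a)
    for node in superclass_chain(b):
        if node in ancestors_a:
            return node
    return None

print(superclass_chain("house finch"))
print(lowest_common_superclass("house finch", "goldfish"))
```

Two novel classes that share a superclass can borrow semantic cues from that shared level, which is the intuition behind attending to different levels of the hierarchy.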
Pub Date : 2024-07-24 DOI: 10.1007/s40747-024-01554-5
Liying Zhu, Sen Wang, Mingfang Chen, Aiping Shen, Xuangang Li
High-quality printed circuit boards (PCBs) are essential components in modern electronic circuits. Nevertheless, most existing methods for PCB surface defect detection neglect the fact that defects in complex backgrounds tend to follow long-tailed data distributions, which degrades detection effectiveness. Additionally, most existing methods ignore the intra-scale features of defects and do not use auxiliary supervision strategies to improve the network's detection performance. To tackle these issues, we propose a lightweight long-tailed data mining network (LLM-Net) for identifying PCB surface defects. First, the proposed Efficient Feature Fusion Network (EFFNet) embeds intra-scale feature associations and multi-scale features of defects into LLM-Net. Next, an auxiliary supervision method with a soft label assignment strategy helps LLM-Net learn more accurate defect features. Finally, inadequate detection of tail data is addressed by the proposed Binary Cross-Entropy Loss Rank Mining method (BCE-LRM), which identifies challenging samples. LLM-Net was evaluated on a self-built dataset of PCB surface soldering defects; the results show that it achieves the best mAP@0.5 accuracy under the COCO evaluation metrics, with a real-time inference speed of 188 frames per second (FPS).
Title: Incorporating long-tail data in complex backgrounds for visual surface defect detection in PCBs. Journal: Complex & Intelligent Systems.
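The abstract for LLM-Net describes ranking samples by binary cross-entropy loss to find challenging ones. As a rough illustration of that general idea, the sketch below computes a per-sample BCE loss and keeps the highest-loss fraction. It is generic hard-example mining, not the authors' BCE-LRM: the function names and the `keep_ratio` parameter are assumptions made for this example, and the probabilities and labels are invented.

```python
import math


def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mine_hard_samples(probs, labels, keep_ratio=0.5):
    """Return indices of the highest-loss samples, hardest first.

    keep_ratio is a hypothetical knob: the fraction of samples to keep.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    losses = [bce_loss(p, y) for p, y in zip(probs, labels)]
    k = max(1, math.ceil(keep_ratio * len(losses)))
    # Rank sample indices by descending loss and keep the top k.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:k]


probs = [0.95, 0.10, 0.60, 0.30]  # invented predicted probabilities
labels = [1, 1, 0, 0]             # invented ground-truth labels
print(mine_hard_samples(probs, labels, keep_ratio=0.5))
```

Samples that the model gets confidently wrong receive the largest loss and rise to the top of the ranking, which is why loss ranking is a common way to give rare (tail) cases more weight during training.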
Pub Date : 2024-07-24 DOI: 10.1007/s40747-024-01555-4
Heba M. Afify, Kamel K. Mohammed, Aboul Ella Hassanien
Applying deep learning (DL) approaches to genomics data has led to significant advances in cancer prediction. The continuous availability of gene expression datasets in recent years has made them one of the most accessible sources of genome-wide data, advancing cancer bioinformatics research and the prediction of cancer from genomic data. To contribute to this topic, the proposed work applies DL prediction with both a convolutional neural network (CNN) and a recurrent neural network (RNN) to five classes of brain cancer, using gene expression data obtained from the Curated Microarray Database (CuMiDa). This database is used for cancer classification and is publicly accessible on the official CuMiDa website. This paper implemented DL approaches using a One-Dimensional Convolutional Neural Network (1D-CNN) followed by an RNN classifier, with and without Bayesian hyperparameter optimization (BO). The hybrid model (BO + 1D-CNN + RNN) achieved the highest classification accuracy, 100%, compared with 95% for the ML model in prior work and 90% for the (1D-CNN + RNN) algorithm considered in the paper. Therefore, classifying brain cancer gene expression with the hybrid model (BO + 1D-CNN + RNN) provides more accurate and useful assessments for patients with different types of brain cancer. Thus, gene expression data are used to create a DL classification-based hybrid model that holds considerable promise for the treatment of brain cancer.
Title: Leveraging hybrid 1D-CNN and RNN approach for classification of brain cancer gene expression. Journal: Complex & Intelligent Systems.
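For readers unfamiliar with the 1D-CNN component named in the brain cancer abstract, the sketch below shows what a single one-dimensional convolution followed by a ReLU does to a vector of expression values. It is a minimal illustration in plain Python, not the paper's architecture: the abstract gives no layer sizes or weights, so the input values, the filter, and the function names here are all invented for the example.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding.

    This is cross-correlation (no kernel flip), the convention most
    deep learning libraries use for their convolution layers.
    """
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


expression = [2, 15, 7, 30, 1]  # invented expression levels for five genes
kernel = [1, 0, -1]             # invented filter weights
features = relu(conv1d_valid(expression, kernel))
print(features)
```

In a real model the filter weights are learned, many filters run in parallel, and the resulting feature sequence would be passed on to the recurrent classifier.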