In this paper, we introduce Neural Collaborative Search (NCS), a novel learning-based framework for efficiently solving pickup and delivery problems (PDPs). NCS pioneers the collaboration between the latest prevalent neural construction and neural improvement models, establishing a collaborative framework where an improvement model iteratively refines solutions initiated by a construction model. Our NCS collaboratively trains the two models via reinforcement learning with an effective shared-critic mechanism. In addition, the construction model enhances the improvement model with high-quality initial solutions via curriculum learning, while the improvement model accelerates the convergence of the construction model through imitation learning. Besides the new framework design, we also propose the efficient Neural Neighborhood Search (N2S), an efficient improvement model employed within the NCS framework. N2S exploits a tailored Markov decision process formulation and two customized decoders for removing and then reinserting a pair of pickup-delivery nodes, thereby learning a ruin-repair search process for addressing the precedence constraints in PDPs efficiently. To balance the computation cost between encoders and decoders, N2S streamlines the existing encoder design through a light Synthesis Attention mechanism that allows the vanilla self-attention to synthesize various features regarding a route solution. Moreover, a diversity enhancement scheme is further leveraged to ameliorate the performance during the inference of N2S. Our NCS and N2S are both generic, and extensive experiments on two canonical PDP variants show that they can produce state-of-the-art results among existing neural methods. Remarkably, our NCS and N2S could surpass the well-known LKH3 solver especially on the more constrained PDP variant.
{"title":"Efficient Neural Collaborative Search for Pickup and Delivery Problems","authors":"Detian Kong;Yining Ma;Zhiguang Cao;Tianshu Yu;Jianhua Xiao","doi":"10.1109/TPAMI.2024.3450850","DOIUrl":"10.1109/TPAMI.2024.3450850","url":null,"abstract":"In this paper, we introduce Neural Collaborative Search (NCS), a novel learning-based framework for efficiently solving pickup and delivery problems (PDPs). NCS pioneers the collaboration between the latest prevalent neural construction and neural improvement models, establishing a collaborative framework where an improvement model iteratively refines solutions initiated by a construction model. Our NCS collaboratively trains the two models via reinforcement learning with an effective shared-critic mechanism. In addition, the construction model enhances the improvement model with high-quality initial solutions via curriculum learning, while the improvement model accelerates the convergence of the construction model through imitation learning. Besides the new framework design, we also propose the efficient Neural Neighborhood Search (N2S), an efficient improvement model employed within the NCS framework. N2S exploits a tailored Markov decision process formulation and two customized decoders for removing and then reinserting a pair of pickup-delivery nodes, thereby learning a ruin-repair search process for addressing the precedence constraints in PDPs efficiently. To balance the computation cost between encoders and decoders, N2S streamlines the existing encoder design through a light Synthesis Attention mechanism that allows the vanilla self-attention to synthesize various features regarding a route solution. Moreover, a diversity enhancement scheme is further leveraged to ameliorate the performance during the inference of N2S. Our NCS and N2S are both generic, and extensive experiments on two canonical PDP variants show that they can produce state-of-the-art results among existing neural methods. Remarkably, our NCS and N2S could surpass the well-known LKH3 solver especially on the more constrained PDP variant.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11019-11034"},"PeriodicalIF":0.0,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142127749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-03DOI: 10.1109/TPAMI.2024.3453916
Xiangtai Li;Shilin Xu;Yibo Yang;Haobo Yuan;Guangliang Cheng;Yunhai Tong;Zhouchen Lin;Ming-Hsuan Yang;Dacheng Tao
Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we first design a meta-architecture that decouples part features and things/stuff features, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Second, we propose a new metric Part-Whole Quality (PWQ), better to measure this task from pixel-region and part-whole perspectives. It also decouples the errors for part segmentation and panoptic segmentation. Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further. We design a new part-whole interaction method using masked cross attention. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results. Our models can serve as a strong baseline and aid future research in PPS.
{"title":"Panoptic-PartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation","authors":"Xiangtai Li;Shilin Xu;Yibo Yang;Haobo Yuan;Guangliang Cheng;Yunhai Tong;Zhouchen Lin;Ming-Hsuan Yang;Dacheng Tao","doi":"10.1109/TPAMI.2024.3453916","DOIUrl":"10.1109/TPAMI.2024.3453916","url":null,"abstract":"Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we first design a meta-architecture that decouples part features and things/stuff features, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Second, we propose a new metric Part-Whole Quality (PWQ), better to measure this task from pixel-region and part-whole perspectives. It also decouples the errors for part segmentation and panoptic segmentation. Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further. We design a new part-whole interaction method using masked cross attention. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results. Our models can serve as a strong baseline and aid future research in PPS.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11087-11103"},"PeriodicalIF":0.0,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142127750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-30DOI: 10.1109/TPAMI.2024.3452629
Zdravko Marinov;Paul F. Jäger;Jan Egger;Jens Kleesiek;Rainer Stiefelhagen
Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have propelled results to a new level causing a rapid growth in the field with 121 methods proposed in the medical imaging domain alone. In this review, we provide a structured overview of this emerging field featuring a comprehensive taxonomy, a systematic review of existing methods, and an in-depth analysis of current practices. Based on these contributions, we discuss the challenges and opportunities in the field. For instance, we find that there is a severe lack of comparison across methods which needs to be tackled by standardized baselines and benchmarks.
{"title":"Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy","authors":"Zdravko Marinov;Paul F. Jäger;Jan Egger;Jens Kleesiek;Rainer Stiefelhagen","doi":"10.1109/TPAMI.2024.3452629","DOIUrl":"10.1109/TPAMI.2024.3452629","url":null,"abstract":"Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have propelled results to a new level causing a rapid growth in the field with 121 methods proposed in the medical imaging domain alone. In this review, we provide a structured overview of this emerging field featuring a comprehensive taxonomy, a systematic review of existing methods, and an in-depth analysis of current practices. Based on these contributions, we discuss the challenges and opportunities in the field. For instance, we find that there is a severe lack of comparison across methods which needs to be tackled by standardized baselines and benchmarks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"10998-11018"},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10660300","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-29DOI: 10.1109/TPAMI.2024.3451994
Francesco Taioli;Francesco Giuliari;Yiming Wang;Riccardo Berra;Alberto Castellini;Alessio Del Bue;Alessandro Farinelli;Marco Cristani;Francesco Setti
We propose a solution for Active Visual Search of objects in an environment, whose 2D floor map is the only known information. Our solution has three key features that make it more plausible and robust to detector failures compared to state-of-the-art methods: i)