Generative models have shown promising results in capturing human mobility characteristics and generating synthetic trajectories. However, it remains challenging to ensure that generated geospatial mobility data is semantically realistic, with consistent location sequences, and reflects real-world characteristics, such as staying within geospatial limits. To address these issues, we reformulate human mobility modeling as an autoregressive generation task, leveraging the Generative Pre-trained Transformer (GPT) architecture. To make generation controllable, we propose a geospatially-aware generative model, MobilityGPT. First, we propose a gravity-based sampling method to train a transformer for semantic sequence similarity. Then, we constrain the training process via a road connectivity matrix that provides the connectivity of sequences in trajectory generation, thereby keeping generated trajectories within geospatial limits. Lastly, we construct a preference dataset for fine-tuning MobilityGPT via a Reinforcement Learning from Trajectory Feedback (RLTF) mechanism, which minimizes the travel distance between training and synthetically generated trajectories. Experiments on real-world datasets demonstrate MobilityGPT’s superior performance over state-of-the-art methods in generating high-quality mobility trajectories that are closest to real data in terms of origin-destination similarity, trip length, travel radius, link, and gravity distributions. We release the source code and reference links to datasets at https://github.com/ammarhydr/MobilityGPT
Title: MobilityGPT: Enhanced Human Mobility Modeling With a GPT Model. Authors: Ammar Haydari;Dongjie Chen;Zhengfeng Lai;Michael Zhang;Chen-Nee Chuah. IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 1681-1694. Pub Date: 2025-11-12 DOI: 10.1109/TITS.2025.3626357
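The road connectivity constraint described in the MobilityGPT abstract can be illustrated with a small sketch. The abstract does not say how the connectivity matrix enters the model, so the code below assumes one common approach: masking the next-token logits so that only road links connected to the current link can be chosen. The function name `masked_next_link_probs`, the toy `adjacency` matrix, and the `logits` values are all hypothetical and are not taken from the released source code.

```python
import numpy as np

def masked_next_link_probs(logits, adjacency, current_link):
    """Softmax over next-link logits, restricted to links reachable from current_link.

    Assumption: adjacency[i, j] == 1 means link j may directly follow link i.
    """
    mask = adjacency[current_link].astype(bool)
    if not mask.any():
        raise ValueError("current link has no outgoing connections")
    # Unreachable links get -inf so they receive exactly zero probability.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest reachable logit for numerical stability.
    masked = masked - masked[mask].max()
    probs = np.exp(masked)
    return probs / probs.sum()

# Toy road network with four links (hypothetical data).
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
logits = np.array([2.0, 0.5, 0.5, 3.0])  # stand-in for model outputs
probs = masked_next_link_probs(logits, adjacency, current_link=0)
```

From link 0 only links 1 and 2 are reachable, so link 3 gets zero probability even though it has the largest raw logit.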
Pub Date: 2025-11-11 DOI: 10.1109/TITS.2025.3623579
Title: IEEE Intelligent Transportation Systems Society Information. IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 11, p. C3. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11241053
Pub Date: 2025-11-10 DOI: 10.1109/TITS.2025.3625181
Quan Hao;Rui Shi;Jiaze Li;Liguo Zhang
Foreign object intrusion into high-speed railway (HSR) catenary systems poses severe operational hazards, making effective detection crucial for safety. Precise detection of these small intrusive objects is essential. However, the lack of datasets and research on foreign object intrusion in HSR scenarios brings two major challenges: limited data and low accuracy in detecting small intrusive objects. To address these challenges, this paper introduces a novel generative method for detecting foreign object intrusion. To address data limitations, we use low-rank adaptation to fine-tune a diffusion model, developing a generation-extraction-integration framework that generates true-to-reality HSR images of small intrusive target objects. Furthermore, to enhance the detection of small objects in HSR scenarios, we propose a new detection model called SA-YOLO. Based on the YOLOv9 architecture, this model optimizes the backbone network using the star operation, an element-wise multiplication method, and introduces the A-DyS module to improve upsampling through dynamic sampling and an attention mechanism. Extensive experiments demonstrate that, in HSR scenarios, our method outperforms existing state-of-the-art approaches in terms of both generation quality and detection performance, while also showing high robustness.
Title: Generative Approach for Detecting Small Intrusive Foreign Objects in High-Speed Railway Scenario. IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 1471-1484.
Pub Date: 2025-11-10 DOI: 10.1109/TITS.2025.3616119
Changze Li;Yunxue Lu;Hao Wang
Research on signal coordination has been greatly enriched over the last decade. However, existing contributions face inherent limitations, such as a weak connection between objectives and common measures of effectiveness (MOEs) caused by insufficient modeling of traffic dynamics, invariable phase splits, and heavy reliance on hyperparameters. Meanwhile, nearly all related works concentrate on scenarios with only under-saturated phases. Therefore, an arterial signal coordination model that minimizes the level of over-saturation and the number of stops is proposed. Unlike most related works, the proposed model focuses on minimizing phase over-saturation and total stops by estimating the queue profile for all phases under variable signal plans. The model is initially formulated as a mixed-integer nonlinear program (MINLP). By applying linearization techniques, it is then transformed into a mixed-integer linear program (MILP). Simulation experiments are carried out in SUMO, where an artery is built with eight scenarios of different traffic demand. The results indicate that the model reduces average delay (AD), average stops (AS) and average total travel time (ATTT) more effectively than Yang’s multi-path progression model in all scenarios. It is also verified to outperform MP-BAND, achieving a clear reduction in AS and showing an advantage in decreasing AD and ATTT in most scenarios. Additionally, the proposed model is able to alleviate the level of over-saturation at an intersection by re-allocating phase splits properly, resulting in fewer over-saturated phases. Intuitive illustrations attest to the effectiveness of the queue estimation in the proposed model, highlighting the theoretical importance of modeling queue length as a variable.
Title: A Multi-Objective Model for Traffic Signal Coordination Control With Queue Profile Estimation. IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 12, pp. 23389-23406.
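The signal coordination abstract above mentions turning an MINLP into an MILP through linearization but does not name the techniques used. As a generic illustration only, here is the standard way to linearize the product of a binary variable and a bounded continuous variable. The symbols z, δ, x, and U are assumptions introduced for this example and are not taken from the paper.

```latex
% Target: z = \delta \, x, with \delta \in \{0,1\} and 0 \le x \le U.
% The nonlinear product is replaced by four linear constraints:
\begin{aligned}
z &\le U\,\delta, \\
z &\le x, \\
z &\ge x - U\,(1-\delta), \\
z &\ge 0 .
\end{aligned}
% If \delta = 0, the first and last constraints force z = 0.
% If \delta = 1, the second and third constraints force z = x.
```

The upper bound U must be a valid bound on x, and a tighter U generally gives a stronger linear relaxation.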
Pub Date: 2025-11-10 DOI: 10.1109/TITS.2025.3625273
Qinghua Chen;Xin Ge;Shiqian Chen;Xiaoyu Hu;Yong Jiang;Jiheng Wu;Kaiyun Wang
The electronically controlled pneumatic (ECP) device is an auxiliary device for the air brake system that replaces traditional signals with electrical signals for transmitting braking waves. This study presents an ECP design that integrates synchronous braking and release functionalities. Based on fluid dynamics theory, we developed an air brake system model with ECP devices for a 20,000-ton heavy-haul train. Then, the influence of the ECP devices on the performance of air braking, longitudinal dynamics, and operational safety is analyzed under different operating conditions. Simulation results demonstrate that the ECP devices can significantly enhance the consistency of train manipulation during braking and release phases, and increase the charging time of the air brake system during cyclic braking. Additionally, the ECP devices effectively reduce the compressive coupler forces of the slave control locomotives and improve the wheel-rail safety of trains negotiating tight curves. The findings of this study could provide valuable guidance for parameter design when implementing ECP devices in field applications.
Title: An Air Brake Model With Electronically Controlled Pneumatic for Heavy-Haul Trains. IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 1578-1591.
Recently, multi-sensor fusion-based vehicle-infrastructure cooperative perception has attracted extensive attention due to the safety demands of autonomous driving and traffic monitoring. Accurate calibration between different sensors is a critical foundation for most sensor fusion systems. For LiDAR-camera calibration, high accuracy can be achieved with the help of artificial calibration targets, such as a checkerboard. However, unlike autonomous vehicles, roadside sensors monitor traffic scenes with continuous traffic flow from a fixed viewpoint, posing challenges for conventional calibration methods. Therefore, a calibration method suitable for roadside scenes is required for infrastructure sensors. In this paper, we propose FlowCalib, a novel targetless infrastructure LiDAR-camera spatial calibration method based on the alignment of scene flow and optical flow. The main idea is to leverage the inherent consistency of moving objects in traffic flow across the two types of sensor data. First, the moving objects are extracted by optical flow and scene flow. Then, the extrinsic parameters are obtained in two steps: rough calibration and calibration refinement. In rough calibration, the center and motion flow of each moving instance are calculated by clustering methods separately in the point cloud and the image. Based on this, a set of possible initial values of the extrinsic parameters is estimated by two-step parameter sampling. The initial parameters are then selected by scoring the candidates on the distance between the centers and motion flows in the point cloud and the image. Subsequently, the extrinsic parameters are refined by optimizing the instance alignment loss and flow alignment loss of moving objects. Finally, quantitative and qualitative experiments are conducted to validate the effectiveness of the algorithm on both simulated and real-world datasets.
Title: FlowCalib: Targetless Infrastructure LiDAR-Camera Extrinsic Calibration Based on Optical Flow and Scene Flow. Authors: Renwei Hai;Yanqing Shen;Yuchen Yan;Shitao Chen;Jingmin Xin;Nanning Zheng. IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 1565-1577. Pub Date: 2025-11-10 DOI: 10.1109/TITS.2025.3627651
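To make the role of the extrinsic parameters in the FlowCalib abstract concrete, the following minimal sketch projects LiDAR points into an image with a candidate rotation `R` and translation `t`, then scores the candidate by the mean pixel distance between projected instance centers and image instance centers. This uses the standard pinhole camera model. The function names `project_points` and `center_alignment_error`, the intrinsic matrix `K`, and the toy data are hypothetical, and the scoring is a simplified stand-in rather than the paper's actual loss.

```python
import numpy as np

def project_points(points_lidar, R, t, K):
    """Project Nx3 LiDAR points to pixel coordinates with a pinhole model.

    R (3x3) and t (3,) are candidate LiDAR-to-camera extrinsics; K is the 3x3
    camera intrinsic matrix. Points behind the camera are dropped.
    """
    pts_cam = points_lidar @ R.T + t          # LiDAR frame -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]      # keep points in front of the camera
    uvw = pts_cam @ K.T                       # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]           # perspective division

def center_alignment_error(centers_lidar, centers_image, R, t, K):
    """Mean pixel distance between projected 3D centers and matched 2D centers.

    Assumes all 3D centers lie in front of the camera and rows are matched.
    """
    projected = project_points(centers_lidar, R, t, K)
    return float(np.linalg.norm(projected - centers_image, axis=1).mean())

# Toy example (hypothetical values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0], [0.0, 0.0, -5.0]])
pixels = project_points(points, R, t, K)
error = center_alignment_error(points[:2], pixels, R, t, K)
```

A calibration search would evaluate such an error for many candidate `(R, t)` pairs and keep the lowest-scoring ones for refinement.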
Pub Date: 2025-11-10 DOI: 10.1109/TITS.2025.3624271
Yunpeng Ba;Ruihao Zheng;Zhenkun Wang;Genghui Li
The Heterogeneous Fleet Vehicle Routing Problem (HFVRP) aims to find optimal routes for vehicles with different capacities and costs, and is common in real-world applications. Total cost and fairness among drivers are two important yet conflicting objectives, while existing studies address either one objective alone or a specific weighted sum of them. To trade off the two objectives simultaneously, this paper formulates the Multi-Objective HFVRP (MO-HFVRP). Our analysis reveals that the MO-HFVRP is challenging, as the decision space has sparse feasible solutions and the objective space exhibits an uneven distribution of objective vectors. Subsequently, a corresponding algorithm called AMOILS/D is proposed. It decomposes the MO-HFVRP into a few single-objective subproblems, and then applies Iterated Local Search (ILS) and multi-objective optimization techniques to collaboratively solve them. AMOILS/D has three key components. The first is the resource allocation strategy that periodically selects subproblems to focus the search on promising regions. The other two are the adaptive perturbation degree control and the acceptance mechanism in ILS. They enable effective navigation of the decision space and balance convergence and diversity. Experimental results show that AMOILS/D significantly outperforms other representative algorithms across most instances. Ablation studies also confirm the effectiveness of each proposed component.
Title: Multi-Objective Heterogeneous Fleet Vehicle Routing Problem: Formulation and Algorithm. IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 1666-1680.
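The decomposition idea in the MO-HFVRP abstract, splitting a two-objective problem into several single-objective subproblems, can be sketched as follows. The abstract does not state which scalarizing function AMOILS/D uses, so this example assumes the weighted Tchebycheff function that is common in decomposition-based multi-objective optimization. The names `weight_vectors` and `tchebycheff` and the candidate objective values are hypothetical.

```python
import numpy as np

def weight_vectors(n):
    """Return n evenly spread weight vectors for two objectives."""
    w1 = np.linspace(0.0, 1.0, n)
    return np.stack([w1, 1.0 - w1], axis=1)

def tchebycheff(objectives, weight, ideal):
    """Weighted Tchebycheff value of one objective vector (lower is better)."""
    return float(np.max(weight * np.abs(objectives - ideal)))

# Hypothetical objective vectors of three routing plans: (total cost, unfairness).
candidates = np.array([[10.0, 9.0], [12.0, 4.0], [15.0, 1.0]])
ideal = candidates.min(axis=0)   # best value seen so far for each objective
weights = weight_vectors(3)      # one weight vector per subproblem

# Each subproblem picks the candidate that minimizes its own scalar value.
best = [
    int(np.argmin([tchebycheff(c, w, ideal) for c in candidates]))
    for w in weights
]
```

Each weight vector defines one single-objective subproblem, so different subproblems favor different trade-offs between cost and fairness. In a full algorithm, a local search would be run per subproblem instead of choosing from a fixed candidate list.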
Pub Date: 2025-11-07 DOI: 10.1109/TITS.2025.3618307
Zhuolin He;Xinrun Li;Jiacheng Tang;Shoumeng Qiu;Wenfu Wang;Xiangyang Xue;Jian Pu
Conventional camera-based 3D object detectors in autonomous driving are limited to recognizing a predefined set of objects, which poses a safety risk when encountering novel or unseen objects in real-world scenarios. To address this limitation, we present OS-Det3D, a two-stage training framework designed for camera-based open-set 3D object detection. In the first stage, our proposed 3D object discovery network (ODN3D) uses geometric cues from LiDAR point clouds to generate class-agnostic 3D object proposals, each of which is assigned a 3D objectness score. This approach allows the network to discover objects beyond known categories, enabling the detection of unfamiliar objects. However, due to the absence of class constraints, ODN3D-generated proposals may include noisy data, particularly in cluttered or dynamic scenes. To mitigate this issue, we introduce a joint selection (JS) module in the second stage. The JS module uses both camera bird’s eye view (BEV) feature responses and 3D objectness scores to filter out low-quality proposals, yielding high-quality pseudo ground truth for unknown objects. OS-Det3D significantly enhances the ability of camera 3D detectors to discover and identify unknown objects while also improving the performance on known objects, as demonstrated through extensive experiments on the nuScenes and KITTI datasets.
Title: Toward Camera Open-Set 3D Object Detection for Autonomous Driving Scenarios. IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 12, pp. 23190-23201.
Cooperative perception has significant potential to enhance perception performance compared to single-agent systems by integrating information from multiple agents through vehicle-to-everything (V2X) communication. However, several challenges hinder the attainment of high performance in cooperative perception, particularly positional errors arising from sensor data collection and time delays during data transmission. Existing research often addresses only one of these issues, making it unsuitable for scenarios where spatial and temporal errors coexist. In this paper, we focus on resolving the spatio-temporal drift issue caused by the interplay of spatial and temporal variations. To address this, we propose a novel end-to-end cooperative perception framework called Multi-frame Grouping Multi-agent Perception (MGMP), which effectively fuses spatio-temporal perception features from multiple agents, including vehicles and road infrastructure. Our approach extracts the effective semantic information from the temporal context of multiple agents, leverages cross-learning of window information through multi-scale window attention, and groups and aggregates multiple agents to simultaneously address the spatio-temporal drift problem caused by positional errors and time delays. We validate the effectiveness of our method on the V2XSet, OPV2V and Dair-V2X datasets. Experimental results indicate that, compared to the state-of-the-art (SOTA) work, our method achieves improvements of 2.7%, 1.7%, and 1.2% on AP@0.7, respectively.
Title: Cooperative Perception of Multi-Agents Under the Spatio-Temporal Drift Issue. Authors: Penglin Dai;Hao Zhou;Quanmin Wei;Xiao Wu;Zhanbo Sun;Zhaofei Yu. IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 1485-1498. Pub Date: 2025-11-04 DOI: 10.1109/TITS.2025.3626365
Pub Date: 2025-11-04 DOI: 10.1109/TITS.2025.3624568
Yuxin Ding;Chenxi Chen;Tianjia Yang;Xianbiao Hu
The Leader-Follower Cooperative Driving Automation (LF-CDA) system, crucial for applications such as truck platooning and off-road vehicle convoys, relies on automation and communication technologies to virtually link multiple vehicles and has become a core focus in the automated vehicle industry. Accurate relative positioning is critical for LF-CDA operations, yet GNSS can be unreliable in challenging environments. Asymmetric architecture is common in many LF-CDA systems, making direct application of localization models either infeasible or computationally and communication-intensive. This manuscript presents a lightweight LiDAR-based cooperative localization model that leverages the unique characteristics of asymmetric LF-CDA systems, specifically the property of “asynchronous view repetition.” In this context, the follower vehicle, operating in vehicle-following mode, consistently receives visual and spatial information similar to that of the leader vehicle, though with a time delay. To capitalize on this system characteristic, an asynchronous view repetition-based graph optimization model is formulated to minimize the positional errors of both leader and follower vehicles. To provide input to and solve the graph optimization model, a lightweight cooperative localization framework with multiple submodules is established, allowing the system to function independently of environmental constraints. A comprehensive set of experiments was conducted in the CARLA simulation environment, using CT-ICP and KISS-ICP as benchmarks, given their strong performance in single-vehicle settings. The results indicate that, under the LF-CDA scenario, our proposed model demonstrates greater suitability by achieving higher localization accuracy while maintaining comparable or even superior computational efficiency.
{"title":"Lightweight LiDAR-Based Cooperative Localization Model for Asymmetric Leader-Follower Cooperative Driving Automation System","authors":"Yuxin Ding;Chenxi Chen;Tianjia Yang;Xianbiao Hu","doi":"10.1109/TITS.2025.3624568","DOIUrl":"https://doi.org/10.1109/TITS.2025.3624568","url":null,"PeriodicalId":13416,"journal":{"name":"IEEE Transactions on Intelligent Transportation Systems","volume":"27 1","pages":"1650-1665"},"PeriodicalIF":8.4,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145877110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}