Pub Date : 2024-12-13 DOI: 10.1109/TIV.2024.3517335
Min Zhou;Hairong Dong;Haifeng Song;Nan Zheng;Wen-Hua Chen;Hongwei Wang
This perspective proposes a framework for autonomous operations of rail transportation based on embodied intelligence to enhance environmental adaptation and autonomous decision-making capabilities. Three key technologies are outlined to enable perception, decision-making, and control in rail transportation, i.e., embodied perception for active environmental understanding, embodied execution, and embodied learning and evolution. It also explores the main challenges of implementing embodied intelligence-based autonomous operations. The proposed framework offers new insights and directions for practitioners in the rail transportation industry.
{"title":"Embodied Intelligence-Based Perception, Decision-Making, and Control for Autonomous Operations of Rail Transportation","authors":"Min Zhou;Hairong Dong;Haifeng Song;Nan Zheng;Wen-Hua Chen;Hongwei Wang","doi":"10.1109/TIV.2024.3517335","DOIUrl":"https://doi.org/10.1109/TIV.2024.3517335","url":null,"abstract":"This perspective proposes a framework for autonomous operations of rail transportation based on embodied intelligence to enhance environmental adaptation and autonomous decision-making capabilities. Three key technologies are outlined to enable perception, decision-making, and control in rail transportation, i.e., embodied perception for active environmental understanding, embodied execution, and embodied learning and evolution. It also explores the main challenges of implementing embodied intelligence-based autonomous operations. The proposed framework offers new insights and directions for practitioners in the rail transportation industry.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5061-5065"},"PeriodicalIF":14.3,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Waterway pattern mining and route planning are essential system services in maritime applications to support safety, efficiency, and sustainability goals. In this paper, first, we propose a novel waterway pattern mining method. It allows a compact footprint design and is featured with manifold granularities, waypoint and directional tagging, forming a knowledge base with rich traffic features. Second, relying on the extracted waterway patterns as “map context”, we enhance the Theta* path planning algorithm by taking the extracted traffic properties into consideration. An integrated solution combining both the waterway patterns and the enhanced Theta* algorithm has been applied to several domain use cases: a) waterway pattern based trajectory reconstruction; b) passage plan generation for various types of vessels; and c) vessel movement estimation and forecasting. These use cases proved the usefulness and practicality of the proposed solution, its feasibility for other port waters, and potential use on a global scale.
{"title":"Innovating Waterway Route Planning as a Service for Marine Traffic Applications","authors":"Zhe Xiao;Xiaocai Zhang;Xiuju Fu;Liye Zhang;Haiyan Xu;Ryan Wen Liu;Chee Seng Chong;Zheng Qin","doi":"10.1109/TIV.2024.3516361","DOIUrl":"https://doi.org/10.1109/TIV.2024.3516361","url":null,"abstract":"Waterway pattern mining and route planning are essential system services in maritime applications to support safety, efficiency, and sustainability goals. In this paper, first, we propose a novel waterway pattern mining method. It allows a compact footprint design and is featured with manifold granularities, waypoint and directional tagging, forming a knowledge base with rich traffic features. Second, relying on the extracted waterway patterns as “map context”, we enhance the Theta* path planning algorithm by taking the extracted traffic properties into consideration. An integrated solution combining both the waterway patterns and the enhanced Theta* algorithm has been applied to several domain use cases: a) waterway pattern based trajectory reconstruction; b) passage plan generation for various types of vessels; and c) vessel movement estimation and forecasting. These use cases proved the usefulness and practicality of the proposed solution, its feasibility for other port waters, and potential use on a global scale.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5151-5161"},"PeriodicalIF":14.3,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-11 DOI: 10.1109/TIV.2024.3515219
Kui Xia;Haoran Jiang;Zipeng Man;Yangsheng Jiang;Zhihong Yao
Autonomous vehicles (AVs) face considerable challenges when interacting with their environment, especially during lane changes. Enhancing decision-making and planning processes with prediction regarding the intentions and trajectories of surrounding vehicles can significantly improve lane change performance. However, prediction accuracy is limited by both the technology used and environmental variability. Utilizing low-confidence predictive data can adversely impact the safety and comfort of AV operations. This paper proposes a novel dual-model framework for lane change decision-making and planning of AVs (LC-Dual), which dialectically uses predictive data and provides redundant safety measures in parallel. In Model I, the optimal end-state trajectory for lane changes is planned using predictive information from the upper layer. In Model II, a redundant lane change trajectory is quickly generated based on a spatio-temporal safety corridor constructed from real perception data. Ultimately, the selection of the lane change model and trajectory is determined by rule-based decision-making, factoring in prediction confidence and computational efficiency. Simulation experiments demonstrate that the LC-Dual framework yields more adaptive trajectories in scenarios with accurate predictions and effective switches between lane change models in cases of inaccurate predictions. The LC-Dual framework markedly improves safety and efficiency in lane change operations, thereby facilitating broader AV adoption.
{"title":"LC-Dual: Coupling Predictive Information and Redundant Strategies for Autonomous Vehicle Lane Change Trajectory Planning","authors":"Kui Xia;Haoran Jiang;Zipeng Man;Yangsheng Jiang;Zhihong Yao","doi":"10.1109/TIV.2024.3515219","DOIUrl":"https://doi.org/10.1109/TIV.2024.3515219","url":null,"abstract":"Autonomous vehicles (AVs) face considerable challenges when interacting with their environment, especially during lane changes. Enhancing decision-making and planning processes with prediction regarding the intentions and trajectories of surrounding vehicles can significantly improve lane change performance. However, prediction accuracy is limited by both the technology used and environmental variability. Utilizing low-confidence predictive data can adversely impact the safety and comfort of AV operations. This paper proposes a novel dual-model framework for lane change decision-making and planning of AVs (LC-Dual), which dialectically uses predictive data and provides redundant safety measures in parallel. In Model I, the optimal end-state trajectory for lane changes is planned using predictive information from the upper layer. In Model II, a redundant lane change trajectory is quickly generated based on a spatio-temporal safety corridor constructed from real perception data. Ultimately, the selection of the lane change model and trajectory is determined by rule-based decision-making, factoring in prediction confidence and computational efficiency. Simulation experiments demonstrate that the LC-Dual framework yields more adaptive trajectories in scenarios with accurate predictions and effective switches between lane change models in cases of inaccurate predictions. 
The LC-Dual framework markedly improves safety and efficiency in lane change operations, thereby facilitating broader AV adoption.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5136-5150"},"PeriodicalIF":14.3,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-11 DOI: 10.1109/TIV.2024.3514296
Nicola Kolb;Florian Huber;Alexander Pretschner
In scenario-based testing of autonomous vehicles, scenario types serve as input for generating test cases. Towards a comprehensive catalog of scenario types, expert-defined scenario types are complemented with types derived from real traffic data. Recorded traffic contains instances of different scenario types: By clustering these instances, a scenario type can be associated with each cluster. This involves adding semantics to recorded scenario instances, our clusters' elements, in the form of maneuver-based descriptions of the behavior of scenario participants. Several works offer approaches to derive such instance descriptions. These approaches are often automated. The quality of the generated descriptions matters, given that subsequent steps build upon them. How to derive good descriptions automatically has not been addressed until now. In our work, (1) we approximate a description's quality by measuring the Fréchet distance between the trajectories obtained by simulating a description and the original trajectories in the dataset; (2) we introduce a quality threshold as to what makes a good description and, based on this threshold, we identify inadequate descriptions; (3) we build upon these evaluation results to modify identified inadequate descriptions automatically. In experiments, we extract and describe a set of 843 scenario instances from the inD intersection dataset: the average reconstruction quality in terms of a description's aggregated Fréchet distance of 0.67 and a ratio of 54% (459) of adequate descriptions improves to 0.34, and 97% (819). The average aggregated distance is reduced by half, i.e., the reconstruction quality is doubled.
{"title":"Automatically Improving Scenario Descriptions Derived From Recorded Traffic","authors":"Nicola Kolb;Florian Huber;Alexander Pretschner","doi":"10.1109/TIV.2024.3514296","DOIUrl":"https://doi.org/10.1109/TIV.2024.3514296","url":null,"abstract":"In scenario-based testing of autonomous vehicles, scenario types serve as input for generating test cases. Towards a comprehensive catalog of scenario types, expert-defined scenario types are complemented with types derived from real traffic data. Recorded traffic contains instances of different scenario types: By clustering these instances, a scenario type can be associated with each cluster. This involves adding semantics to recorded scenario instances, our clusters' elements, in the form of maneuver-based descriptions of the behavior of scenario participants. Several works offer approaches to derive such instance descriptions. These approaches are often automated. The quality of the generated descriptions matters, given that subsequent steps build upon them. How to derive good descriptions automatically has not been addressed until now. In our work, (1) we approximate a description's quality by measuring the Fréchet distance between the trajectories obtained by simulating a description and the original trajectories in the dataset; (2) we introduce a quality threshold as to what makes a good description and, based on this threshold, we identify inadequate descriptions; (3) we build upon these evaluation results to modify identified inadequate descriptions automatically. In experiments, we extract and describe a set of 843 scenario instances from the inD intersection dataset: the average reconstruction quality in terms of a description's aggregated Fréchet distance of 0.67 and a ratio of 54% (459) of adequate descriptions improves to 0.34, and 97% (819). 
The average aggregated distance is reduced by half, i.e., the reconstruction quality is doubled.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5126-5135"},"PeriodicalIF":14.3,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10793072","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
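The quality check described in the abstract above (compare a simulated trajectory with the recorded one by Fréchet distance, then apply a threshold) can be illustrated with a minimal sketch. It assumes the discrete Fréchet distance over sampled 2-D points; the abstract does not say which variant the authors use, how distances are aggregated over participants, or what threshold value they chose, so the function names and the `threshold` parameter here are hypothetical.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines of (x, y) points."""
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance for the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Cheapest way to reach (i, j), bounded below by the current gap.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def is_adequate(recorded, simulated, threshold):
    """Hypothetical adequacy rule: distance at or below a caller-supplied threshold."""
    return discrete_frechet(recorded, simulated) <= threshold


recorded = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
simulated = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(recorded, simulated))
```

For the two parallel tracks in the usage example, the distance equals their constant offset. A smaller distance means the simulated description reproduces the recorded motion more closely, which is why a lower aggregated value corresponds to better reconstruction quality.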
Pub Date : 2024-12-09 DOI: 10.1109/TIV.2024.3513401
Yaoming Zhuang;Zhenjie Duan;Li Li;Chengdong Wu;Zhanlin Liu
High-definition (HD) maps are crucial for autonomous driving, supporting decision-making, control, and localization. However, current map generation methods often rely on single-modality images, which lack depth information and environmental context. To overcome these limitations, we propose PC-FusionMap, a novel approach that generates point cloud modality data and fuses multi-modal and temporal features for accurate and efficient online HD map construction. Our method addresses the shortcomings of existing approaches by leveraging an improved depth estimation module and a supervised labeling strategy to generate point cloud data. We also introduce a multi-modal feature fusion architecture (CFMB) and a temporal fusion network (TFC) to effectively integrate multi-modal and temporal information. The CFMB architecture uses a query mechanism and cross-attention to enhance the complementary performance between modalities, simplifying the fusion process and improving accuracy. The TFC Network models dynamic changes in time series data, further enhancing the accuracy and robustness of online HD map construction. Our approach achieves state-of-the-art results on the nuScenes and the Argoverse2 datasets, surpassing baseline models in both accuracy and stability. Additionally, incorporating our method into existing HD map generation models can lead to substantial performance gains.
{"title":"PC-FusionMap: A Novel Point Cloud Generation and Multimodal Fusion Approach for HD Map Construction","authors":"Yaoming Zhuang;Zhenjie Duan;Li Li;Chengdong Wu;Zhanlin Liu","doi":"10.1109/TIV.2024.3513401","DOIUrl":"https://doi.org/10.1109/TIV.2024.3513401","url":null,"abstract":"High-definition (HD) maps are crucial for autonomous driving, supporting decision-making, control, and localization. However, current map generation methods often rely on single-modality images, which lack depth information and environmental context. To overcome these limitations, we propose PC-FusionMap, a novel approach that generates point cloud modality data and fuses multi-modal and temporal features for accurate and efficient online HD map construction. Our method addresses the shortcomings of existing approaches by leveraging an improved depth estimation module and a supervised labeling strategy to generate point cloud data. We also introduce a multi-modal feature fusion architecture (CFMB) and a temporal fusion network (TFC) to effectively integrate multi-modal and temporal information. The CFMB architecture uses a query mechanism and cross-attention to enhance the complementary performance between modalities, simplifying the fusion process and improving accuracy. The TFC Network models dynamic changes in time series data, further enhancing the accuracy and robustness of online HD map construction. Our approach achieves state-of-the-art results on the nuScenes and the Argoverse2 datasets, surpassing baseline models in both accuracy and stability. 
Additionally, incorporating our method into existing HD map generation models can lead to substantial performance gains.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5112-5125"},"PeriodicalIF":14.3,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-09 DOI: 10.1109/TIV.2024.3512786
Antonello Cherubini;Gastone Pietro Rosati Papini;Alice Plebe;Mattia Piazza;Mauro Da Lio
Highly automated vehicles are complex systems, and ensuring their safe operation within their Operational Design Domain (ODD) presents significant challenges. Diagnosing failure modes and updating these systems are even more demanding tasks. This paper introduces a method to assist with assessing, diagnosing, and updating these systems by developing a stochastic model that predicts safety outcomes (collision, near-miss, or safe state) with quantified uncertainty in any parametrized scenario. The approach uses bootstrapping aggregation to create an ensemble of predictive models, leveraging fully connected feed-forward neural networks. These networks are designed with a flexible number of trainable parameters and hidden layers, requiring minimal computational resources. The model is trained on a small set of examples obtained through direct simulations that randomly sample the parametric scenario, bypassing the traditional test matrix definition. Once trained, the bootstrapped model serves as an identity card for the system under test, allowing continuous performance evaluation across the parametric scenario. The paper demonstrates applications, including safety assessment, failure mode identification, and developing a safe speed recommendation function. The model's compact size ensures rapid execution, facilitating extensive post-analysis for safety argumentation and diagnosis and real-time online use to extend the system's abilities.
{"title":"Bootstrapped Neural Models for Predicting Self-Driving Vehicle Collisions With Quantified Confidence: Offline and Online Applications","authors":"Antonello Cherubini;Gastone Pietro Rosati Papini;Alice Plebe;Mattia Piazza;Mauro Da Lio","doi":"10.1109/TIV.2024.3512786","DOIUrl":"https://doi.org/10.1109/TIV.2024.3512786","url":null,"abstract":"Highly automated vehicles are complex systems, and ensuring their safe operation within their Operational Design Domain (ODD) presents significant challenges. Diagnosing failure modes and updating these systems are even more demanding tasks. This paper introduces a method to assist with assessing, diagnosing, and updating these systems by developing a stochastic model that predicts safety outcomes (collision, near-miss, or safe state) with quantified uncertainty in any parametrized scenario. The approach uses bootstrapping aggregation to create an ensemble of predictive models, leveraging fully connected feed-forward neural networks. These networks are designed with a flexible number of trainable parameters and hidden layers, requiring minimal computational resources. The model is trained on a small set of examples obtained through direct simulations that randomly sample the parametric scenario, bypassing the traditional test matrix definition. Once trained, the bootstrapped model serves as an identity card for the system under test, allowing continuous performance evaluation across the parametric scenario. The paper demonstrates applications, including safety assessment, failure mode identification, and developing a safe speed recommendation function. 
The model's compact size ensures rapid execution, facilitating extensive post-analysis for safety argumentation and diagnosis and real-time online use to extend the system's abilities.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5079-5099"},"PeriodicalIF":14.3,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10783023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
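The bootstrapping-aggregation idea in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' model: a 1-nearest-neighbour rule stands in for their feed-forward neural networks, the scenario is reduced to a single hypothetical parameter (speed), and the data values are invented for the example. What it does show is the mechanism: each ensemble member is fitted on a resample drawn with replacement, and the spread of the members' votes gives a rough confidence for each safety outcome.

```python
import random
from collections import Counter


def fit_nearest_neighbor(samples):
    """Stand-in base learner: 1-nearest-neighbour on one scenario parameter."""
    def predict(x):
        return min(samples, key=lambda s: abs(s[0] - x))[1]
    return predict


def fit_bagged_ensemble(data, n_models=50, seed=0):
    """Fit one base learner per bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) examples with replacement.
        resample = [rng.choice(data) for _ in data]
        models.append(fit_nearest_neighbor(resample))
    return models


def predict_with_confidence(models, x):
    """Return the fraction of ensemble members voting for each outcome."""
    votes = Counter(model(x) for model in models)
    return {label: count / len(models) for label, count in votes.items()}


# Hypothetical (speed in m/s, simulated outcome) pairs.
data = [(5, "safe"), (8, "safe"), (10, "safe"), (12, "safe"), (14, "safe"),
        (16, "near-miss"), (18, "near-miss"), (20, "near-miss"),
        (22, "collision"), (25, "collision"), (28, "collision"), (30, "collision")]

ensemble = fit_bagged_ensemble(data)
print(predict_with_confidence(ensemble, 15))
```

Queries near a boundary between outcomes tend to split the vote, while queries deep inside one region tend to be near-unanimous. That disagreement is the kind of quantified uncertainty the abstract refers to, here in its simplest form.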
Pub Date : 2024-12-09 DOI: 10.1109/TIV.2024.3512995
Tao Lian;Jose L. Gómez;Antonio M. López
The last mile of unsupervised domain adaptation (UDA) for semantic segmentation is the challenge of solving the syn-to-real domain gap. Recent UDA methods have progressed significantly, yet they often rely on strategies customized for synthetic single-source datasets (e.g., GTA5), which limits their generalisation to multi-source datasets. Conversely, synthetic multi-source datasets hold promise for advancing the last mile of UDA but remain underutilized in current research. Thus, we propose DEC, a flexible UDA framework for multi-source datasets. Following a divide-and-conquer strategy, DEC simplifies the task by categorizing semantic classes, training models for each category, and fusing their outputs by an ensemble model trained exclusively on synthetic datasets to obtain the final segmentation mask. DEC can integrate with existing UDA methods, achieving state-of-the-art performance on Cityscapes, BDD100K, and Mapillary Vistas, significantly narrowing the syn-to-real domain gap.
{"title":"Divide, Ensemble and Conquer: The Last Mile on Unsupervised Domain Adaptation for Semantic Segmentation","authors":"Tao Lian;Jose L. Gómez;Antonio M. López","doi":"10.1109/TIV.2024.3512995","DOIUrl":"https://doi.org/10.1109/TIV.2024.3512995","url":null,"abstract":"The last mile of unsupervised domain adaptation (UDA) for semantic segmentation is the challenge of solving the syn-to-real domain gap. Recent UDA methods have progressed significantly, yet they often rely on strategies customized for synthetic single-source datasets (e.g., GTA5), which limits their generalisation to multi-source datasets. Conversely, synthetic multi-source datasets hold promise for advancing the last mile of UDA but remain underutilized in current research. Thus, we propose DEC, a flexible UDA framework for multi-source datasets. Following a divide-and-conquer strategy, DEC simplifies the task by categorizing semantic classes, training models for each category, and fusing their outputs by an ensemble model trained exclusively on synthetic datasets to obtain the final segmentation mask. DEC can integrate with existing UDA methods, achieving state-of-the-art performance on Cityscapes, BDD100K, and Mapillary Vistas, significantly narrowing the syn-to-real domain gap.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 12","pages":"5100-5111"},"PeriodicalIF":14.3,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145772077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-04 DOI: 10.1109/TIV.2024.3506936
Mary L. Cummings;Ben Bauchwitz
There are currently around thirty companies testing self-driving cars in San Francisco, CA, effectively creating a living laboratory. Of these companies, only Waymo is engaged in commercial operations, while Zoox conducts routine driverless testing operations in San Francisco. Despite these successes, federal investigations have been opened into both companies for safety concerns, and Cruise is attempting to reinstate its permit after a near-fatal pedestrian crash. An analysis of these three companies’ crash data from required reporting illustrates that many areas of self-driving need improvement. The most significant crash type for Waymo and Zoox are struck-from-behind events, while Cruise struggled most with unexpected actions by others. Computer vision systems are very brittle and likely play an outsized role in crashes. Self-driving cars also struggle to reason under uncertainty, and simulations are not effectively bridging the physical-to-real-world testing gap. This analysis underscores that research is lacking, especially for artificial intelligence involving computer vision and reasoning under uncertainty.
{"title":"Identifying Research Gaps Through Self-Driving Car Data Analysis","authors":"Mary L. Cummings;Ben Bauchwitz","doi":"10.1109/TIV.2024.3506936","DOIUrl":"https://doi.org/10.1109/TIV.2024.3506936","url":null,"abstract":"There are currently around thirty companies testing self-driving cars in San Francisco, CA, effectively creating a living laboratory. Of these companies, only Waymo is engaged in commercial operations, while Zoox conducts routine driverless testing operations in San Francisco. Despite these successes, federal investigations have been opened into both companies for safety concerns, and Cruise is attempting to reinstate its permit after a near-fatal pedestrian crash. An analysis of these three companies’ crash data from required reporting illustrates that many areas of self-driving need improvement. The most significant crash type for Waymo and Zoox are struck-from-behind events, while Cruise struggled most with unexpected actions by others. Computer vision systems are very brittle and likely play an outsized role in crashes. Self-driving cars also struggle to reason under uncertainty, and simulations are not effectively bridging the physical-to-real-world testing gap. 
This analysis underscores that research is lacking, especially for artificial intelligence involving computer vision and reasoning under uncertainty.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 11","pages":"4903-4912"},"PeriodicalIF":14.3,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10778107","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145665808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-04 DOI: 10.1109/TIV.2024.3510787
Muhammad Asif Khan;Hamid Menouar;Mohamed Abdallah;Adnan Abu-Dayya
Connected and Autonomous Vehicles (CAVs) are referred to as self-driving vehicles that will become an essential component of future intelligent transportation systems. These CAVs will be equipped with various sensors for perceiving their surroundings and onboard computing capabilities to process sensor data in real-time. Light Detection and Ranging (LiDAR) is one of the essential sensors used for detecting objects and accurate distance estimation. However, LiDAR sensors are susceptible to several types of attacks. Adversaries can exploit LiDAR sensors either physically, by sending signals directly to the sensor, or digitally, by manipulating LiDAR data after gaining access to the in-vehicle network. Over the past few years, there has been significant research on the vulnerabilities, attack models, and security of LiDAR sensors. However, to our knowledge, no comprehensive survey exists that addresses these aspects of autonomous vehicle security. This paper aims to bridge this gap by presenting an overview of LiDAR-based perception, data processing, threat models, and defense mechanisms for LiDAR sensors in CAVs. We believe this paper will serve as a valuable reference for researchers, providing a clear understanding of cyber-physical attacks and defense strategies related to LiDAR sensors in autonomous vehicles and related fields.
{"title":"LiDAR in Connected and Autonomous Vehicles: Perception, Threat Model, and Defense","authors":"Muhammad Asif Khan;Hamid Menouar;Mohamed Abdallah;Adnan Abu-Dayya","doi":"10.1109/TIV.2024.3510787","DOIUrl":"https://doi.org/10.1109/TIV.2024.3510787","url":null,"abstract":"Connected and Autonomous Vehicles (CAVs) are referred to as self-driving vehicles that will become an essential component of future intelligent transportation systems. These CAVs will be equipped with various sensors for perceiving their surroundings and onboard computing capabilities to process sensor data in real-time. Light Detection and Ranging (LiDAR) is one of the essential sensors used for detecting objects and accurate distance estimation. However, LiDAR sensors are susceptible to several types of attacks. Adversaries can exploit LiDAR sensors either physically, by sending signals directly to the sensor, or digitally, by manipulating LiDAR data after gaining access to the in-vehicle network. Over the past few years, there has been significant research on the vulnerabilities, attack models, and security of LiDAR sensors. However, to our knowledge, no comprehensive survey exists that addresses these aspects of autonomous vehicle security. This paper aims to bridge this gap by presenting an overview of LiDAR-based perception, data processing, threat models, and defense mechanisms for LiDAR sensors in CAVs. 
We believe this paper will serve as a valuable reference for researchers, providing a clear understanding of cyber-physical attacks and defense strategies related to LiDAR sensors in autonomous vehicles and related fields.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 11","pages":"5023-5041"},"PeriodicalIF":14.3,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145665722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-03 DOI: 10.1109/TIV.2024.3510563
Wei Zhou;Nan Zheng;Chen Wang
Unmanned aerial vehicles (UAVs) have emerged as valuable tools in intelligent transportation systems, offering potential for real-time traffic monitoring and emergency response. However, the limited flight endurance of UAVs restricts their ability to collect large amounts of traffic event data crucial for data-driven models. While simulation platforms provide an alternative data source, a significant visual gap persists between synthetic and real images. Given the unique characteristics of UAV perspectives, where objects typically appear smaller, and the challenges faced by existing generative approaches in maintaining structural and semantic consistency, this paper proposes a novel generative approach. Our method integrates semantic mask guidance with a style-modulated Transformer-based GAN architecture to address these issues. Our approach first utilizes the Segment-Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP) to extract semantic masks from GTA V simulated images. Subsequently, we introduce Spatially-Attentive Denormalization Modules (SADM) within the generator. These modules incorporate semantic mask statistics to enhance image quality and maintain consistency. Furthermore, we develop a perceptual discriminator incorporating a “memory bank” mechanism to more effectively evaluate image realism and stabilize the training process. To further enhance the quality of generated images, we develop a comprehensive training strategy. Our experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches in both quantitative metrics (i.e., FID and KID) and qualitative visual assessments, thus highlighting its effectiveness and superiority. Overall, our approach offers a robust solution for generating highly realistic traffic event images from UAV perspectives, effectively addressing the scarcity of real-world UAV-recorded traffic event data.
{"title":"Synthesizing Realistic Traffic Events From UAV Perspectives: A Mask-Guided Generative Approach Based on Style-Modulated Transformer","authors":"Wei Zhou;Nan Zheng;Chen Wang","doi":"10.1109/TIV.2024.3510563","DOIUrl":"https://doi.org/10.1109/TIV.2024.3510563","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"10 11","pages":"5008-5022"},"PeriodicalIF":14.3,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145665815","RegionNum":1,"RegionCategory":"Engineering Technology","Total":0}