Enhancing data efficiency for autonomous vehicles: Using data sketches for detecting driving anomalies
Authors: Debbie Aisiana Indah, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi
Journal: Machine learning with applications, vol. 15, Article 100530
Published: 2024-02-06
DOI: 10.1016/j.mlwa.2024.100530
URL: https://www.sciencedirect.com/science/article/pii/S2666827024000069
Abstract
Machine learning models for near-collision detection in autonomous vehicles promise enhanced predictive power. However, training such models on large datasets presents storage and computational challenges, particularly on conventional computing systems. This paper addresses the problem of training anomaly detection models from large-scale vehicle trajectory datasets and adopts a reservoir sampling-based data sketching technique. Predetermined subset sizes ranging from 0.4% to 100% of the original data are used, and a single-pass reservoir sampling algorithm is applied to construct these subsets efficiently. A Support Vector Machine (SVM) model is then trained on each subset, and its performance is assessed with accuracy, precision, recall, and F1-score. Experimental results on the HighD dataset, a comprehensive real-world collection of vehicle trajectories, confirm that our approach can achieve robust near-collision detection. With the full dataset, our model achieved an F1-score of 0.9998 for class 0 and 0.9984 for class 1. When the data was reduced to as little as 0.4% of the original size, the F1-score remained at 0.9998 for class 0 and fell to 0.7143 for class 1. This demonstrates that relatively high performance can be maintained even with a 99.6% reduction in data size. Moreover, precision and recall values ranged from 0.713 to 0.999 across the sketch sizes.
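The abstract does not say which reservoir sampling variant the authors used. As a minimal sketch of how a single-pass sampler can build a fixed-size subset, the Python below assumes the classic Algorithm R. The function name `reservoir_sample`, the `seed` parameter, and the stand-in row stream are hypothetical and do not come from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from one pass over stream.

    Assumption: this is Algorithm R, chosen for illustration only;
    the paper's exact variant is not given in the abstract.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by drawing
            # a slot index uniformly from [0, i] (both ends inclusive).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical usage: a 0.4% sketch of a stream of row indices.
n_rows = 250_000                      # stand-in size, not the HighD row count
sketch_size = round(0.004 * n_rows)   # 0.4% of the rows
sketch = reservoir_sample(range(n_rows), sketch_size, seed=42)
```

Because the sampler never needs the stream length in advance and holds only `k` items in memory, the subset can be built while reading the trajectory file once. In practice the sketch size would be fixed up front, as the abstract's predetermined subset sizes suggest.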