A multimodal deep learning approach for gravel road condition evaluation through image and audio integration
Authors: Nausheen Saeed, Moudud Alam, Roger G Nyberg
Journal: Transportation Engineering, volume 16, Article 100228 (JCR Q1, Engineering)
Published: 2024-02-02
DOI: 10.1016/j.treng.2024.100228
URL: https://www.sciencedirect.com/science/article/pii/S2666691X24000034
Citations: 0
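Abstract
This study investigates combining audio and image data to classify road conditions, focusing on loose gravel. The dataset, comprising audio segments that capture gravel sounds and the corresponding images, was labelled into two classes. Early feature fusion, using a pre-trained VGG19 network (the 19-layer "Very Deep Convolutional Network") and principal component analysis (PCA), improved the accuracy of the Random Forest classifier, which surpassed the other models in accuracy, precision, recall, and F1-score. Late fusion, which combines at decision level the outputs of individual image and audio classifiers based on DenseNet121 (the 121-layer Densely Connected Convolutional Network) through logical conjunction and disjunction gates (AND and OR), also performed well, with the OR gate achieving 97% accuracy. Late fusion enhances adaptability by compensating for the limitations of one modality with information from the other. Adapting maintenance to the identified road conditions minimises unnecessary environmental impact. The method can help identify loose gravel on gravel roads, improving road safety and supporting a precise, data-driven maintenance strategy.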
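The decision-level (late) fusion described in the abstract can be illustrated with a minimal sketch. It assumes each modality's classifier has already produced a binary prediction per sample (1 = loose gravel, 0 = no loose gravel). The function names `fuse_and`, `fuse_or`, and `evaluate`, as well as the labels and predictions, are hypothetical and made up for illustration; they are not the authors' code or data and do not reproduce the reported 97% accuracy.

```python
from dataclasses import dataclass
from typing import List


def fuse_and(image_pred: int, audio_pred: int) -> int:
    """Conjunction gate: flag loose gravel only if both modalities agree."""
    return int(bool(image_pred) and bool(audio_pred))


def fuse_or(image_pred: int, audio_pred: int) -> int:
    """Disjunction gate: flag loose gravel if either modality detects it."""
    return int(bool(image_pred) or bool(audio_pred))


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: List[int], y_pred: List[int]) -> Metrics:
    """Compute the four metrics named in the abstract for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Made-up ground truth and per-modality predictions (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
image_pred = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical image classifier output
audio_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # hypothetical audio classifier output

or_pred = [fuse_or(i, a) for i, a in zip(image_pred, audio_pred)]
and_pred = [fuse_and(i, a) for i, a in zip(image_pred, audio_pred)]

or_metrics = evaluate(y_true, or_pred)
and_metrics = evaluate(y_true, and_pred)

print("OR gate: ", or_metrics)
print("AND gate:", and_metrics)
```

On this toy data the OR gate recovers every loose-gravel sample because each modality catches a case the other misses, at the cost of one false alarm, while the AND gate raises no false alarms but misses half of the positives. This is one way to read the abstract's remark that late fusion compensates for limitations in one modality with information from the other.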