V-RoAst: A New Dataset for Visual Road Assessment
Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo, Xinglei Wang, Ilya Ilyankou, Kerkritt Srirrongvikrai, Meihui Wang, James Haworth
arXiv - CS - Emerging Technologies, published 2024-08-20. DOI: arxiv-2408.10872 (https://doi.org/arxiv-2408.10872)
Abstract
Road traffic crashes cause millions of deaths annually and have a significant
economic impact, particularly in low- and middle-income countries (LMICs). This
paper presents an approach using Vision Language Models (VLMs) for road safety
assessment, overcoming the limitations of traditional Convolutional Neural
Networks (CNNs). We introduce a new task, V-RoAst (Visual question answering
for Road Assessment), with a real-world dataset. Our approach optimizes prompt
engineering and evaluates advanced VLMs, including Gemini-1.5-flash and
GPT-4o-mini. The models effectively examine attributes for road assessment.
Using crowdsourced imagery from Mapillary, our scalable solution
estimates road safety levels. In addition, this approach is designed for local
stakeholders who lack resources, as it does not require training data. It
offers a cost-effective, automated method for global road safety
assessments, potentially saving lives and reducing economic burdens.