XLM for Autonomous Driving Systems: A Comprehensive Review
Sonda Fourati, Wael Jaafar, Noura Baccar, Safwan Alfattani
arXiv - EE - Systems and Control, published 2024-09-16 (arXiv:2409.10484)
Abstract
Large Language Models (LLMs) have showcased remarkable proficiency in a variety of information-processing tasks, ranging from data extraction and literature summarization to content generation, predictive modeling, decision-making, and system control. Moreover, Vision Large Models (VLMs) and Multimodal LLMs (MLLMs), which represent the next generation of language models, collectively referred to as XLMs, can integrate many data modalities with the strength of language understanding, thus advancing several information-based systems, such as Autonomous Driving Systems (ADS). Indeed, by combining language communication with multimodal sensory inputs, e.g., panoramic images and LiDAR or radar data, accurate driving actions can be derived. In this context, this survey paper provides a comprehensive overview of the potential of XLMs for achieving autonomous driving. Specifically, we review the relevant literature on ADS and XLMs, including their architectures, tools, and frameworks. We then detail the proposed approaches for deploying XLMs in autonomous driving solutions. Finally, we discuss the challenges of XLM deployment for ADS and point to future research directions aiming to enable XLM adoption in future ADS frameworks.
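To make the idea concrete, below is a minimal, hypothetical Python sketch of the pipeline the abstract describes: an XLM consumes a language instruction together with multimodal sensor inputs (a panoramic image plus text-summarized LiDAR and radar data) and emits a structured driving action. All names (`QueryableXLM`, `plan_action`, the `EchoXLM` mock) and the text-based fusion strategy are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch (not from the paper): an XLM-driven perception-to-action
# loop. QueryableXLM and EchoXLM stand in for a real multimodal model; actual
# ADS pipelines may use learned encoders instead of the text summaries here.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SensorFrame:
    panoramic_image: bytes                          # encoded camera panorama
    lidar_points: list[tuple[float, float, float]]  # (x, y, z) returns in meters
    radar_tracks: list[dict]                        # e.g., {"range_m": 12.0, "speed_mps": -1.5}


@dataclass
class DrivingAction:
    steering_deg: float   # positive = steer left
    throttle: float       # 0.0 to 1.0
    brake: float          # 0.0 to 1.0


class QueryableXLM(Protocol):
    """Minimal interface: a multimodal model that answers a text prompt about an image."""
    def query(self, prompt: str, image: bytes) -> str: ...


def plan_action(model: QueryableXLM, frame: SensorFrame, instruction: str) -> DrivingAction:
    """Fuse a language instruction with multimodal sensing, then parse a structured action."""
    # Summarize non-visual modalities as text; the image is passed to the model directly.
    nearest_m = min((x * x + y * y) ** 0.5 for x, y, _ in frame.lidar_points)
    prompt = (
        f"Instruction: {instruction}\n"
        f"Nearest LiDAR return: {nearest_m:.1f} m ahead. "
        f"Active radar tracks: {len(frame.radar_tracks)}.\n"
        "Answer exactly as: steer=<deg> throttle=<0-1> brake=<0-1>"
    )
    reply = model.query(prompt, frame.panoramic_image)
    fields = dict(pair.split("=") for pair in reply.split())
    return DrivingAction(float(fields["steer"]), float(fields["throttle"]), float(fields["brake"]))


class EchoXLM:
    """Mock model so the sketch runs end to end; a real XLM would reason over the image."""
    def query(self, prompt: str, image: bytes) -> str:
        return "steer=0.0 throttle=0.2 brake=0.0"


if __name__ == "__main__":
    frame = SensorFrame(
        panoramic_image=b"...",
        lidar_points=[(14.2, 0.3, 0.1), (22.8, -4.1, 0.2)],
        radar_tracks=[{"range_m": 14.0, "speed_mps": -2.0}],
    )
    print(plan_action(EchoXLM(), frame, "Follow the lane and keep a safe gap."))
```

A production pipeline would likely replace the text summaries with learned point-cloud and radar encoders and add safety validation before actuation; the sketch only illustrates the input-to-action flow that the survey examines.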