M. Mahyoub, F. Natalia, S. Sudirman, A. Al-Jumaily, P. Liatsis
{"title":"基于多任务网络的城市道路场景图像语义分割与深度估计","authors":"M. Mahyoub, F. Natalia, S. Sudirman, A. Al-Jumaily, P. Liatsis","doi":"10.1109/DeSE58274.2023.10099504","DOIUrl":null,"url":null,"abstract":"In autonomous driving, environment perception is an important step in understanding the driving scene. Objects in images captured through a vehicle camera can be detected and classified using semantic segmentation and depth estimation methods. Both these tasks are closely related to each other and this association helps in building a multi-task neural network where a single network is used to generate both views from a given monocular image. This approach gives the flexibility to include multiple related tasks in a single network. It helps reduce multiple independent networks and improve the performance of all related tasks. The main aim of our research presented in this paper is to build a multi-task deep learning network for simultaneous semantic segmentation and depth estimation from monocular images. Two decoder-focused U-N et-based multi-task networks that use a pre-trained Resnet-50 and DenseNet-121 which shared encoder and task-specific decoder networks with Attention Mechanisms are considered. We also employed multi-task optimization strategies such as equal weighting and dynamic weight averaging during the training of the models. The corresponding models' performance is evaluated using mean IoU for semantic segmentation and Root Mean Square Error for depth estimation. From our experiments, we found that the performance of these multi-task networks is on par with the corresponding single-task networks.","PeriodicalId":346847,"journal":{"name":"2023 15th International Conference on Developments in eSystems Engineering (DeSE)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks\",\"authors\":\"M. Mahyoub, F. Natalia, S. Sudirman, A. Al-Jumaily, P. Liatsis\",\"doi\":\"10.1109/DeSE58274.2023.10099504\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In autonomous driving, environment perception is an important step in understanding the driving scene. Objects in images captured through a vehicle camera can be detected and classified using semantic segmentation and depth estimation methods. Both these tasks are closely related to each other and this association helps in building a multi-task neural network where a single network is used to generate both views from a given monocular image. This approach gives the flexibility to include multiple related tasks in a single network. It helps reduce multiple independent networks and improve the performance of all related tasks. The main aim of our research presented in this paper is to build a multi-task deep learning network for simultaneous semantic segmentation and depth estimation from monocular images. Two decoder-focused U-N et-based multi-task networks that use a pre-trained Resnet-50 and DenseNet-121 which shared encoder and task-specific decoder networks with Attention Mechanisms are considered. We also employed multi-task optimization strategies such as equal weighting and dynamic weight averaging during the training of the models. The corresponding models' performance is evaluated using mean IoU for semantic segmentation and Root Mean Square Error for depth estimation. From our experiments, we found that the performance of these multi-task networks is on par with the corresponding single-task networks.\",\"PeriodicalId\":346847,\"journal\":{\"name\":\"2023 15th International Conference on Developments in eSystems Engineering (DeSE)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 15th International Conference on Developments in eSystems Engineering (DeSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DeSE58274.2023.10099504\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 15th International Conference on Developments in eSystems Engineering (DeSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DeSE58274.2023.10099504","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks
In autonomous driving, environment perception is an important step in understanding the driving scene. Objects in images captured through a vehicle camera can be detected and classified using semantic segmentation and depth estimation methods. Both these tasks are closely related to each other and this association helps in building a multi-task neural network where a single network is used to generate both views from a given monocular image. This approach gives the flexibility to include multiple related tasks in a single network. It helps reduce multiple independent networks and improve the performance of all related tasks. The main aim of our research presented in this paper is to build a multi-task deep learning network for simultaneous semantic segmentation and depth estimation from monocular images. Two decoder-focused U-N et-based multi-task networks that use a pre-trained Resnet-50 and DenseNet-121 which shared encoder and task-specific decoder networks with Attention Mechanisms are considered. We also employed multi-task optimization strategies such as equal weighting and dynamic weight averaging during the training of the models. The corresponding models' performance is evaluated using mean IoU for semantic segmentation and Root Mean Square Error for depth estimation. From our experiments, we found that the performance of these multi-task networks is on par with the corresponding single-task networks.