Mobile multi-scale vehicle detector and its application in traffic surveillance
Trung D. Q. Dang, Hy V. G. Che, T. Dinh
Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06
DOI: 10.1145/3287921.3287957 (https://doi.org/10.1145/3287921.3287957)
Citation count: 1
Abstract
Object detection is a major problem in computer vision. Recently, deep neural architectures have delivered a dramatic boost in performance, but they are often too slow and resource-intensive for embedded and real-time applications such as video surveillance. In this paper, we describe a new object detection architecture that is faster than state-of-the-art detectors while improving on the performance of small mobile models. We apply this architecture to the problem of vehicle detection, which is central to traffic surveillance systems. In more detail, our architecture uses MobileNetV2 as an efficient backbone network, whose building blocks consist of depthwise convolutional layers. On top of this network, we build a feature pyramid using separable layers so that the model can detect objects at many scales. We train this network with a smooth localization loss and a weighted softmax loss in tandem with hard negative mining. Both training and test sets are built from recorded videos of Ho Chi Minh and Da Nang traffic or selected from the DETRAC dataset. The experimental results show that our proposed solution still achieves an mAP of 75% on the test set while using only around 3.4 million parameters and running at 100 ms per image on a cheap machine.
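The training objective named in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact formulas, so several things here are assumptions: the "smooth localization loss" is taken to be the smooth L1 loss common in single-shot detectors, label 0 is taken to mean background, and the names `smooth_l1`, `weighted_softmax_ce`, `detection_loss`, `class_weights`, and the 3:1 `neg_pos_ratio` are hypothetical choices for illustration, not values from the paper.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over box coordinates (assumed form of the
    'smooth localization loss'): quadratic near zero, linear elsewhere."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def weighted_softmax_ce(logits, label, class_weights):
    """Softmax cross-entropy for one anchor, scaled by a per-class weight."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return class_weights[label] * (log_z - logits[label])


def detection_loss(box_preds, box_targets, logits, labels,
                   class_weights, neg_pos_ratio=3):
    """Combine both losses with hard negative mining.

    Only positive anchors (label > 0) contribute to localization. For
    classification, all positives are kept, but only the background
    anchors with the highest loss are kept, at most neg_pos_ratio per
    positive (the ratio is an assumption, not taken from the paper).
    """
    pos = [i for i, y in enumerate(labels) if y > 0]
    neg = [i for i, y in enumerate(labels) if y == 0]
    cls = [weighted_softmax_ce(l, y, class_weights)
           for l, y in zip(logits, labels)]

    # Hard negative mining: rank negatives by loss, keep the hardest ones.
    num_neg = min(len(neg), neg_pos_ratio * len(pos))
    hard_neg = sorted(neg, key=lambda i: cls[i], reverse=True)[:num_neg]

    loc = sum(smooth_l1(box_preds[i], box_targets[i]) for i in pos)
    conf = sum(cls[i] for i in pos + hard_neg)
    return (loc + conf) / max(len(pos), 1), hard_neg


# Toy example: one vehicle anchor (index 0) and five background anchors.
labels = [1, 0, 0, 0, 0, 0]
logits = [[0.0, 2.0], [5.0, 0.0], [0.0, 5.0],
          [0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
box_targets = [[0.0, 0.0, 0.0, 0.0] for _ in labels]
box_preds = [[0.5, 0.0, 0.0, 2.0]] + [[0.0, 0.0, 0.0, 0.0] for _ in range(5)]
loss, hard_neg = detection_loss(box_preds, box_targets, logits, labels,
                                class_weights=[1.0, 2.0])
```

In the toy example, the background anchors whose logits favor the vehicle class most strongly are the ones selected, while the confidently correct background anchor at index 1 is dropped. This keeps the many easy background anchors from dominating the classification loss.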