MODSiam: Moving Object Detection using Siamese Networks
Islam I. Osman, M. Shehata
2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), 2020-08-30
DOI: 10.1109/CCECE47787.2020.9255776
Citation count: 1
Abstract
Moving object detection is a challenging task in computer vision. A class-agnostic model is learned to detect moving objects in a video regardless of their category. This is done using the proposed MODSiam, which takes a single background image of the scene and the current frame as input; the model then extracts features from both inputs and merges them to output the foreground objects. A comparison of this model with three different backbone convolutional neural networks is presented. The evaluation uses the metrics precision, recall, F1-measure, false-positive rate, false-negative rate, specificity, accuracy, and the number of frames per second. All models are tested on the benchmark dataset CDNet, a dataset of videos of moving objects under different conditions such as low frame rate, shadows, and dynamic background. The results show that using ResNet as a backbone produced promising results compared to the other models with respect to most of the evaluation metrics.
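The pixel-level metrics named in the abstract can be illustrated with a short sketch. It assumes the standard confusion-matrix definitions and binary foreground masks given as nested lists; the abstract does not give the formulas, so the exact definitions used in the paper are an assumption here. The function names `confusion_counts` and `foreground_metrics` are hypothetical. Frames per second is a timing measure and is not covered.

```python
from typing import Dict, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: 1 = foreground, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) pixels between a predicted and a ground-truth mask."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # foreground correctly detected
            elif p and not t:
                fp += 1  # background marked as foreground
            elif t:
                fn += 1  # foreground missed
            else:
                tn += 1  # background correctly rejected
    return tp, fp, fn, tn


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def foreground_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compute the seven mask-based metrics under the assumed standard definitions."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }
```

Calling `foreground_metrics(predicted_mask, ground_truth_mask)` on one frame returns a dictionary of the seven scores. Results for a whole video could be obtained by summing the confusion counts over frames before taking ratios, though whether the paper aggregates per frame or per video is not stated in the abstract.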