S. Loke, B. MacDonald, Matthew Parsons, B. Wünsche
Published in: 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
Publication date: 2020-11-25
DOI: 10.1109/IVCNZ51579.2020.9290654
Citations: 1
Fast Portrait Segmentation of the Head and Upper Body
Portrait segmentation is the process whereby the head and upper body of a person are separated from the background of an image or video stream. This is difficult to achieve accurately, although good results have been obtained with deep learning methods, which cope well with occlusion, pose and illumination changes. These methods, however, are either slow or require a powerful system to operate in real time. We present a new method of portrait segmentation called FaceSeg, which combines fast DBSCAN clustering with smart face tracking and can replicate the benefits and accuracy of deep learning methods at a much faster speed. In a direct comparison using a standard testing suite, our method achieved a segmentation speed of 150 fps for a 640x480 video stream, with median accuracy and F1 scores of 99.96% and 99.93% respectively on simple backgrounds, and 98.81% and 98.13% on complex backgrounds. The state-of-the-art deep learning based FastPortrait / Mobile Neural Network method achieved 15 fps with 99.95% accuracy and 99.91% F1 score on simple backgrounds, and 99.01% accuracy and 98.43% F1 score on complex backgrounds. An efficacy-boosted implementation of FaceSeg can achieve 75 fps with 99.23% accuracy and 98.79% F1 score on complex backgrounds.
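
The abstract reports median accuracy and F1 scores but does not spell out how they are computed. The sketch below assumes the usual pixel-wise definitions on binary masks (1 = person, 0 = background) and assumes that the median is taken over per-frame scores. The function names and the toy masks are illustrative only and are not taken from the paper.

```python
from statistics import median
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of pixels: 1 = person, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) counted over every pixel of two equal-sized masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # person pixel correctly labelled
            elif p:
                fp += 1      # background labelled as person
            elif t:
                fn += 1      # person pixel missed
            else:
                tn += 1      # background correctly labelled
    return tp, fp, fn, tn


def accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(pred: Mask, truth: Mask) -> float:
    """Harmonic mean of precision and recall for the person class."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumption: an empty prediction of an empty mask counts as perfect.
    return 2 * tp / denom if denom else 1.0


def median_scores(frames: List[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Median per-frame accuracy and F1 over (predicted, truth) mask pairs."""
    accs = [accuracy(p, t) for p, t in frames]
    f1s = [f1_score(p, t) for p, t in frames]
    return median(accs), median(f1s)


# Toy 2x4 frame: one false positive and one missed person pixel.
truth_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
pred_mask = [[0, 1, 1, 1],
             [0, 1, 0, 0]]

print(accuracy(pred_mask, truth_mask))  # 0.75
print(f1_score(pred_mask, truth_mask))  # 0.75
```

Because background pixels usually dominate a portrait frame, accuracy alone can look high even when the person region is poorly segmented. That is one reason to report F1 alongside it, as the abstract does.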