
Latest publications from the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)

An Automated and Scalable ML Solution for Mapping Invasive Species: the Case of the Australian Tree Fern in Hawaiian Forests
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00019
O. Iancu, Kara Yang, Han Man, Theresa Cabrera Menard
Biodiversity loss and ecosystem degradation are global challenges demanding creative and scalable solutions. Recent increases in data collection, coupled with machine learning, have the potential to expand landscape monitoring capabilities. We present a computer vision solution to the problem of identifying invasive species. The Australian Tree Fern (Cyathea cooperi) is a fast-growing species that is displacing slower-growing native plants across the Hawaiian islands. The Nature Conservancy has partnered with Amazon Web Services to develop and test an automated tree fern detection and mapping solution based on imagery collected from fixed-wing aircraft. We utilize deep learning to identify tree ferns and map their locations. Distinguishing between invasive and native tree ferns in aerial images is challenging even for human experts. We explore techniques such as image embeddings and principal component analysis to assist in the classification. Creating quality training datasets is critical for developing ML solutions, and we describe how semi-automated labeling tools can expedite this process. These steps are integrated into an automated cloud-native inference pipeline that reduces localization time from weeks to minutes. We further investigate issues encountered when the pipeline is applied to novel images, where a decline in performance relative to the training data is observed. We trace the origin of the problem to a subset of images originating from steep mountain slopes and riverbanks, which generate blurring and streaking patterns mistakenly labeled as tree ferns. We propose a preprocessing step based on Haralick texture features that detects and flags images that differ from the training set. Experimental results show that the proposed method performs well and can potentially enhance model performance through iterative relabeling and retraining.
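A minimal sketch of the kind of texture-based screening the abstract describes, assuming grayscale image tiles and scikit-image's gray-level co-occurrence utilities (graycomatrix/graycoprops in recent versions); the feature set, z-score threshold, and flagging rule are illustrative assumptions, not the authors' configuration:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ["contrast", "homogeneity", "energy", "correlation"]

def haralick_features(tile_u8):
    """Small Haralick-style feature vector for one grayscale uint8 tile."""
    glcm = graycomatrix(tile_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p).mean() for p in PROPS])

def fit_reference(train_tiles):
    """Summarize the training distribution by per-feature mean and std."""
    feats = np.stack([haralick_features(t) for t in train_tiles])
    return feats.mean(axis=0), feats.std(axis=0) + 1e-8

def flag_out_of_distribution(tile_u8, mean, std, z_thresh=3.0):
    """Flag a tile whose texture statistics deviate strongly from the training set."""
    z = np.abs((haralick_features(tile_u8) - mean) / std)
    return bool((z > z_thresh).any())
```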
Citations: 0
Can Machines Learn to Map Creative Videos to Marketing Campaigns?
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00056
Jarod Wang, Chirag Mandaviya
The demand for accurate estimation of marketing's incremental effect is rapidly increasing, enabling marketers to make informed decisions on their ad investment. The process of admapping links an ad shown to consumers on fixed marketing channels (Linear TV, Digital, Social) to a marketing creative video. Thus, accurate admapping, which is a special case of video copy detection, is a cornerstone of ensuring that ad exposure is linked to the correct creative and marketing campaign, and hence of precise marketing effect measurement. With each campaign having tens of creatives and each country (marketplace) running tens of marketing campaigns each week, the current process of human annotation of hundreds of creatives requires over 800 team hours annually. Moreover, this manual process creates significant challenges in onboarding new businesses and countries to measurement due to the absence of an intelligent model-based admapping solution. To solve this problem, we built a machine learning (ML) model that leverages fingerprinting methodology and automatic language identification technology to match each creative to its marketing campaign. In the paper, we present the computing algorithm and implementation details with results from an actual campaign dataset. Extensive validation and comparison studies demonstrate improved mapping results with the proposed method, achieving an 87% F1 score and 82% accuracy. To our best knowledge, this is the first model that uses a fusion of visual, audio, language and metadata features for such an ML-based content mapping solution. The proposed method leads to a 90% reduction in the time spent on admapping compared to manual solutions.
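As a rough illustration of frame-level fingerprint matching, one ingredient the abstract mentions, the sketch below hashes sampled frames with perceptual hashes and assigns an aired ad to the campaign creative with the smallest median Hamming distance; the imagehash library calls are real, but the sampling and scoring scheme is an assumption, not the authors' pipeline:

```python
import numpy as np
import imagehash  # perceptual hashing of PIL images

def fingerprint(frames, step=10):
    """Perceptual-hash fingerprint from every `step`-th frame (frames are PIL Images)."""
    return [imagehash.phash(f) for f in frames[::step]]

def distance(fp_a, fp_b):
    """Median Hamming distance between two fingerprints, aligned on the shorter one."""
    n = min(len(fp_a), len(fp_b))
    return float(np.median([fp_a[i] - fp_b[i] for i in range(n)]))  # ImageHash subtraction = Hamming distance

def map_ad_to_creative(ad_frames, creatives):
    """Return the id of the creative whose fingerprint is closest to the aired ad."""
    ad_fp = fingerprint(ad_frames)
    scores = {cid: distance(ad_fp, fingerprint(frames)) for cid, frames in creatives.items()}
    return min(scores, key=scores.get)
```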
Citations: 0
Analyzing the Impact of Gender Misclassification on Face Recognition Accuracy
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00037
Afi Edem Edi Gbekevi, Paloma Vela Achu, Gabriella Pangelinan, M. King, K. Bowyer
Automated face recognition technologies have come under scrutiny in recent years due to noted variations in accuracy relative to race and gender. Much of this concern was driven by media coverage of high error rates for women and persons of color reported in an evaluation of commercial gender classification ("gender from face") tools. Many decried the conflation of errors observed in the task of gender classification with the task of face recognition. This motivated the question of whether images that are misclassified by a gender classification algorithm have an increased error rate with face recognition algorithms. In the first experiment, we analyze the False Match Rate (FMR) of face recognition for comparisons in which one or both of the images are gender-misclassified. In the second experiment, we examine match scores of gender-misclassified images when compared to images from their labeled versus classified gender. We find that, in general, gender-misclassified images are not associated with an increased FMR. For females, non-mated comparisons involving one misclassified image actually shift the resultant impostor distribution to lower similarity scores, representing improved accuracy. To our knowledge, this is the first work to analyze (1) the FMR of one- and two-misclassification error pairs and (2) non-mated match scores for misclassified images against labeled- and classified-gender categories.
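A minimal sketch of the core quantity in the first experiment, assuming arrays of impostor (non-mated) similarity scores and a fixed decision threshold; the synthetic scores, the threshold value, and the grouping into misclassification subsets are hypothetical and only illustrate how FMRs would be compared:

```python
import numpy as np

def false_match_rate(impostor_scores, threshold):
    """FMR = fraction of non-mated comparisons whose similarity exceeds the threshold."""
    scores = np.asarray(impostor_scores, dtype=float)
    return float((scores >= threshold).mean())

# Hypothetical example: FMR for pairs with 0, 1, or 2 gender-misclassified images.
rng = np.random.default_rng(0)
subsets = {
    "no_misclassification": rng.normal(0.30, 0.10, 10000),
    "one_misclassified":    rng.normal(0.28, 0.10, 10000),
    "two_misclassified":    rng.normal(0.29, 0.10, 10000),
}
threshold = 0.60  # illustrative operating point, not a calibrated value
for name, scores in subsets.items():
    print(f"{name}: FMR = {false_match_rate(scores, threshold):.4f}")
```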
Citations: 1
On the Importance of Spatio-Temporal Learning for Video Quality Assessment
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00053
Dario Fontanel, David Higham, Benoit Vallade
Video quality assessment (VQA) has sparked a lot of interest in the computer vision community, as it plays a critical role in services that provide customers with high-quality video content. Due to the lack of high-quality reference videos and the difficulties in collecting subjective evaluations, assessing video quality is a challenging and still unsolved problem. Moreover, most public research efforts focus only on user-generated content (UGC), making it unclear whether reliable solutions can be adopted for assessing the quality of production-related videos. The goal of this work is to assess the importance of spatial and temporal learning for production-related VQA. In particular, it evaluates state-of-the-art UGC video quality assessment approaches on the LIVE-APV dataset, demonstrating the importance of learning contextual characteristics from each video frame, as well as capturing temporal correlations between them.
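To make the spatio-temporal idea concrete, here is a minimal no-reference VQA sketch (not the paper's architecture): a per-frame CNN backbone extracts spatial features and a GRU aggregates them over time into a single quality score; the backbone choice, hidden size, and pooling strategy are all assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SpatioTemporalVQA(nn.Module):
    """Per-frame spatial features (ResNet-18) + temporal aggregation (GRU) -> quality score."""
    def __init__(self, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)   # spatial feature extractor
        backbone.fc = nn.Identity()         # keep the 512-d pooled features
        self.backbone = backbone
        self.temporal = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, clip):                # clip: (batch, time, 3, H, W)
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, last = self.temporal(feats)      # last hidden state summarizes the clip
        return self.head(last.squeeze(0)).squeeze(-1)

scores = SpatioTemporalVQA()(torch.randn(2, 8, 3, 224, 224))  # -> tensor of shape (2,)
```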
Citations: 0
Object-Ratio-Preserving Video Retargeting Framework based on Segmentation and Inpainting
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00055
Jun-Gyu Jin, Jaehyun Bae, Han-Gyul Baek, Sang-hyo Park
The recent development of video-based content platforms has made videos from decades ago easily accessible. However, many of these older videos have a narrow screen ratio. If an image with this ratio is shown on a display with a wider screen ratio, the image is either excessively stretched horizontally or padded with black bars, which prevents comfortable viewing of the content. In this paper, we propose a method for retargeting old-ratio video frames to a wider ratio while maintaining the original proportions of important objects in the content, using deep learning-based semantic segmentation and inpainting techniques. Our research shows that the proposed method can make retargeted frames visually natural.
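A rough sketch of the widen-and-fill step, with OpenCV's classical Telea inpainting standing in for the paper's learned inpainting model and the segmentation-guided handling of objects omitted: the frame is centered on a wider canvas so its content keeps its original proportions, and only the newly exposed side strips are filled by inpainting.

```python
import numpy as np
import cv2

def retarget_wider(frame_bgr, target_ratio=16 / 9):
    """Place the frame on a wider canvas and inpaint the new side regions."""
    h, w = frame_bgr.shape[:2]
    new_w = int(round(h * target_ratio))
    if new_w <= w:
        return frame_bgr                     # already wide enough
    pad = (new_w - w) // 2
    canvas = np.zeros((h, new_w, 3), dtype=np.uint8)
    canvas[:, pad:pad + w] = frame_bgr       # original pixels keep their aspect ratio
    mask = np.full((h, new_w), 255, dtype=np.uint8)
    mask[:, pad:pad + w] = 0                 # only the new side strips get filled
    return cv2.inpaint(canvas, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
```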
Citations: 3
A Survey on the Deployability of Semantic Segmentation Networks for Fluvial Navigation
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00032
Reeve Lambert, Jianwen Li, Jalil Chavez-Galaviz, N. Mahmoudian
Neural network semantic image segmentation has developed into a powerful tool for environmental comprehension in autonomous navigation through complex environments. While semantic segmentation networks have seen ample application in the ground domain, implementations in the surface water domain, especially fluvial (rivers and streams) deployments, have lagged behind due to training data and literature sparsity. To tackle this problem, the publicly available River Obstacle Segmentation En-Route By USV Dataset (ROSEBUD) was recently published. The dataset provides unique rural fluvial training data for the water binary segmentation task to aid autonomous navigation in fluvial scenes. Despite new dataset sources, there is still a need for studies of networks that excel both at understanding marine and fluvial scenes and at operating efficiently on the computationally limited embedded systems that are common on autonomous marine platforms such as ASVs. To provide insight into state-of-the-art network capabilities on embedded systems, a survey of twelve networks encompassing 8 different architectures was conducted. Networks were trained and tested on a combination of three existing datasets, including the ROSEBUD dataset, and then implemented on an NVIDIA Jetson Nano to evaluate performance on real-world hardware. The survey's results lay out recommendations for networks to use in autonomous applications in complex and fast-moving environments, relative to network performance and inference speed.
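A minimal latency-benchmark sketch of the kind such a deployability survey relies on, assuming a PyTorch segmentation model and a CUDA device such as the Jetson Nano's; the warm-up count, input size, and iteration count are arbitrary choices:

```python
import time
import torch

def benchmark_inference(model, input_size=(1, 3, 480, 640), warmup=10, iters=100, device="cuda"):
    """Return mean per-frame inference latency in milliseconds."""
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):              # warm-up to stabilize clocks and caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()         # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0
```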
Citations: 0
Self-Supervised Effective Resolution Estimation with Adversarial Augmentations
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00064
Manuel Kansy, Julian Balletshofer, Jacek Naruniec, Christopher Schroers, Graziana Mignone, M. Gross, Romann M. Weber
High-resolution, high-quality images of human faces are desired as training data and output for many modern applications, such as avatar generation, face super-resolution, and face swapping. The terms high-resolution and high-quality are often used interchangeably; however, the two concepts are not equivalent, and high resolution does not always imply high quality. To address this, we motivate and precisely define the concept of effective resolution in this paper. We thereby draw connections to signal and information theory and show why baselines based on frequency analysis or compression fail. Instead, we propose a novel self-supervised learning scheme to train a neural network for effective resolution estimation without human-labeled data. It leverages adversarial augmentations to bridge the domain gap between synthetic and real, authentic degradations, thus allowing us to train on domains, such as human faces, for which no or only few human labels exist. Finally, we demonstrate that our method outperforms state-of-the-art image quality assessment methods in estimating the sharpness of real and generated human faces, despite using only unlabeled data during training.
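A toy version of the self-supervised labeling idea, assuming the simplest possible synthetic degradation (downscale then upscale by a known factor) so that each image comes with a free effective-resolution target; the paper's adversarial augmentations are not reproduced here:

```python
import random
from PIL import Image

def make_training_pair(img, min_factor=0.1, max_factor=1.0):
    """Degrade an image by a random resolution factor; return (degraded image, factor) as a training pair."""
    factor = random.uniform(min_factor, max_factor)
    w, h = img.size
    small = img.resize((max(1, int(w * factor)), max(1, int(h * factor))), Image.BICUBIC)
    degraded = small.resize((w, h), Image.BICUBIC)   # same pixel count, lower effective resolution
    return degraded, factor
```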
Citations: 0
Causal Structure Learning of Bias for Fair Affect Recognition
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00038
Jiaee Cheong, Sinan Kalkan, H. Gunes
The problem of bias in facial affect recognition tools can lead to severe consequences and issues. It has been posited that causality is able to address the gaps induced by the associational nature of traditional machine learning, and one such gap is that of fairness. However, given the nascency of the field, there is still no clear mapping between tools in causality and applications in fair machine learning for the specific task of affect recognition. To address this gap, we provide the first causal structure formalisation of the different biases that can arise in affect recognition. We conduct a proof of concept on utilising causal structure learning for post-hoc understanding and analysis of bias.
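As a hedged illustration of what constraint-based causal structure learning does in this setting (not the authors' formalisation), the sketch below runs Fisher-z conditional independence tests among a protected attribute, a true label, and a model prediction, keeping an edge only where dependence survives conditioning; the variable names, synthetic data, and test choice are illustrative assumptions:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def partial_corr(x, y, z=None):
    """Correlation of x and y after regressing out the (optional) conditioning set z."""
    if z is None or z.size == 0:
        return np.corrcoef(x, y)[0, 1]
    Z = np.column_stack([np.ones(len(x)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

def independent(x, y, z=None, alpha=0.05):
    """Fisher-z (conditional) independence test."""
    n = len(x)
    k = 0 if z is None else (1 if z.ndim == 1 else z.shape[1])
    r = np.clip(partial_corr(x, y, z), -0.999999, 0.999999)
    zstat = np.sqrt(n - k - 3) * 0.5 * np.log((1 + r) / (1 - r))
    return 2 * (1 - stats.norm.cdf(abs(zstat))) > alpha

# Hypothetical variables: protected attribute A, true affect label Y, model prediction P.
rng = np.random.default_rng(0)
A = rng.normal(size=2000)
Y = 0.8 * rng.normal(size=2000)
P = 0.7 * Y + 0.3 * A + 0.1 * rng.normal(size=2000)   # prediction leaks the protected attribute
data = {"A": A, "Y": Y, "P": P}

# Skeleton discovery: drop an edge if the pair is independent given the remaining variable.
for a, b in combinations(data, 2):
    other = [v for v in data if v not in (a, b)][0]
    dep_marginal = not independent(data[a], data[b])
    dep_conditional = not independent(data[a], data[b], data[other].reshape(-1, 1))
    print(f"{a}-{b}: dependent={dep_marginal}, dependent given {other}={dep_conditional}")
```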
Citations: 5
$k$-NN embedded space conditioning for enhanced few-shot object detection
Pub Date : 2023-01-01 DOI: 10.1109/WACVW58289.2023.00044
Stefan Matcovici, D. Voinea, A. Popa
Few-shot learning has attracted significant scientific interest in the past decade due to its applicability to visual tasks with a naturally long-tailed distribution, such as object detection. This paper introduces a novel and flexible few-shot object detection approach which can be adapted effortlessly to any candidate-based object detection framework. In particular, our proposed $\boldsymbol{kFEW}$ component leverages a $k$-NN retrieval technique over the region-of-interest space to build both a class distribution and a weighted aggregated embedding conditioned on the recovered neighbours. The obtained $k$-NN feature representation is used to drive the training process without any additional trainable parameters, as well as during inference time, by steering the assumed confidence and the predicted box coordinates of the detection model. We perform extensive experiments and ablation studies on MS COCO and Pascal VOC, proving its efficiency and state-of-the-art results (by a margin of 2.3 mAP points on MS COCO and 2.5 mAP points on Pascal VOC) in the context of few-shot object detection. Additionally, we demonstrate its versatility and ease of integration by incorporating it on top of competitive few-shot object detection methods and providing superior results.
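A small numpy sketch of the general idea of conditioning on retrieved neighbours in an embedding space (illustrative, not the paper's kFEW component): given a query RoI embedding and a support set of labelled embeddings, it returns the neighbour class distribution and a similarity-weighted aggregated embedding; the cosine similarity, softmax weighting, and k are assumed choices:

```python
import numpy as np

def knn_condition(query, support_emb, support_labels, num_classes, k=5):
    """Return (class distribution, similarity-weighted mean embedding) of the k nearest neighbours."""
    q = query / (np.linalg.norm(query) + 1e-8)
    s = support_emb / (np.linalg.norm(support_emb, axis=1, keepdims=True) + 1e-8)
    sims = s @ q                                 # cosine similarity to every support embedding
    idx = np.argsort(-sims)[:k]                  # indices of the k most similar neighbours
    weights = np.exp(sims[idx]) / np.exp(sims[idx]).sum()
    class_dist = np.bincount(support_labels[idx], weights=weights, minlength=num_classes)
    agg_embedding = (weights[:, None] * support_emb[idx]).sum(axis=0)
    return class_dist, agg_embedding

# Hypothetical usage with random 128-d embeddings and 3 classes.
rng = np.random.default_rng(0)
dist, emb = knn_condition(rng.normal(size=128), rng.normal(size=(50, 128)),
                          rng.integers(0, 3, size=50), num_classes=3)
```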
Citations: 0
Copyright and Reprint Permissions
Pub Date : 2023-01-01 DOI: 10.1109/wacvw58289.2023.00003
{"title":"Copyright and Reprint Permissions","authors":"","doi":"10.1109/wacvw58289.2023.00003","DOIUrl":"https://doi.org/10.1109/wacvw58289.2023.00003","url":null,"abstract":"","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130564460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0