Self-Supervised Keypoint Discovery in Behavioral Videos
Pub Date: 2022-06-01 | Epub Date: 2022-09-27 | DOI: 10.1109/cvpr52688.2022.00221 | Pages: 2161-2170 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9829414/pdf/nihms-1857208.pdf
Jennifer J Sun, Serim Ryou, Roni H Goldshmid, Brandon Weissbourd, John O Dabiri, David J Anderson, Ann Kennedy, Yisong Yue, Pietro Perona
We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video frames. By focusing only on regions of movement, our approach works directly on input videos without requiring manual annotations. Experiments on a variety of agent types (mouse, fly, human, jellyfish, and trees) demonstrate the generality of our approach and reveal that our discovered keypoints represent semantically meaningful body parts, achieving state-of-the-art performance on keypoint regression among self-supervised methods. Additionally, B-KinD achieves performance comparable to supervised keypoints on downstream tasks such as behavior classification, suggesting that our method can dramatically reduce model training costs relative to supervised methods.
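To make the geometric-bottleneck idea concrete, here is a minimal sketch in the spirit of the abstract: keypoint heatmaps are collapsed to coordinates by a soft-argmax, re-rendered as Gaussian maps, and a decoder must reconstruct the frame difference from that geometry alone. All module sizes and names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricBottleneck(nn.Module):
    def __init__(self, k=10):  # k = number of discovered keypoints (assumption)
        super().__init__()
        self.encoder = nn.Sequential(  # toy encoder: image -> k heatmaps
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, k, 3, padding=1))
        self.decoder = nn.Sequential(  # toy decoder: k Gaussian maps -> 1-ch diff
            nn.Conv2d(k, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def soft_argmax(self, heatmaps):
        b, k, h, w = heatmaps.shape
        probs = F.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
        ys = torch.linspace(-1, 1, h, device=heatmaps.device)
        xs = torch.linspace(-1, 1, w, device=heatmaps.device)
        y = (probs.sum(dim=3) * ys).sum(dim=2)  # (b, k) expected y
        x = (probs.sum(dim=2) * xs).sum(dim=2)  # (b, k) expected x
        return torch.stack([x, y], dim=-1)      # (b, k, 2) in [-1, 1]

    def gaussian_maps(self, coords, h, w, sigma=0.1):
        ys = torch.linspace(-1, 1, h, device=coords.device).view(1, 1, h, 1)
        xs = torch.linspace(-1, 1, w, device=coords.device).view(1, 1, 1, w)
        cx = coords[..., 0].view(*coords.shape[:2], 1, 1)
        cy = coords[..., 1].view(*coords.shape[:2], 1, 1)
        return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    def forward(self, frame_t, frame_tk):
        # Keypoints come only from the geometric bottleneck; the decoder must
        # reconstruct the spatiotemporal difference from geometry alone.
        coords = self.soft_argmax(self.encoder(frame_tk))
        maps = self.gaussian_maps(coords, frame_t.shape[2], frame_t.shape[3])
        target = (frame_tk - frame_t).abs().mean(dim=1, keepdim=True)
        return F.mse_loss(self.decoder(maps), target), coords

model = GeometricBottleneck()
loss, keypoints = model(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```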
{"title":"Self-Supervised Keypoint Discovery in Behavioral Videos.","authors":"Jennifer J Sun, Serim Ryou, Roni H Goldshmid, Brandon Weissbourd, John O Dabiri, David J Anderson, Ann Kennedy, Yisong Yue, Pietro Perona","doi":"10.1109/cvpr52688.2022.00221","DOIUrl":"10.1109/cvpr52688.2022.00221","url":null,"abstract":"<p><p>We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video frames. By focusing only on regions of movement, our approach works directly on input videos without requiring manual annotations. Experiments on a variety of agent types (mouse, fly, human, jellyfish, and trees) demonstrate the generality of our approach and reveal that our discovered keypoints represent semantically meaningful body parts, which achieve state-of-the-art performance on keypoint regression among self-supervised methods. Additionally, B-KinD achieve comparable performance to supervised keypoints on downstream tasks, such as behavior classification, suggesting that our method can dramatically reduce model training costs vis-a-vis supervised methods.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2022 ","pages":"2161-2170"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9829414/pdf/nihms-1857208.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9088239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Pub Date: 2022-06-01 | Epub Date: 2022-09-27 | DOI: 10.1109/cvpr52688.2022.02016 | Pages: 20792-20802 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9615927/pdf/nihms-1812882.pdf
Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Michael B Gotway, Jianming Liang
Discriminative learning, restorative learning, and adversarial learning have each proven beneficial for self-supervised learning in computer vision and medical imaging. Existing efforts, however, overlook their synergistic effects in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA (1) encourages collaborative learning among the three learning ingredients, resulting in more generalizable representations across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small-data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representations, facilitating accurate lesion localization with only image-level annotation; and (4) enhances state-of-the-art restorative approaches, revealing that DiRA is a general mechanism for unified representation learning. All code and pretrained models are available at https://github.com/JLiangLab/DiRA.
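A schematic of the three-part objective the abstract describes: a discriminative (contrastive-style) term, a restorative (reconstruction) term, and an adversarial term, summed with weights. The internals and weight values below are placeholder assumptions, not DiRA's exact formulation.

```python
import torch
import torch.nn.functional as F

def dira_style_loss(z1, z2, restored, original, disc_logits_restored,
                    w_dis=1.0, w_res=1.0, w_adv=0.1):
    # Discriminative term: embeddings of two augmented views of the same
    # image should agree (cosine agreement stands in for the contrastive loss).
    l_dis = 1.0 - F.cosine_similarity(z1, z2, dim=-1).mean()
    # Restorative term: pixel-level reconstruction of the undistorted image.
    l_res = F.mse_loss(restored, original)
    # Adversarial term (generator side): restored images should be scored as
    # "real" by a discriminator trained alongside (discriminator not shown).
    l_adv = F.binary_cross_entropy_with_logits(
        disc_logits_restored, torch.ones_like(disc_logits_restored))
    return w_dis * l_dis + w_res * l_res + w_adv * l_adv

loss = dira_style_loss(torch.randn(8, 128), torch.randn(8, 128),
                       torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64),
                       torch.randn(8, 1))
```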
{"title":"DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis.","authors":"Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Michael B Gotway, Jianming Liang","doi":"10.1109/cvpr52688.2022.02016","DOIUrl":"10.1109/cvpr52688.2022.02016","url":null,"abstract":"<p><p>Discriminative learning, restorative learning, and adversarial learning have proven beneficial for self-supervised learning schemes in computer vision and medical imaging. Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed <b>DiRA</b>, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA (1) encourages collaborative learning among three learning ingredients, resulting in more generalizable representation across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representation, facilitating accurate lesion localization with only image-level annotation; and (4) enhances state-of-the-art restorative approaches, revealing that DiRA is a general mechanism for united representation learning. All code and pretrained models are available at https://github.com/JLiangLab/DiRA.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":"20792-20802"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9615927/pdf/nihms-1812882.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40436067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content from Parameterized Transformations
Pub Date: 2022-06-01 | Epub Date: 2022-09-27 | DOI: 10.1109/cvpr52688.2022.01999 | Pages: 20614-20623 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521798/pdf/nihms-1794246.pdf
Mostofa Rafid Uddin, Gregory Howe, Xiangrui Zeng, Min Xu
In many real-life image analysis applications, particularly in biomedical research domains, the objects of interest undergo multiple transformations that alter their visual properties while keeping the semantic content unchanged. Disentangling images into semantic content factors and transformations can benefit many domain-specific image analysis tasks. To this end, we propose a generic unsupervised framework, Harmony, that simultaneously and explicitly disentangles semantic content from multiple parameterized transformations. Harmony leverages a simple cross-contrastive learning framework with multiple explicitly parameterized latent representations to disentangle content from transformations. To demonstrate its efficacy, we apply Harmony to disentangle image semantic content from several parameterized transformations (rotation, translation, scaling, and contrast). Harmony achieves significantly improved disentanglement over baseline models on several image datasets from diverse domains. With such disentanglement, Harmony advances bioimage analysis research by modeling the structural heterogeneity of macromolecules from cryo-ET images and by learning transformation-invariant representations of protein particles from single-particle cryo-EM images. Harmony also performs well in disentangling content from 3D transformations and can perform coarse, fast alignment of 3D cryo-ET subtomograms. Harmony is therefore generalizable to many other imaging domains and can potentially be extended to domains beyond imaging as well.
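The following sketch illustrates the core trick under simplifying assumptions: an encoder outputs a content code plus an explicit transformation parameter; content codes of two differently rotated views of the same image are pulled together, while the parameter head is supervised for free by the known transformation used to generate each view. Names, losses, and the single-parameter setup are illustrative, not the paper's exact formulation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms.functional as TF

class ToyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)
        self.content_head = nn.Linear(8, 16)  # semantic content code
        self.param_head = nn.Linear(8, 1)     # explicit rotation parameter

    def forward(self, x):
        h = self.conv(x).mean(dim=(2, 3))     # globally pooled features
        return self.content_head(h), self.param_head(h).squeeze(-1)

def harmony_style_step(encoder, image, angle1=15.0, angle2=-30.0):
    v1, v2 = TF.rotate(image, angle1), TF.rotate(image, angle2)
    c1, th1 = encoder(v1)
    c2, th2 = encoder(v2)
    # Content codes should be invariant across differently transformed views...
    l_content = 1.0 - F.cosine_similarity(c1, c2, dim=-1).mean()
    # ...while the parameter head must account for the applied transformation.
    t1 = torch.full_like(th1, math.radians(angle1))
    t2 = torch.full_like(th2, math.radians(angle2))
    return l_content + F.mse_loss(th1, t1) + F.mse_loss(th2, t2)

loss = harmony_style_step(ToyEncoder(), torch.rand(4, 1, 32, 32))
```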
{"title":"Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content from Parameterized Transformations.","authors":"Mostofa Rafid Uddin, Gregory Howe, Xiangrui Zeng, Min Xu","doi":"10.1109/cvpr52688.2022.01999","DOIUrl":"10.1109/cvpr52688.2022.01999","url":null,"abstract":"<p><p>In many real-life image analysis applications, particularly in biomedical research domains, the objects of interest undergo multiple transformations that alters their visual properties while keeping the semantic content unchanged. Disentangling images into semantic content factors and transformations can provide significant benefits into many domain-specific image analysis tasks. To this end, we propose a generic unsupervised framework, Harmony, that simultaneously and explicitly disentangles semantic content from multiple parameterized transformations. Harmony leverages a simple cross-contrastive learning framework with multiple explicitly parameterized latent representations to disentangle content from transformations. To demonstrate the efficacy of Harmony, we apply it to disentangle image semantic content from several parameterized transformations (rotation, translation, scaling, and contrast). Harmony achieves significantly improved disentanglement over the baseline models on several image datasets of diverse domains. With such disentanglement, Harmony is demonstrated to incentivize bioimage analysis research by modeling structural heterogeneity of macromolecules from cryo-ET images and learning transformation-invariant representations of protein particles from single-particle cryo-EM images. Harmony also performs very well in disentangling content from 3D transformations and can perform coarse and fast alignment of 3D cryo-ET subtomograms. Therefore, Harmony is generalizable to many other imaging domains and can potentially be extended to domains beyond imaging as well.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":"20614-20623"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521798/pdf/nihms-1794246.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40392959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding Uncertainty Maps in Vision with Statistical Testing
Pub Date: 2022-06-01 | DOI: 10.1109/cvpr52688.2022.00050 | Pages: 406-416 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10205027/pdf/nihms-1894544.pdf
Jurijs Nazarovs, Zhichun Huang, Songwong Tasneeyapant, Rudrasis Chakraborty, Vikas Singh
Quantitative descriptions of confidence intervals and uncertainties of a model's predictions are needed in many applications in vision and machine learning. Mechanisms that enable this for deep neural network (DNN) models are slowly becoming available and are occasionally integrated within production systems. But the literature is sparse on how to perform statistical tests with the uncertainties produced by these overparameterized models. For two models with a similar accuracy profile, is one model's uncertainty behavior better, in a statistically significant sense, than the other's? For high-resolution images, performing hypothesis tests to generate meaningful, actionable information (say, at a user-specified significance level α = 0.05) is difficult but needed in mission-critical settings and elsewhere. In this paper, specifically for uncertainties defined on images, we show how revisiting results from Random Field theory (RFT), when paired with DNN tools (to get around computational hurdles), leads to efficient frameworks that provide hypothesis-testing capabilities, not otherwise available, for uncertainty maps from models used in many vision tasks. We show the viability of this framework via many different experiments.
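For context, the naive baseline that the RFT machinery improves upon looks like the sketch below: a per-pixel z-test with a Bonferroni correction at level alpha, which ignores spatial smoothness and becomes extremely conservative at high resolution. This is purely an illustrative stand-in; it is not the paper's RFT-based test.

```python
import numpy as np
from scipy import stats

def naive_uncertainty_test(maps, mu0=0.0, alpha=0.05):
    """maps: (n_samples, H, W) uncertainty maps for one input image.
    Tests, per pixel, whether mean uncertainty differs from mu0."""
    n, h, w = maps.shape
    se = maps.std(axis=0, ddof=1) / np.sqrt(n)
    z = (maps.mean(axis=0) - mu0) / np.maximum(se, 1e-12)
    p = 2 * stats.norm.sf(np.abs(z))   # two-sided per-pixel p-values
    # Bonferroni divides alpha by H*W; RFT instead exploits the smoothness of
    # the field to obtain a far less conservative threshold.
    return p < (alpha / (h * w))

significant = naive_uncertainty_test(np.random.randn(20, 32, 32))
```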
{"title":"Understanding Uncertainty Maps in Vision with Statistical Testing.","authors":"Jurijs Nazarovs, Zhichun Huang, Songwong Tasneeyapant, Rudrasis Chakraborty, Vikas Singh","doi":"10.1109/cvpr52688.2022.00050","DOIUrl":"10.1109/cvpr52688.2022.00050","url":null,"abstract":"<p><p>Quantitative descriptions of confidence intervals and uncertainties of the predictions of a model are needed in many applications in vision and machine learning. Mechanisms that enable this for deep neural network (DNN) models are slowly becoming available, and occasionally, being integrated within production systems. But the literature is sparse in terms of how to perform statistical tests with the uncertainties produced by these overparameterized models. For two models with a similar accuracy profile, is the former model's uncertainty behavior better in a statistically significant sense compared to the second model? For high resolution images, performing hypothesis tests to generate meaningful actionable information (say, at a user specified significance level <math><mrow><mi>α</mi><mo>=</mo><mn>0.05</mn></mrow></math>) is difficult but needed in both mission critical settings and elsewhere. In this paper, specifically for uncertainties defined on images, we show how revisiting results from Random Field theory (RFT) when paired with DNN tools (to get around computational hurdles) leads to efficient frameworks that can provide a hypothesis test capabilities, not otherwise available, for uncertainty maps from models used in many vision tasks. We show via many different experiments the viability of this framework.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2022 ","pages":"406-416"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10205027/pdf/nihms-1894544.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9514151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Unlearning via Randomized Conditionally Independent Hessians
Pub Date: 2022-06-01 | Epub Date: 2022-09-27 | DOI: 10.1109/cvpr52688.2022.01017 | Pages: 10412-10421 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337718/pdf/nihms-1894549.pdf
Ronak Mehta, Sourav Pal, Vikas Singh, Sathya N Ravi
Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they had never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models that require no training (e.g., k-NN), simply deleting the closest original sample can be effective. But this idea is inapplicable to models that learn richer representations. Recent ideas leveraging optimization-based updates scale poorly with the model dimension d because they invert the Hessian of the loss function. We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap at the individual-sample level. Our approach completely avoids the need to invert a (possibly) huge matrix. By utilizing a Markov blanket selection, we show that L-CODEC is also suitable for deep unlearning, as well as other applications in vision. Compared to alternatives, L-CODEC makes approximate unlearning possible in settings that would otherwise be infeasible, including vision models used for face recognition and person re-identification, and NLP models that may require unlearning samples identified for exclusion. Code is available at https://github.com/vsingh-group/LCODEC-deep-unlearning.
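The computational point is easy to see in a toy sketch: a Newton-style unlearning update restricted to a small parameter subset S (such as one selected by an L-CODEC-like procedure) only needs to invert an |S| x |S| Hessian block rather than the full d x d Hessian. The selection itself, and the full Hessian shown for clarity, are mocked here; this is not the paper's algorithm.

```python
import numpy as np

def restricted_newton_unlearn(theta, grad_fn, hess_fn, subset):
    g = grad_fn(theta)        # gradient of the loss on the sample to forget
    H = hess_fn(theta)        # full Hessian materialized here only for clarity
    S = np.asarray(list(subset))
    H_SS = H[np.ix_(S, S)]    # small block: O(|S|^3) to solve, not O(d^3)
    step = np.linalg.solve(H_SS, g[S])
    theta_new = theta.copy()
    theta_new[S] += step      # sign convention varies with the loss definition
    return theta_new

d = 1000
A = 0.01 * np.random.randn(d, d)
theta = np.random.randn(d)
grad_fn = lambda t: A @ t                      # mock gradient
hess_fn = lambda t: A.T @ A + np.eye(d)        # mock SPD Hessian
updated = restricted_newton_unlearn(theta, grad_fn, hess_fn, subset=range(10))
```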
{"title":"Deep Unlearning via Randomized Conditionally Independent Hessians.","authors":"Ronak Mehta, Sourav Pal, Vikas Singh, Sathya N Ravi","doi":"10.1109/cvpr52688.2022.01017","DOIUrl":"10.1109/cvpr52688.2022.01017","url":null,"abstract":"<p><p><i>Recent legislation has led to interest in</i> machine unlearning, <i>i.e., removing specific training samples from a</i> predictive <i>model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deleting the closest original sample can be effective. But this idea is inapplicable to models which learn richer representations. Recent ideas leveraging optimization-based updates scale poorly with the model dimension d, due to inverting the Hessian of the loss function. We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. Our approach completely avoids the need to invert a (possibly) huge matrix. By utilizing a Markov blanket selection, we premise that L-CODEC is also suitable for deep unlearning, as well as other applications in vision. Compared to alternatives, L-CODEC makes approximate unlearning possible in settings that would otherwise be infeasible, including vision models used for face recognition, person re-identification and NLP models that may require unlearning samples identified for exclusion. Code is available at</i> https://github.com/vsingh-group/LCODEC-deep-unlearning.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2022 ","pages":"10412-10421"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337718/pdf/nihms-1894549.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9820007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Texture-based Error Analysis for Image Super-Resolution
Pub Date: 2022-06-01 | DOI: 10.1109/cvpr52688.2022.00216 | Pages: 2108-2117 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9719360/pdf/nihms-1852695.pdf
Salma Abdel Magid, Zudi Lin, Donglai Wei, Yulun Zhang, Jinjin Gu, Hanspeter Pfister
Evaluation practices for image super-resolution (SR) use a single-value metric, such as PSNR or SSIM, to determine model performance. This provides little insight into the sources of error or into model behavior. It is therefore beneficial to move beyond the conventional approach and reconceptualize evaluation with interpretability as the main priority. We focus on a thorough error analysis from a variety of perspectives. Our key contribution is to leverage a texture classifier, which enables us to assign patches semantic labels, to identify the sources of SR errors both globally and locally. We then use this to determine (a) the semantic alignment of SR datasets, (b) how SR models perform on each label, (c) to what extent high-resolution (HR) and SR patches semantically correspond, and more. Through these different angles, we are able to highlight potential pitfalls and blind spots. Our overall investigation yields numerous unexpected insights. We hope this work serves as an initial step toward debugging black-box SR networks.
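A minimal sketch of the per-label error breakdown described above: patches are assigned texture labels by a classifier, then PSNR is aggregated per label to show where an SR model fails. The gradient-energy classifier below is a toy stand-in; any patch classifier would do.

```python
import numpy as np
from collections import defaultdict

def psnr(a, b, peak=1.0):
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(peak ** 2 / max(mse, 1e-12))

def per_label_psnr(hr_patches, sr_patches, texture_classifier):
    scores = defaultdict(list)
    for hr, sr in zip(hr_patches, sr_patches):
        scores[texture_classifier(hr)].append(psnr(hr, sr))
    return {label: float(np.mean(v)) for label, v in scores.items()}

# Toy usage: label patches "textured" vs "smooth" by mean gradient magnitude.
classify = lambda p: "textured" if np.abs(np.diff(p)).mean() > 0.1 else "smooth"
hr = [np.random.rand(16, 16) for _ in range(8)]
sr = [p + 0.05 * np.random.randn(16, 16) for p in hr]
print(per_label_psnr(hr, sr, classify))
```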
{"title":"Texture-based Error Analysis for Image Super-Resolution.","authors":"Salma Abdel Magid, Zudi Lin, Donglai Wei, Yulun Zhang, Jinjin Gu, Hanspeter Pfister","doi":"10.1109/cvpr52688.2022.00216","DOIUrl":"10.1109/cvpr52688.2022.00216","url":null,"abstract":"<p><p>Evaluation practices for image super-resolution (SR) use a single-value metric, the PSNR or SSIM, to determine model performance. This provides little insight into the source of errors and model behavior. Therefore, it is beneficial to move beyond the conventional approach and reconceptualize evaluation with interpretability as our main priority. We focus on a thorough error analysis from a variety of perspectives. Our key contribution is to leverage a texture classifier, which enables us to assign patches with semantic labels, to identify the source of SR errors both globally and locally. We then use this to determine (a) the semantic alignment of SR datasets, (b) how SR models perform on each label, (c) to what extent high-resolution (HR) and SR patches semantically correspond, and more. Through these different angles, we are able to highlight potential pitfalls and blindspots. Our overall investigation highlights numerous unexpected insights. We hope this work serves as an initial step for debugging blackbox SR networks.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":"2108-2117"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9719360/pdf/nihms-1852695.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35345911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
Pub Date: 2022-01-01 | Pages: 21399-21405 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9494834/pdf/nihms-1836723.pdf
Parmida Ghahremani, Joseph Marino, Ricardo Dodds, Saad Nadeem
In the clinic, resected tissue samples are stained with Hematoxylin and Eosin (H&E) and/or immunohistochemistry (IHC) stains and presented to pathologists on glass slides or as digital scans for diagnosis and assessment of disease progression. Cell-level quantification, e.g., in IHC protein expression scoring, can be extremely inefficient and subjective. We present DeepLIIF (https://deepliif.org), the first free online platform for efficient and reproducible IHC scoring. DeepLIIF outperforms current state-of-the-art approaches (which rely on manual, error-prone annotations) by virtually restaining clinical IHC slides with more informative multiplex immunofluorescence staining. Our cloud-native DeepLIIF platform supports (1) more than 150 proprietary and non-proprietary input formats via the Bio-Formats standard, (2) interactive adjustment, visualization, and downloading of the IHC quantification results and the accompanying restained images, (3) consumption of an exposed workflow API, either programmatically or through interactive plugins for open-source whole-slide image viewers such as QuPath and ImageJ, and (4) auto-scaling of GPU resources based on user demand.
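Since the platform exposes a workflow API that can be consumed programmatically, a call could look like the schematic below. The endpoint path and response fields are hypothetical placeholders, not DeepLIIF's documented API; see https://deepliif.org for the actual interface.

```python
import requests

def score_ihc_image(image_path,
                    endpoint="https://deepliif.org/api/infer"):  # hypothetical URL
    """Upload an IHC image and retrieve restained images + cell-level scores
    (response structure assumed for illustration)."""
    with open(image_path, "rb") as f:
        resp = requests.post(endpoint, files={"img": f})
    resp.raise_for_status()
    return resp.json()
```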
{"title":"DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides.","authors":"Parmida Ghahremani, Joseph Marino, Ricardo Dodds, Saad Nadeem","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In the clinic, resected tissue samples are stained with Hematoxylin-and-Eosin (H&E) and/or Immunhistochemistry (IHC) stains and presented to the pathologists on glass slides or as digital scans for diagnosis and assessment of disease progression. Cell-level quantification, e.g. in IHC protein expression scoring, can be extremely inefficient and subjective. We present DeepLIIF (https://deepliif.org), a first free online platform for efficient and reproducible IHC scoring. DeepLIIF outperforms current state-of-the-art approaches (relying on manual error-prone annotations) by virtually restaining clinical IHC slides with more informative multiplex immunofluorescence staining. Our DeepLIIF cloud-native platform supports (1) more than 150 proprietary/non-proprietary input formats via the Bio-Formats standard, (2) interactive adjustment, visualization, and downloading of the IHC quantification results and the accompanying restained images, (3) consumption of an exposed workflow API programmatically or through interactive plugins for open source whole slide image viewers such as QuPath/ImageJ, and (4) auto scaling to efficiently scale GPU resources based on user demand.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":"21399-21405"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9494834/pdf/nihms-1836723.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33485228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Metadata Normalization
Pub Date: 2021-06-01 | Epub Date: 2021-11-02 | DOI: 10.1109/cvpr46437.2021.01077 | Pages: 10912-10922 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8589298/pdf/nihms-1710131.pdf
Mandy Lu, Qingyu Zhao, Jiequan Zhang, Kilian M Pohl, Li Fei-Fei, Juan Carlos Niebles, Ehsan Adeli
Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features of extraneous variables or multiple distributions. Such extra variables, referred to here as metadata, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation that can be used end-to-end within the training framework to correct the influence of metadata on feature distributions. MDN adopts a regression-analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We use a metric based on distance correlation to quantify the distribution bias introduced by the metadata and demonstrate that our method successfully removes metadata effects in four diverse settings: a synthetic dataset, a 2D image dataset, a video dataset, and a 3D medical image dataset.
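A compact sketch of the "regress out" operation at the heart of such a layer: within a batch, fit a least-squares map from metadata to features and keep only the residual, i.e., the component of the features not explained by the metadata. This batch-level linear version is a simplification of the layer described above.

```python
import torch

def metadata_normalize(features, metadata):
    """features: (N, D) batch features; metadata: (N, M) extraneous variables."""
    # Design matrix with an intercept column.
    X = torch.cat([torch.ones(metadata.shape[0], 1), metadata], dim=1)
    # Least-squares regression coefficients, shape (M+1, D).
    beta = torch.linalg.lstsq(X, features).solution
    # Residual = features with the metadata-explained component removed.
    return features - X @ beta

f = torch.randn(32, 64)   # batch of features
m = torch.randn(32, 3)    # e.g., age, scanner site, acquisition protocol
normalized = metadata_normalize(f, m)
```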
{"title":"Metadata Normalization.","authors":"Mandy Lu, Qingyu Zhao, Jiequan Zhang, Kilian M Pohl, Li Fei-Fei, Juan Carlos Niebles, Ehsan Adeli","doi":"10.1109/cvpr46437.2021.01077","DOIUrl":"10.1109/cvpr46437.2021.01077","url":null,"abstract":"<p><p>Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metadata here, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation which can be used end-to-end within the training framework, to correct the influence of metadata on feature distributions. MDN adopts a regression analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We utilize a metric based on distance correlation to quantify the distribution bias from the metadata and demonstrate that our method successfully removes metadata effects on four diverse settings: one synthetic, one 2D image, one video, and one 3D medical image dataset.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":"10912-10922"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8589298/pdf/nihms-1710131.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39622727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simpler Certified Radius Maximization by Propagating Covariances
Pub Date: 2021-06-01 | Epub Date: 2021-11-02 | DOI: 10.1109/cvpr46437.2021.00721 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8579953/pdf/nihms-1730246.pdf
Xingjian Zhen, Rudrasis Chakraborty, Vikas Singh
One strategy for adversarially training a robust model is to maximize its certified radius - the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a "smoothed" classifier, where one estimates the prediction corresponding to Gaussian samples in the neighborhood of each sample in the mini-batch, accomplished in practice by Monte Carlo sampling. In this paper, we investigate the hypothesis that this sampling bottleneck can be mitigated by identifying ways to directly propagate the covariance matrix of the smoothed distribution through the network. To this end, we find that, beyond certain adjustments to the network, propagating the covariances must be accompanied by additional accounting that keeps track of how the distributional moments transform and interact at each stage of the network. We show how satisfying these criteria yields an algorithm for maximizing the certified radius on datasets including CIFAR-10, ImageNet, and Places365, offering runtime savings on networks of moderate depth with a small compromise in overall accuracy. We describe the details of the key modifications that enable practical use. Via various experiments, we evaluate when our simplifications are sensible, and what the key benefits and limitations are.
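The algebraic step being referred to is simple for affine layers and is where the extra "accounting" enters for nonlinearities. A minimal sketch, assuming a Gaussian N(mu, Sigma) at the input: the affine propagation is exact, while the ReLU step below is only a crude first-order placeholder for the paper's more careful treatment.

```python
import numpy as np

def propagate_affine(mu, Sigma, W, b):
    # Exact for affine maps: x ~ N(mu, Sigma) => Wx + b ~ N(W mu + b, W Sigma W^T)
    return W @ mu + b, W @ Sigma @ W.T

def propagate_relu_first_order(mu, Sigma):
    # Local linearization of ReLU at mu (illustrative approximation only).
    mask = (mu > 0).astype(float)
    J = np.diag(mask)
    return mask * mu, J @ Sigma @ J.T

mu, Sigma = np.zeros(4), 0.1 * np.eye(4)
W, b = np.random.randn(3, 4), np.zeros(3)
mu1, Sigma1 = propagate_affine(mu, Sigma, W, b)
mu2, Sigma2 = propagate_relu_first_order(mu1, Sigma1)
```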
{"title":"Simpler Certified Radius Maximization by Propagating Covariances.","authors":"Xingjian Zhen, Rudrasis Chakraborty, Vikas Singh","doi":"10.1109/cvpr46437.2021.00721","DOIUrl":"10.1109/cvpr46437.2021.00721","url":null,"abstract":"<p><p>One strategy for adversarially training a robust model is to maximize its certified radius - the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a \"smoothed\" classifier where one estimates the prediction corresponding to Gaussian samples in the neighborhood of each sample in the mini-batch, accomplished in practice by Monte Carlo sampling. In this paper, we investigate the hypothesis that this sampling bottleneck can potentially be mitigated by identifying ways to directly propagate the covariance matrix of the smoothed distribution through the network. To this end, we find that other than certain adjustments to the network, propagating the covariances must also be accompanied by additional accounting that keeps track of how the distributional moments transform and interact at each stage in the network. We show how satisfying these criteria yields an algorithm for maximizing the certified radius on datasets including Cifar-10, ImageNet, and Places365 while offering runtime savings on networks with moderate depth, with a small compromise in overall accuracy. We describe the details of the key modifications that enable practical use. Via various experiments, we evaluate when our simplifications are sensible, and what the key benefits and limitations are.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8579953/pdf/nihms-1730246.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39613258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Task Programming: Learning Data Efficient Behavior Representations
Pub Date: 2021-06-01 | Epub Date: 2021-11-02 | DOI: 10.1109/cvpr46437.2021.00290 | Pages: 2875-2884 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9766046/pdf/nihms-1857211.pdf
Jennifer J Sun, Ann Kennedy, Eric Zhan, David J Anderson, Yisong Yue, Pietro Perona
Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but it can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method for learning annotation-sample-efficient trajectory embeddings for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call "task programming", which uses programs to explicitly encode structured knowledge from domain experts. Total domain-expert effort can be reduced by exchanging data-annotation time for the construction of a small number of programmed tasks. We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results on three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce the annotation burden by up to a factor of 10 without compromising accuracy relative to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.
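To illustrate what a "programmed task" might look like in this setting: a small expert-written function computes a behaviorally meaningful quantity from a trajectory, and such quantities serve as self-supervised prediction targets when training the embedding. The example program and target-collection helper below are illustrative assumptions, not the TREBA codebase.

```python
import numpy as np

def mean_speed(traj):
    """traj: (T, 2) array of agent positions; a toy expert-written program."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1).mean()

# Registry of programmed tasks (a real setup would include several such
# programs encoding expert knowledge, e.g., social distance, heading angle).
PROGRAMMED_TASKS = {"mean_speed": mean_speed}

def task_targets(traj):
    # Targets an embedding model would be trained to decode from its latent.
    return {name: fn(traj) for name, fn in PROGRAMMED_TASKS.items()}

trajectory = np.cumsum(np.random.randn(100, 2), axis=0)  # toy random walk
print(task_targets(trajectory))
```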
{"title":"Task Programming: Learning Data Efficient Behavior Representations.","authors":"Jennifer J Sun, Ann Kennedy, Eric Zhan, David J Anderson, Yisong Yue, Pietro Perona","doi":"10.1109/cvpr46437.2021.00290","DOIUrl":"10.1109/cvpr46437.2021.00290","url":null,"abstract":"<p><p>Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annotation-sample efficient trajectory embedding for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call \"task programming\", which uses programs to explicitly encode structured knowledge from domain experts. Total domain expert effort can be reduced by exchanging data annotation time for the construction of a small number of programmed tasks. We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results in three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2021 ","pages":"2875-2884"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9766046/pdf/nihms-1857211.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10433585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}