
Latest publications in Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Learned representation-guided diffusion models for large-image generation.
Pub Date: 2024-06-01 Epub Date: 2024-09-16 DOI: 10.1109/cvpr52733.2024.00815
Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras

To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, procuring the painstaking patch-level annotations required in specialized domains like histopathology and satellite imagery is impractical; annotation is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies to fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings: the SSL embeddings used to generate a large image can either be extracted from a reference image or sampled from an auxiliary model conditioned on any related modality (e.g., class labels, text, genomic data). As proof of concept, we introduce the text-to-large-image synthesis paradigm, where we successfully synthesize large pathology and satellite images from text descriptions.
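
As a rough illustration of the core idea, the sketch below trains a toy DDPM-style denoiser conditioned on a frozen SSL embedding instead of a class label. This is a minimal sketch, not the authors' released code; the embedding size, the `TinyDenoiser` architecture, and the noise schedule are illustrative assumptions.

```python
# Minimal DDPM-style training step with SSL-embedding conditioning (sketch).
import torch
import torch.nn as nn

SSL_DIM = 384          # e.g. a DINO ViT-S embedding size (assumption)
IMG_DIM = 3 * 64 * 64  # flattened patch at a toy resolution
T = 1000               # number of diffusion steps

class TinyDenoiser(nn.Module):
    """Predicts the noise added to x_t, given timestep t and an SSL embedding."""
    def __init__(self):
        super().__init__()
        self.t_embed = nn.Embedding(T, 128)
        self.c_embed = nn.Linear(SSL_DIM, 128)   # projects the SSL condition
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM + 128 + 128, 512), nn.SiLU(),
            nn.Linear(512, IMG_DIM),
        )

    def forward(self, x_t, t, ssl_emb):
        h = torch.cat([x_t, self.t_embed(t), self.c_embed(ssl_emb)], dim=-1)
        return self.net(h)

# One training step: corrupt x_0 with noise at a random t, regress the noise.
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

model = TinyDenoiser()
x0 = torch.randn(8, IMG_DIM)          # stand-in for real image patches
ssl_emb = torch.randn(8, SSL_DIM)     # stand-in for frozen SSL encoder output
t = torch.randint(0, T, (8,))
noise = torch.randn_like(x0)
x_t = alpha_bar[t].sqrt()[:, None] * x0 + (1 - alpha_bar[t]).sqrt()[:, None] * noise
loss = nn.functional.mse_loss(model(x_t, t, ssl_emb), noise)
loss.backward()
```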

{"title":"Learned representation-guided diffusion models for large-image generation.","authors":"Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras","doi":"10.1109/cvpr52733.2024.00815","DOIUrl":"10.1109/cvpr52733.2024.00815","url":null,"abstract":"<p><p>To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation effort required in specialized domains like histopathology and satellite imagery; it is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies to fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings. The SSL embeddings used to generate a large image can either be extracted from a reference image, or sampled from an auxiliary model conditioned on any related modality (e.g. class labels, text, genomic data). As proof of concept, we introduce the text-to-large image synthesis paradigm where we successfully synthesize large pathology and satellite images out of text descriptions.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2024 ","pages":"8532-8542"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology.
Pub Date: 2024-06-01 Epub Date: 2024-09-16 DOI: 10.1109/cvpr52733.2024.01067
Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R Gupta, Prateek Prasanna

Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local-and global-interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.
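
The sketch below illustrates the two-branch structure described above in a hedged, simplified form: a deep gated-attention MIL branch produces per-patch saliency, while a single linear layer over handcrafted per-patch features makes the slide-level prediction, so each pathology feature keeps a readable weight. Dimensions and module names are assumptions, not the paper's implementation.

```python
# Simplified two-branch MIL: deep attention for saliency, linear head for
# interpretable slide-level prediction (sketch, not the SI-MIL codebase).
import torch
import torch.nn as nn

DEEP_DIM, HAND_DIM, N_CLASSES = 512, 32, 2

class SelfInterpretableMIL(nn.Module):
    def __init__(self):
        super().__init__()
        # Deep branch: attention scores per patch (Ilse et al. style).
        self.attn = nn.Sequential(nn.Linear(DEEP_DIM, 128), nn.Tanh(),
                                  nn.Linear(128, 1))
        # Interpretable branch: one linear layer on handcrafted features,
        # so every pathology feature gets a directly readable weight.
        self.linear_head = nn.Linear(HAND_DIM, N_CLASSES)

    def forward(self, deep_feats, hand_feats):
        # deep_feats: (n_patches, DEEP_DIM), hand_feats: (n_patches, HAND_DIM)
        a = torch.softmax(self.attn(deep_feats), dim=0)   # patch saliency
        bag_hand = (a * hand_feats).sum(dim=0)            # attention-weighted bag
        return self.linear_head(bag_hand), a              # logits + saliency

model = SelfInterpretableMIL()
logits, saliency = model(torch.randn(1000, DEEP_DIM), torch.randn(1000, HAND_DIM))
print(logits.shape, saliency.shape)  # torch.Size([2]) torch.Size([1000, 1])
```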

{"title":"SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology.","authors":"Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R Gupta, Prateek Prasanna","doi":"10.1109/cvpr52733.2024.01067","DOIUrl":"10.1109/cvpr52733.2024.01067","url":null,"abstract":"<p><p>Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local-and global-interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2024 ","pages":"11226-11237"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601081/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling.
Pub Date: 2024-06-01 Epub Date: 2024-09-16 DOI: 10.1109/cvpr52733.2024.00559
Xuzhe Zhang, Yuhao Wu, Elsa Angelini, Ang Li, Jia Guo, Jerod M Rasmussen, Thomas G O'Connor, Pathik D Wadhwa, Andrea Parolin Jackowski, Hai Li, Jonathan Posner, Andrew F Laine, Yun Wang

Robust segmentation is critical for deriving quantitative measures from large-scale, multi-center, and longitudinal medical scans. Manually annotating medical scans, however, is expensive and labor-intensive and may not always be available in every domain. Unsupervised domain adaptation (UDA) is a well-studied technique that alleviates this label-scarcity problem by leveraging available labels from another domain. In this study, we introduce Masked Autoencoding and Pseudo-Labeling Segmentation (MAPSeg), a unified UDA framework with great versatility and superior performance for heterogeneous and volumetric medical image segmentation. To the best of our knowledge, this is the first study that systematically reviews and develops a framework to tackle four different domain shifts in medical image segmentation. More importantly, MAPSeg is the first framework that can be applied to centralized, federated, and test-time UDA while maintaining comparable performance. We compare MAPSeg with previous state-of-the-art methods on a private infant brain MRI dataset and a public cardiac CT-MRI dataset, and MAPSeg outperforms others by a large margin (10.5 Dice improvement on the private MRI dataset and 5.7 on the public CT-MRI dataset). MAPSeg poses great practical value and can be applied to real-world problems. GitHub: https://github.com/Xuzhez/MAPSeg/.
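
To make the pseudo-labeling half of such a pipeline concrete, here is a minimal sketch (not the released MAPSeg code): an EMA "teacher" labels unlabeled target volumes, and the student trains only on voxels above a confidence threshold. The 3D model stand-in, the 0.9 threshold, and the momentum value are assumptions.

```python
# EMA-teacher pseudo-labeling for unsupervised domain adaptation (sketch).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Conv3d(1, 2, kernel_size=3, padding=1)   # stand-in for a 3D UNet
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(teacher, student, m=0.99):
    """Slowly track the student: teacher <- m * teacher + (1-m) * student."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps.detach(), alpha=1 - m)

target_vol = torch.randn(2, 1, 16, 32, 32)             # unlabeled target scans
with torch.no_grad():
    probs = torch.softmax(teacher(target_vol), dim=1)
    conf, pseudo = probs.max(dim=1)                    # per-voxel confidence
mask = conf > 0.9                                      # keep confident voxels only

logits = student(target_vol)
loss_map = F.cross_entropy(logits, pseudo, reduction="none")
loss = (loss_map * mask).sum() / mask.sum().clamp(min=1)
loss.backward()
ema_update(teacher, student)
```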

{"title":"MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling.","authors":"Xuzhe Zhang, Yuhao Wu, Elsa Angelini, Ang Li, Jia Guo, Jerod M Rasmussen, Thomas G O'Connor, Pathik D Wadhwa, Andrea Parolin Jackowski, Hai Li, Jonathan Posner, Andrew F Laine, Yun Wang","doi":"10.1109/cvpr52733.2024.00559","DOIUrl":"10.1109/cvpr52733.2024.00559","url":null,"abstract":"<p><p>Robust segmentation is critical for deriving quantitative measures from large-scale, multi-center, and longitudinal medical scans. Manually annotating medical scans, however, is expensive and labor-intensive and may not always be available in every domain. Unsupervised domain adaptation (UDA) is a well-studied technique that alleviates this label-scarcity problem by leveraging available labels from another domain. In this study, we introduce Masked Autoencoding and Pseudo-Labeling Segmentation (MAPSeg), a <b>unified</b> UDA framework with great versatility and superior performance for heterogeneous and volumetric medical image segmentation. To the best of our knowledge, this is the first study that systematically reviews and develops a framework to tackle four different domain shifts in medical image segmentation. More importantly, MAPSeg is the first framework that can be applied to <b>centralized</b>, <b>federated</b>, and <b>test-time</b> UDA while maintaining comparable performance. We compare MAPSeg with previous state-of-the-art methods on a private infant brain MRI dataset and a public cardiac CT-MRI dataset, and MAPSeg outperforms others by a large margin (10.5 Dice improvement on the private MRI dataset and 5.7 on the public CT-MRI dataset). MAPSeg poses great practical value and can be applied to real-world problems. GitHub: https://github.com/Xuzhez/MAPSeg/.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2024 ","pages":"5851-5862"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520032/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision.
Pub Date: 2024-06-01 Epub Date: 2024-09-16 DOI: 10.1109/cvpr52733.2024.01071
Mohammad Reza Hosseinzadeh Taher, Michael B Gotway, Jianming Liang

Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning excels at learning multi-level feature spaces, but such models often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared against 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, a property overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.
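
As a hedged illustration of the localizability branch only, the following InfoNCE sketch pulls two views of the same anatomical location together and pushes other locations apart; the paper's actual objectives and branches are more involved, and the temperature and dimensions here are assumptions.

```python
# InfoNCE over paired views of the same anatomical location (sketch).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """z1, z2: (B, D) embeddings; row i of z1 and z2 view the same anatomy."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```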

{"title":"Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision.","authors":"Mohammad Reza Hosseinzadeh Taher, Michael B Gotway, Jianming Liang","doi":"10.1109/cvpr52733.2024.01071","DOIUrl":"10.1109/cvpr52733.2024.01071","url":null,"abstract":"<p><p>Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning excels in learning multi-level feature spaces, but they often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared to 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, yet overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"abs/210504906 2024","pages":"11269-11281"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11636527/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142820343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations.
Pub Date: 2024-06-01 Epub Date: 2024-09-16 DOI: 10.1109/cvpr52733.2024.02470
Chenyu You, Yifei Min, Weicheng Dai, Jasjeet S Sekhon, Lawrence Staib, James S Duncan

Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models is both time-intensive and computationally costly, and the tuned models tend to become highly specialized, limiting their practicality for real-world deployment; (ii) recent studies indicate that pre-trained vision-language classifiers may overly depend on spurious features - patterns that correlate with the target in training data but are not related to the true labeling function; and (iii) existing studies on mitigating reliance on spurious features, largely based on the assumption that such features can be identified, do not provide definitive assurance for real-world applications. As a pilot study, this work focuses on mitigating CLIP's reliance on spurious features without using any group annotations. To this end, we systematically study the existence of spurious correlations in CLIP and CLIP+ERM. Following recent work on Deep Feature Reweighting (DFR), we first verify that last-layer retraining can greatly improve group robustness on pretrained CLIP. In view of these findings, we advocate a lightweight representation calibration method for fine-tuning CLIP: first generating a calibration set using the pretrained CLIP, and then calibrating representations of samples within this set through contrastive learning, all without the need for group labels. Extensive experiments and in-depth visualizations on several benchmarks validate the effectiveness of our proposals, largely reducing reliance on spurious features and significantly boosting model generalization. Our code will be available here.
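
The DFR-style baseline the paper builds on can be sketched as follows: freeze a pretrained CLIP image encoder, extract features, and retrain only a linear last layer on a held-out set. This is a minimal sketch assuming the open_clip package; it is not the paper's calibration method, which additionally contrasts representations within a generated calibration set.

```python
# Last-layer retraining on frozen CLIP features (DFR-style baseline, sketch).
import torch
import torch.nn as nn
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)                     # the encoder stays frozen

linear = nn.Linear(512, 2)                      # ViT-B/32 embeds to 512-d
opt = torch.optim.AdamW(linear.parameters(), lr=1e-3)

def retrain_step(images, labels):
    with torch.no_grad():
        feats = model.encode_image(images)      # frozen features
        feats = feats / feats.norm(dim=-1, keepdim=True)
    loss = nn.functional.cross_entropy(linear(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy call with tensors shaped like preprocessed CLIP inputs.
print(retrain_step(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))))
```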

{"title":"Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations.","authors":"Chenyu You, Yifei Min, Weicheng Dai, Jasjeet S Sekhon, Lawrence Staib, James S Duncan","doi":"10.1109/cvpr52733.2024.02470","DOIUrl":"10.1109/cvpr52733.2024.02470","url":null,"abstract":"<p><p>Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models becomes both time-intensive and computationally costly. Additionally, these tuned models tend to become highly specialized, limiting their practicality for real-world deployment; (ii) recent studies indicate that pre-trained vision-language classifiers may overly depend on spurious features - patterns that correlate with the target in training data, but are not related to the true labeling function; and (iii) existing studies on mitigating the reliance on spurious features, largely based on the assumption that we can identify such features, does not provide definitive assurance for real-world applications. As a piloting study, this work focuses on exploring mitigating the reliance on spurious features for CLIP without using any group annotation. To this end, we systematically study the existence of spurious correlation on CLIP and CLIP+ERM. We first, following recent work on Deep Feature Reweighting (DFR), verify that last-layer retraining can greatly improve group robustness on pretrained CLIP. In view of them, we advocate a lightweight representation calibration method for fine-tuning CLIP, by first generating a calibration set using the pretrained CLIP, and then calibrating representations of samples within this set through contrastive learning, all without the need for group labels. Extensive experiments and in-depth visualizations on several benchmarks validate the effectiveness of our proposals, largely reducing reliance and significantly boosting the model generalization. Our codes will be available in here.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2024 ","pages":"26140-26150"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11620289/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142787875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Directional Connectivity-based Segmentation of Medical Images.
Pub Date: 2023-06-01 Epub Date: 2023-08-22 DOI: 10.1109/cvpr52729.2023.01109
Ziyun Yang, Sina Farsiu

Anatomical consistency in biomarker segmentation is crucial for many medical image analysis tasks. A promising paradigm for achieving anatomically consistent segmentation via deep networks is incorporating pixel connectivity, a basic concept in digital topology, to model inter-pixel relationships. However, previous works on connectivity modeling have ignored the rich channel-wise directional information in the latent space. In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. To this end, we propose a directional connectivity modeling scheme for segmentation that decouples, tracks, and utilizes the directional information across the network. Experiments on various public medical image segmentation benchmarks show the effectiveness of our model as compared to the state-of-the-art methods. Code is available at https://github.com/Zyun-Y/DconnNet.
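
For intuition about the pixel-connectivity concept the method builds on, the sketch below converts a binary mask into an 8-channel connectivity map, where channel k marks whether a pixel and its k-th neighbor are both foreground. This is a hedged illustration of the basic encoding, not the paper's directional scheme.

```python
# Binary mask -> 8-neighborhood inter-pixel connectivity map (sketch).
import torch

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]      # the 8-neighborhood

def connectivity_map(mask):
    """mask: (H, W) binary tensor -> (8, H, W) connectivity channels."""
    h, w = mask.shape
    padded = torch.zeros(h + 2, w + 2, dtype=mask.dtype)
    padded[1:h+1, 1:w+1] = mask
    maps = []
    for dy, dx in OFFSETS:
        neighbor = padded[1+dy:h+1+dy, 1+dx:w+1+dx]
        maps.append(mask * neighbor)             # 1 iff both pixels foreground
    return torch.stack(maps)

mask = (torch.rand(64, 64) > 0.5).long()
print(connectivity_map(mask).shape)              # torch.Size([8, 64, 64])
```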

{"title":"Directional Connectivity-based Segmentation of Medical Images.","authors":"Ziyun Yang, Sina Farsiu","doi":"10.1109/cvpr52729.2023.01109","DOIUrl":"10.1109/cvpr52729.2023.01109","url":null,"abstract":"<p><p>Anatomical consistency in biomarker segmentation is crucial for many medical image analysis tasks. A promising paradigm for achieving anatomically consistent segmentation via deep networks is incorporating pixel connectivity, a basic concept in digital topology, to model inter-pixel relationships. However, previous works on connectivity modeling have ignored the rich channel-wise directional information in the latent space. In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. To this end, we propose a directional connectivity modeling scheme for segmentation that decouples, tracks, and utilizes the directional information across the network. Experiments on various public medical image segmentation benchmarks show the effectiveness of our model as compared to the state-of-the-art methods. Code is available at https://github.com/Zyun-Y/DconnNet.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2023 ","pages":"11525-11535"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10543919/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41169515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hierarchical discriminative learning improves visual representations of biomedical microscopy.
Cheng Jiang, Xinhai Hou, Akhil Kondepudi, Asadur Chowdury, Christian W Freudiger, Daniel A Orringer, Honglak Lee, Todd C Hollon

Learning high-quality, self-supervised, visual representations is essential to advance the role of computer vision in biomedical microscopy and clinical medicine. Previous work has focused on self-supervised representation learning (SSL) methods developed for instance discrimination and applied them directly to image patches, or fields-of-view, sampled from gigapixel whole-slide images (WSIs) used for cancer diagnosis. However, this strategy is limited because it (1) assumes patches from the same patient are independent, (2) neglects the patient-slide-patch hierarchy of clinical biomedical microscopy, and (3) requires strong data augmentations that can degrade downstream performance. Importantly, sampled patches from WSIs of a patient's tumor are a diverse set of image examples that capture the same underlying cancer diagnosis. This motivated HiDisc, a data-driven method that leverages the inherent patient-slide-patch hierarchy of clinical biomedical microscopy to define a hierarchical discriminative learning task that implicitly learns features of the underlying diagnosis. HiDisc uses a self-supervised contrastive learning framework in which positive patch pairs are defined based on a common ancestry in the data hierarchy, and a unified patch, slide, and patient discriminative learning objective is used for visual SSL. We benchmark HiDisc visual representations on two vision tasks using two biomedical microscopy datasets, and demonstrate that (1) HiDisc pretraining outperforms current state-of-the-art self-supervised pretraining methods for cancer diagnosis and genetic mutation prediction, and (2) HiDisc learns high-quality visual representations using natural patch diversity without strong data augmentations.
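
A minimal sketch of the hierarchical positive-pair sampling idea follows: a positive partner for an anchor patch can come from the same patch (two augmentations), the same slide, or the same patient, yielding patch-, slide-, and patient-level discrimination. The record fields are illustrative assumptions, not the authors' data format.

```python
# Hierarchical positive-pair sampling over a patient-slide-patch hierarchy.
import random
from collections import defaultdict

records = [  # (patient_id, slide_id, patch_id) for every field-of-view
    ("p1", "s1", 0), ("p1", "s1", 1), ("p1", "s2", 2),
    ("p2", "s3", 3), ("p2", "s3", 4),
]

by_patient = defaultdict(list)
by_slide = defaultdict(list)
for rec in records:
    by_patient[rec[0]].append(rec)
    by_slide[rec[1]].append(rec)

def sample_positive(anchor, level):
    """Return a positive partner for `anchor` at the requested hierarchy level."""
    patient, slide, _ = anchor
    if level == "patch":                  # same patch, different augmentation
        return anchor
    pool = by_slide[slide] if level == "slide" else by_patient[patient]
    return random.choice(pool)

anchor = records[0]
for level in ("patch", "slide", "patient"):
    print(level, sample_positive(anchor, level))
```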

{"title":"Hierarchical discriminative learning improves visual representations of biomedical microscopy.","authors":"Cheng Jiang,&nbsp;Xinhai Hou,&nbsp;Akhil Kondepudi,&nbsp;Asadur Chowdury,&nbsp;Christian W Freudiger,&nbsp;Daniel A Orringer,&nbsp;Honglak Lee,&nbsp;Todd C Hollon","doi":"10.1109/cvpr52729.2023.01896","DOIUrl":"https://doi.org/10.1109/cvpr52729.2023.01896","url":null,"abstract":"<p><p>Learning high-quality, self-supervised, visual representations is essential to advance the role of computer vision in biomedical microscopy and clinical medicine. Previous work has focused on self-supervised representation learning (SSL) methods developed for instance discrimination and applied them directly to image patches, or fields-of-view, sampled from gigapixel whole-slide images (WSIs) used for cancer diagnosis. However, this strategy is limited because it (1) assumes patches from the same patient are independent, (2) neglects the patient-slide-patch hierarchy of clinical biomedical microscopy, and (3) requires strong data augmentations that can degrade downstream performance. Importantly, sampled patches from WSIs of a patient's tumor are a diverse set of image examples that capture the same underlying cancer diagnosis. This motivated HiDisc, a data-driven method that leverages the inherent patient-slide-patch hierarchy of clinical biomedical microscopy to define a hierarchical discriminative learning task that implicitly learns features of the underlying diagnosis. HiDisc uses a self-supervised contrastive learning framework in which positive patch pairs are defined based on a common ancestry in the data hierarchy, and a unified patch, slide, and patient discriminative learning objective is used for visual SSL. We benchmark HiDisc visual representations on two vision tasks using two biomedical microscopy datasets, and demonstrate that (1) HiDisc pretraining outperforms current state-of-the-art self-supervised pretraining methods for cancer diagnosis and genetic mutation prediction, and (2) HiDisc learns high-quality visual representations using natural patch diversity without strong data augmentations.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2023 ","pages":"19798-19808"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468966/pdf/nihms-1911888.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10504505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning.
Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, Daniel Rubin

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
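
For context, the standard FedAvg loop these experiments plug architectures into can be sketched as below: each client fine-tunes a copy of the global model locally, and the server averages the resulting parameters weighted by client dataset size. The linear stand-in model and hyperparameters are assumptions; in the paper's setting one would swap in a Transformer.

```python
# FedAvg: local client updates followed by size-weighted parameter averaging.
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, epochs=1, lr=1e-3):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data:
            loss = nn.functional.cross_entropy(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict(), sum(x.size(0) for x, _ in data)

def fedavg(states_and_sizes):
    total = sum(n for _, n in states_and_sizes)
    avg = copy.deepcopy(states_and_sizes[0][0])
    for key in avg:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in states_and_sizes)
    return avg

global_model = nn.Linear(16, 2)                 # swap in a ViT in practice
clients = [[(torch.randn(8, 16), torch.randint(0, 2, (8,)))] for _ in range(3)]
updates = [local_update(global_model, data) for data in clients]
global_model.load_state_dict(fedavg(updates))
```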

{"title":"Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning.","authors":"Liangqiong Qu,&nbsp;Yuyin Zhou,&nbsp;Paul Pu Liang,&nbsp;Yingda Xia,&nbsp;Feifei Wang,&nbsp;Ehsan Adeli,&nbsp;Li Fei-Fei,&nbsp;Daniel Rubin","doi":"10.1109/cvpr52688.2022.00982","DOIUrl":"https://doi.org/10.1109/cvpr52688.2022.00982","url":null,"abstract":"<p><p>Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2022 ","pages":"10051-10061"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9826695/pdf/nihms-1859405.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10524829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets.
Pub Date: 2022-06-01 Epub Date: 2022-09-27 DOI: 10.1109/cvpr52688.2022.01018
Vishnu Suresh Lokhande, Sathya N Ravi, Rudrasis Chakraborty, Vikas Singh

Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a single source of variability (e.g., different scanners), domain adaptation and matching the distributions of representations may suffice in many scenarios. But in the presence of multiple nuisance variables that concurrently influence the measurements, pooling datasets poses unique challenges; e.g., variations in the data can come from both the acquisition method and the demographics of participants (gender, age). Invariant representation learning, by itself, is ill-suited to fully model the data generation process. In this paper, we show how bringing recent results on equivariant representation learning (for studying symmetries in neural networks) instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. In particular, we demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples. Our code is available at https://github.com/vsingh-group/DatasetPooling.
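
The paper's construction is equivariance-based, but the classical covariate-adjustment idea it leans on can be illustrated simply: regress a nuisance variable's linear effect out of pooled features before downstream analysis. The sketch below shows only that classical baseline, on synthetic data; it is not the authors' method.

```python
# Classical covariate adjustment: regress out a nuisance variable (sketch).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16
site = rng.integers(0, 3, size=n)               # nuisance: acquisition site
Z = np.eye(3)[site]                             # one-hot design matrix
X = rng.normal(size=(n, d)) + Z @ rng.normal(size=(3, d))  # site-shifted feats

beta, *_ = np.linalg.lstsq(Z, X, rcond=None)    # per-site mean effects
X_adj = X - Z @ beta                            # residualized features

# After adjustment, per-site feature means are (numerically) near zero.
for s in range(3):
    print(s, np.abs(X_adj[site == s].mean(axis=0)).max())
```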

{"title":"Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets.","authors":"Vishnu Suresh Lokhande, Sathya N Ravi, Rudrasis Chakraborty, Vikas Singh","doi":"10.1109/cvpr52688.2022.01018","DOIUrl":"10.1109/cvpr52688.2022.01018","url":null,"abstract":"<p><p>Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a single source of variability (e.g., different scanners), domain adaptation and matching the distributions of representations may suffice in many scenarios. But in the presence of more than one nuisance variable which concurrently influence the measurements, pooling datasets poses unique challenges, e.g., variations in the data can come from both the acquisition method as well as the demographics of participants (gender, age). Invariant representation learning, by itself, is ill-suited to fully model the data generation process. In this paper, we show how bringing recent results on equivariant representation learning (for studying symmetries in neural networks) instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. In particular, we demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples. Our code is available on https://github.com/vsingh-group/DatasetPooling.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":" ","pages":"10422-10431"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581465/pdf/nihms-1834390.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40562783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design.
Pub Date: 2022-06-01 Epub Date: 2022-09-27 DOI: 10.1109/cvpr52688.2022.00045
Xiran Fan, Chun-Hao Yang, Baba C Vemuri

Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space, namely the hyperbolic space. Hyperbolic space is a homogeneous space of the Lorentz group, realized in the Lorentz model as a hypersurface in Minkowski space, a semi-Riemannian manifold, i.e. a manifold equipped with an indefinite metric. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity, all within the hyperbolic space. The novelty here lies in the projection, which is designed to project data onto a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation that is independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations, which are the natural isometric transformations in hyperbolic spaces. This projection is computationally efficient since it can be expressed by simple linear operations and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network; therefore, we first compare this representation - independent of the network - with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA), and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating the comparative performance of our network on several publicly available data sets.
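
For readers unfamiliar with the Lorentz (hyperboloid) model these networks operate in, here is a hedged sketch of its basic operations: lifting Euclidean points onto the hyperboloid and computing geodesic distances via the Lorentzian inner product. It illustrates the ambient geometry only, not the paper's nested projection.

```python
# Lorentz-model basics: lift to the hyperboloid, measure geodesic distance.
import torch

def lift_to_hyperboloid(x):
    """x: (B, d) Euclidean -> (B, d+1) points with <p, p>_L = -1 (curvature -1)."""
    x0 = torch.sqrt(1.0 + (x * x).sum(dim=1, keepdim=True))
    return torch.cat([x0, x], dim=1)

def lorentz_inner(p, q):
    # Minkowski signature (-, +, ..., +)
    return -p[:, :1] * q[:, :1] + (p[:, 1:] * q[:, 1:]).sum(dim=1, keepdim=True)

def lorentz_distance(p, q):
    # For points on the hyperboloid, -<p, q>_L >= 1, so acosh is well defined.
    return torch.acosh(torch.clamp(-lorentz_inner(p, q), min=1.0 + 1e-7))

p = lift_to_hyperboloid(torch.randn(4, 3))
q = lift_to_hyperboloid(torch.randn(4, 3))
print(lorentz_distance(p, q).squeeze())
```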

{"title":"Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design.","authors":"Xiran Fan, Chun-Hao Yang, Baba C Vemuri","doi":"10.1109/cvpr52688.2022.00045","DOIUrl":"10.1109/cvpr52688.2022.00045","url":null,"abstract":"<p><p>Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group which is a semi-Riemannian manifold, i.e. a manifold equipped with an indefinite metric. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations, which are the natural isometric transformations in hyperbolic spaces. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this representation - independent of the network - with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2022 ","pages":"356-365"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9997089/pdf/nihms-1871476.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9110465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0