PLZero: placeholder based approach to generalized zero-shot learning for multi-label recognition in chest radiographs

Complex & Intelligent Systems · IF 5.0 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Region 2 (Computer Science) · Pub Date: 2025-01-02 · DOI: 10.1007/s40747-024-01717-4
Chengrong Yang, Qiwen Jin, Fei Du, Jing Guo, Yujue Zhou
{"title":"PLZero:基于占位符的胸片多标签识别广义零学习方法","authors":"Chengrong Yang, Qiwen Jin, Fei Du, Jing Guo, Yujue Zhou","doi":"10.1007/s40747-024-01717-4","DOIUrl":null,"url":null,"abstract":"<p>By leveraging large-scale image-text paired data for pre-training, the model can efficiently learn the alignment between images and text, significantly advancing the development of zero-shot learning (ZSL) in the field of intelligent medical image analysis. However, the heterogeneity between cross-modalities, false negatives in image-text pairs, and domain shift phenomena pose challenges, making it difficult for existing methods to effectively learn the deep semantic relationships between images and text. To address these challenges, we propose a multi-label chest X-ray recognition generalized ZSL framework based on placeholder learning, termed PLZero. Specifically, we first introduce a jointed embedding space learning module (JESL) to encourage the model to better capture the diversity among different labels. Secondly, we propose a hallucinated class generation module (HCG), which generates hallucinated classes by feature diffusion and feature fusion based on the visual and semantic features of seen classes, using these hallucinated classes as placeholders for unseen classes. Finally, we propose a hallucinated class-based prototype learning module (HCPL), which leverages contrastive learning to control the distribution of hallucinated classes around seen classes without significant deviation from the original data, encouraging high dispersion of class prototypes for seen classes to create sufficient space for inserting unseen class samples. Extensive experiments demonstrate that our method exhibits sufficient generalization and achieves the best performance across three classic and challenging chest X-ray datasets: NIH Chest X-ray 14, CheXpert, and ChestX-Det10. 
Notably, our method outperforms others even when the number of unseen classes exceeds the experimental settings of other methods. The codes are available at: https://github.com/jinqiwen/PLZero.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"27 1","pages":""},"PeriodicalIF":5.0000,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PLZero: placeholder based approach to generalized zero-shot learning for multi-label recognition in chest radiographs\",\"authors\":\"Chengrong Yang, Qiwen Jin, Fei Du, Jing Guo, Yujue Zhou\",\"doi\":\"10.1007/s40747-024-01717-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>By leveraging large-scale image-text paired data for pre-training, the model can efficiently learn the alignment between images and text, significantly advancing the development of zero-shot learning (ZSL) in the field of intelligent medical image analysis. However, the heterogeneity between cross-modalities, false negatives in image-text pairs, and domain shift phenomena pose challenges, making it difficult for existing methods to effectively learn the deep semantic relationships between images and text. To address these challenges, we propose a multi-label chest X-ray recognition generalized ZSL framework based on placeholder learning, termed PLZero. Specifically, we first introduce a jointed embedding space learning module (JESL) to encourage the model to better capture the diversity among different labels. Secondly, we propose a hallucinated class generation module (HCG), which generates hallucinated classes by feature diffusion and feature fusion based on the visual and semantic features of seen classes, using these hallucinated classes as placeholders for unseen classes. 
Finally, we propose a hallucinated class-based prototype learning module (HCPL), which leverages contrastive learning to control the distribution of hallucinated classes around seen classes without significant deviation from the original data, encouraging high dispersion of class prototypes for seen classes to create sufficient space for inserting unseen class samples. Extensive experiments demonstrate that our method exhibits sufficient generalization and achieves the best performance across three classic and challenging chest X-ray datasets: NIH Chest X-ray 14, CheXpert, and ChestX-Det10. Notably, our method outperforms others even when the number of unseen classes exceeds the experimental settings of other methods. The codes are available at: https://github.com/jinqiwen/PLZero.</p>\",\"PeriodicalId\":10524,\"journal\":{\"name\":\"Complex & Intelligent Systems\",\"volume\":\"27 1\",\"pages\":\"\"},\"PeriodicalIF\":5.0000,\"publicationDate\":\"2025-01-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Complex & Intelligent Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s40747-024-01717-4\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-024-01717-4","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Source journal: Complex & Intelligent Systems (Computer Science, Artificial Intelligence)
CiteScore: 9.60 · Self-citation rate: 10.30% · Annual articles: 297
About the journal: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.