Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning

arXiv - EE - Image and Video Processing | Pub Date: 2024-07-20 | DOI: arxiv-2407.14904

Chen Shen, Chunfeng Lian, Wanqing Zhang, Fan Wang, Jianhua Zhang, Shuanliang Fan, Xin Wei, Gongji Wang, Kehan Li, Hongshu Mu, Hao Wu, Xinggong Liang, Jianhua Ma, Zhenyuan Wang


Citations: 0

Abstract

Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi utilizes advanced prototypical cross-modal self-supervised contrastive learning to enhance the accuracy, efficiency, and generalizability of forensic analyses. It was pre-trained and evaluated on a comprehensive multi-center dataset, which includes over 16 million high-resolution image patches, 2,228 vision-language pairs of post-mortem whole slide images (WSIs), and corresponding gross key findings, along with 471 distinct diagnostic outcomes. Our findings indicate that SongCi surpasses existing multi-modal AI models in many forensic pathology tasks, performs comparably to experienced forensic pathologists and significantly better than less experienced ones, and provides detailed multi-modal explainability, offering critical assistance in forensic investigations. To the best of our knowledge, SongCi is the first VLM specifically developed for forensic pathological analysis and the first large-vocabulary computational pathology (CPath) model that directly processes gigapixel WSIs in forensic science.
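The abstract names cross-modal contrastive learning but does not spell out the objective. As a rough illustration only, the sketch below shows a generic symmetric contrastive (InfoNCE-style) loss over paired image and text embeddings. It is an assumption about the general family of techniques, not SongCi's actual loss, and it omits the prototypical component entirely; the function names, the `temperature` default, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic image-text contrastive loss (hypothetical helper).

    image_embs[i] and text_embs[i] are assumed to describe the same case;
    every other pairing in the batch is treated as a negative.
    """
    img = [l2_normalize(v) for v in image_embs]
    txt = [l2_normalize(v) for v in text_embs]
    # Cosine similarities scaled by temperature: rows = images, cols = texts.
    logits = [[dot(i, t) / temperature for t in txt] for i in img]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy embeddings: correctly paired batches should score a lower loss
# than the same batch with the text side shuffled.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_shuffled = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = symmetric_contrastive_loss(images, texts_aligned)
shuffled_loss = symmetric_contrastive_loss(images, texts_shuffled)
```

In a setting like the one described, the image side would presumably be an embedding derived from a WSI and the text side an embedding of the gross key findings, but how the paper actually builds and pairs those representations is not stated in the abstract.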

