Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Authors: Chen Shen, Chunfeng Lian, Wanqing Zhang, Fan Wang, Jianhua Zhang, Shuanliang Fan, Xin Wei, Gongji Wang, Kehan Li, Hongshu Mu, Hao Wu, Xinggong Liang, Jianhua Ma, Zhenyuan Wang
DOI: arxiv-2407.14904
Published: 2024-07-20, arXiv - EE - Image and Video Processing
Abstract
Forensic pathology is critical in determining the cause and manner of death
through post-mortem examinations, both macroscopic and microscopic. The field,
however, grapples with issues such as outcome variability, laborious processes,
and a scarcity of trained professionals. This paper presents SongCi, a
vision-language model (VLM) designed specifically for forensic pathology.
SongCi uses prototypical cross-modal self-supervised contrastive learning to
improve the accuracy, efficiency, and generalizability of forensic analyses.
It was pre-trained and evaluated on a multi-center dataset comprising over
16 million high-resolution image patches, 2,228 vision-language pairs that
link post-mortem whole slide images (WSIs) to their corresponding gross key
findings, and 471 distinct diagnostic outcomes. Our findings indicate that
SongCi surpasses existing multi-modal AI
models in many forensic pathology tasks, performs comparably to experienced
forensic pathologists and significantly better than less experienced ones, and
provides detailed multi-modal explainability, offering critical assistance in
forensic investigations. To the best of our knowledge, SongCi is the first VLM
specifically developed for forensic pathological analysis and the first
large-vocabulary computational pathology (CPath) model that directly processes
gigapixel WSIs in forensic science.
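The abstract names the training objective but does not describe it. As a rough, non-authoritative illustration of what prototypical cross-modal contrastive learning generally involves, the sketch below pools a slide's patch embeddings through a small set of prototypes into a single slide vector, then scores slide and text embeddings with a symmetric, CLIP-style InfoNCE loss in which each slide should match its own gross-findings text and no other text in the batch. This is not SongCi's implementation. The prototype pooling scheme, the function names (`prototype_pool`, `info_nce`), the temperatures, and the toy 3-dimensional embeddings are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max score before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_pool(patches: Sequence[Sequence[float]],
                   prototypes: Sequence[Sequence[float]],
                   temperature: float = 0.1) -> Vector:
    """Hypothetical slide encoder: each prototype attends over the patch
    embeddings, and the per-prototype summaries are averaged into one
    unit-norm slide embedding."""
    patches = [normalize(p) for p in patches]
    dim = len(patches[0])
    slide = [0.0] * dim
    for proto in prototypes:
        proto = normalize(proto)
        # Attention weights of this prototype over all patches.
        weights = softmax([dot(p, proto) / temperature for p in patches])
        for w, p in zip(weights, patches):
            for i in range(dim):
                slide[i] += w * p[i] / len(prototypes)
    return normalize(slide)


def info_nce(images: Sequence[Sequence[float]],
             texts: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Symmetric cross-modal InfoNCE: image i should match text i and no other
    text in the batch (image-to-text), and text i should match image i
    (text-to-image). The two directions are averaged."""
    images = [normalize(v) for v in images]
    texts = [normalize(v) for v in texts]
    n = len(images)
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [[dot(images[i], texts[j]) / temperature for j in range(n)]
              for i in range(n)]
    loss_i2t = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    loss_t2i = -sum(math.log(softmax([logits[j][i] for j in range(n)])[i])
                    for i in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    protos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    slide_a = prototype_pool([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]], protos)
    slide_b = prototype_pool([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]], protos)
    matched = info_nce([slide_a, slide_b], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    swapped = info_nce([slide_a, slide_b], [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
    print(f"matched pairs: {matched:.4f}, swapped pairs: {swapped:.4f}")
```

With correctly paired slides and texts the loss is close to zero. Swapping the texts makes it much larger, which is the signal a contrastive objective uses to pull matching image and text embeddings together and push mismatched ones apart.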