GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall, Joseph Cho, Cyril Zakka, William Hiesinger
arXiv:2407.19305 (arXiv - QuanBio - Tissues and Organs), published 2024-07-27
Abstract
Surgery requires comprehensive medical knowledge, visual assessment skills, and procedural expertise. While recent surgical AI models have focused on solving task-specific problems, there is a need for general-purpose systems that can understand surgical scenes and interact through natural language. This paper introduces GP-VLS, a general-purpose vision-language model for surgery that integrates medical and surgical knowledge with visual scene understanding. To evaluate general-purpose surgical models comprehensively, we propose SurgiQual, which spans medical and surgical knowledge benchmarks as well as surgical vision-language questions. To train GP-VLS, we develop six new datasets covering medical knowledge, surgical textbooks, and vision-language pairs for tasks such as phase recognition and tool identification. We show that GP-VLS significantly outperforms existing open- and closed-source models on surgical vision-language tasks, with accuracy improvements of 8-21% across SurgiQual benchmarks. GP-VLS also demonstrates strong performance on medical and surgical knowledge tests compared to open-source alternatives. Overall, GP-VLS provides an open-source foundation for developing AI assistants to support surgeons across a wide range of tasks and scenarios.
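To make the vision-language tasks concrete, below is a minimal inference sketch for the kind of query the abstract describes (surgical phase recognition on a single frame). It assumes a LLaVA-style open checkpoint loadable through Hugging Face transformers; the model ID, image path, and prompt wording are hypothetical placeholders, not the paper's published API.

```python
# Hypothetical sketch: querying a LLaVA-style vision-language model
# about a surgical frame. The checkpoint name below is a placeholder,
# not an official GP-VLS release.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "example-org/surgical-vlm"  # hypothetical checkpoint

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# A single laparoscopic video frame (path is illustrative).
image = Image.open("frame_000123.png")

# Phase-recognition query phrased as a natural-language question.
prompt = (
    "USER: <image>\n"
    "Which surgical phase is shown in this frame, and which "
    "instruments are visible? ASSISTANT:"
)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The same prompt-over-frame pattern covers the other vision-language tasks the abstract lists (e.g. tool identification), with only the question text changing.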