General surgery vision transformer: A video pre-trained foundation model for general surgery

Samuel Schmidgall, Ji Woong Kim, Jeffery Jopling, Axel Krieger
arXiv:2403.05949 | arXiv - QuanBio - Tissues and Organs | Published 2024-03-09
Citations: 0

Abstract

The absence of openly accessible data and specialized foundation models is a major barrier for computational research in surgery. Toward this, (i) we open-source the largest dataset of general surgery videos to date, consisting of 680 hours of surgical videos, including data from robotic and laparoscopic techniques across 28 procedures; (ii) we propose a technique for video pre-training a general surgery vision transformer (GSViT) on surgical videos based on forward video prediction that can run in real-time for surgical applications, toward which we open-source the code and weights of GSViT; (iii) we also release code and weights for procedure-specific fine-tuned versions of GSViT across 10 procedures; (iv) we demonstrate the performance of GSViT on the Cholec80 phase annotation task, displaying improved performance over state-of-the-art single-frame predictors.
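The abstract names forward video prediction as the pre-training signal but does not state the loss, the prediction horizon, or the input format. The sketch below assumes the simplest reading: each frame t is paired with frame t + horizon, and a predictor is scored by pixel-wise mean squared error. It is not the authors' GSViT code; `make_prediction_pairs`, `prediction_loss`, `horizon`, and the flat-list frame representation are hypothetical, and a real implementation would run a vision transformer on image tensors rather than a Python callable on lists.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical simplification: a frame is a flat list of pixel intensities.
Frame = List[float]


def make_prediction_pairs(frames: Sequence[Frame], horizon: int = 1) -> List[Tuple[Frame, Frame]]:
    """Pair each frame with the frame `horizon` steps later (hypothetical helper).

    Each (input, target) pair is one forward-prediction training example:
    the model sees frame t and must reconstruct frame t + horizon.
    """
    if horizon < 1:
        raise ValueError("horizon must be a positive number of frames")
    return [(frames[t], frames[t + horizon]) for t in range(len(frames) - horizon)]


def mse(pred: Frame, target: Frame) -> float:
    """Pixel-wise mean squared error between a predicted and a target frame."""
    if len(pred) != len(target):
        raise ValueError("frames must have the same number of pixels")
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def prediction_loss(model: Callable[[Frame], Frame], frames: Sequence[Frame], horizon: int = 1) -> float:
    """Average forward-prediction loss of `model` over one clip (assumed MSE objective)."""
    pairs = make_prediction_pairs(frames, horizon)
    if not pairs:
        raise ValueError("clip is too short for the requested horizon")
    return sum(mse(model(x), y) for x, y in pairs) / len(pairs)


def predict_no_change(frame: Frame) -> Frame:
    """Trivial baseline predictor: assume the next frame equals the current one."""
    return frame


if __name__ == "__main__":
    # Toy 3-pixel "video" whose brightness rises by 0.1 per frame.
    clip = [[0.1 * t] * 3 for t in range(5)]
    print(prediction_loss(predict_no_change, clip))
```

A pre-training loop would minimize `prediction_loss` over many clips by updating the model's parameters; the no-change baseline only shows how the objective penalizes a predictor that ignores motion between frames.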