Multitask Learning for Video-based Surgical Skill Assessment

Zhiteng Jian, W. Yue, Qiuxia Wu, Wei Li, Zhiyong Wang, Vincent Lam
Published in: 2020 Digital Image Computing: Techniques and Applications (DICTA), 2020-11-29
DOI: 10.1109/DICTA51227.2020.9363408
Citations: 6

Abstract

Surgical skill assessment (SSA) plays a vital role in medical systems by helping to reduce intraoperative errors and improve clinical outcomes. To make SSA objective and efficient, many automatic video-based methods have been developed; in particular, recent deep learning approaches use CNN- or RNN-based networks for assessment tasks such as skill level prediction. Although predicting the overall skill level and scoring detailed attributes are highly correlated tasks, most existing studies treat them separately and therefore do not fully exploit the different sources of information encoded in a dataset. In contrast, we propose a novel end-to-end multitask learning framework that performs skill level classification and attribute score regression jointly. Specifically, our network has two branches, one per task, which share the earlier feature-extraction layers and keep separate prediction layers for their specific targets. The shared feature extractor is optimised under the supervision of both tasks simultaneously, encouraging the model to draw on information from different aspects, and on how those aspects relate, to learn richer and more generalised features. In addition, since not every part of a surgical video contributes equally to skill assessment, we enhance the existing I3D feature extractor with a novel Spatio-Temporal & Channel Attention Module that emphasises important features. Experimental results on the public JIGSAWS dataset show that our proposed network outperforms state-of-the-art models on both the skill classification and the score regression tasks.
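The abstract describes a shared feature extractor feeding two task-specific heads, trained under a single joint objective, with an attention module reweighting the shared features. The pure-Python sketch below illustrates that structure on one pre-extracted feature map. It is a hypothetical illustration under stated assumptions, not the authors' implementation: the I3D backbone is replaced by a given C x N feature map, the Spatio-Temporal & Channel Attention Module is replaced by a squeeze-and-excitation-style channel gate (channel attention only), and the joint loss is assumed to be cross-entropy plus `lam` times mean-squared error. The names `channel_gate`, `multitask_loss`, `params`, and `lam` are our own.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """Fully connected layer: y[j] = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(wji * xi for wji, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def channel_gate(feat: List[Vector], w1: Matrix, b1: Vector,
                 w2: Matrix, b2: Vector) -> List[Vector]:
    """SE-style channel attention on a C x N map (N = flattened T*H*W positions).

    Stand-in for the paper's attention module (an assumption): each channel is
    squeezed to its mean, the C-vector goes through a ReLU bottleneck, and every
    channel is rescaled by a sigmoid weight in (0, 1).
    """
    squeezed = [sum(ch) / len(ch) for ch in feat]
    hidden = [max(0.0, h) for h in linear(squeezed, w1, b1)]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]
    return [[a * wc for a in ch] for ch, wc in zip(feat, weights)]


def softmax_cross_entropy(logits: Vector, label: int) -> float:
    """Cross-entropy of one example, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error over the attribute scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(feat: List[Vector], label: int, scores: Vector,
                   params: Dict[str, Tuple], lam: float = 1.0) -> Tuple[float, float, float]:
    """Assumed joint loss: both heads read the same attended, pooled feature vector."""
    attended = channel_gate(feat, *params["gate"])
    shared = [sum(ch) / len(ch) for ch in attended]  # global average pool -> C-vector
    logits = linear(shared, *params["cls"])          # skill-level classification head
    pred_scores = linear(shared, *params["reg"])     # attribute-score regression head
    l_cls = softmax_cross_entropy(logits, label)
    l_reg = mse(pred_scores, scores)
    return l_cls + lam * l_reg, l_cls, l_reg


if __name__ == "__main__":
    C, K, A = 2, 3, 6  # channels, skill levels, attribute scores (illustrative sizes)

    def zeros(rows: int, cols: int) -> Matrix:
        return [[0.0] * cols for _ in range(rows)]

    params = {
        "gate": (zeros(1, C), [0.0], zeros(C, 1), [0.0] * C),
        "cls": (zeros(K, C), [0.0] * K),
        "reg": (zeros(A, C), [3.0] * A),
    }
    feat = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]  # toy C x N feature map
    total, l_cls, l_reg = multitask_loss(feat, label=2, scores=[4.0] * A,
                                         params=params, lam=0.5)
    print(round(total, 4), round(l_cls, 4), round(l_reg, 4))
```

Running the file prints `1.5986 1.0986 1.0`: with all-zero classifier weights the prediction is uniform over three levels (cross-entropy ln 3), and every predicted score is off by one (MSE 1). In a framework with automatic differentiation, the gradient of the summed loss would reach the shared gate parameters from both heads, which is the mechanism the abstract credits with learning richer shared features. The task weight `lam` is an assumption, since the abstract does not state how the two losses are combined.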