Multitask Learning for Video-based Surgical Skill Assessment

2020 Digital Image Computing: Techniques and Applications (DICTA). Pub Date: 2020-11-29. DOI: 10.1109/DICTA51227.2020.9363408

Zhiteng Jian, W. Yue, Qiuxia Wu, Wei Li, Zhiyong Wang, Vincent Lam


Citations: 6

Abstract

Surgical skill assessment (SSA) plays a vital role in medical systems for reducing intraoperative surgical errors and improving clinical outcomes. To ensure objective and efficient SSA, many automatic video-based SSA methods have been developed. In particular, various deep learning methods have been devised recently by utilising CNN or RNN-based networks for various skill assessment tasks (e.g., skill level prediction). While predicting overall skill levels and assessing detailed attribute-based scores are highly correlated, most existing studies deal with these two tasks separately, without fully exploiting different information sources encoded in a dataset. In contrast, we propose a novel end-to-end multitask learning framework to conduct skill level classification and attribute score regression jointly. Specifically, our network incorporates two branches for the two tasks, which share earlier layers for feature extraction and hold different prediction layers for specific targets. The shared feature extractor is optimised under the supervision of both tasks simultaneously, encouraging the model to consider information from different aspects and their relatedness to learn richer and more generalised features. In addition, since not every part of a surgical video contributes to skill assessment equally, we enhance an existing feature extractor I3D with a novel Spatio-Temporal & Channel Attention Module to emphasize important features. Experimental results on the public dataset JIGSAWS show that our proposed network outperforms state-of-the-art models on both skill classification and score regression tasks.
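The two-branch design described in the abstract (a shared feature extractor feeding one classification head and one regression head, trained under both losses at once) can be illustrated with a minimal sketch. This is not the authors' implementation: a single dense layer stands in for the attention-enhanced I3D backbone, and the class name `MultitaskSkillModel`, the layer sizes, the counts of skill levels and attributes, and the `reg_weight` balancing factor are all assumptions chosen for illustration.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskSkillModel:
    """Hypothetical two-branch model: shared layer plus two task heads."""

    def __init__(self, feat_dim, hidden_dim, num_levels, num_attributes, seed=0):
        rng = random.Random(seed)
        # Shared layer: stand-in for the shared video feature extractor.
        self.shared = init_layer(rng, hidden_dim, feat_dim)
        # Task-specific prediction layers.
        self.cls_head = init_layer(rng, num_levels, hidden_dim)
        self.reg_head = init_layer(rng, num_attributes, hidden_dim)

    def forward(self, clip_features):
        shared = relu(linear(*self.shared, clip_features))
        level_probs = softmax(linear(*self.cls_head, shared))
        attr_scores = linear(*self.reg_head, shared)
        return level_probs, attr_scores


def joint_loss(level_probs, attr_scores, true_level, true_scores, reg_weight=1.0):
    """Cross-entropy on the skill level plus weighted MSE on attribute scores.

    The weighted sum is an assumed way of combining the two objectives.
    """
    ce = -math.log(level_probs[true_level])
    mse = sum((p - t) ** 2 for p, t in zip(attr_scores, true_scores)) / len(true_scores)
    return ce + reg_weight * mse


# Usage with made-up inputs: one 8-dimensional clip feature vector,
# 3 skill levels and 6 attribute scores (illustrative counts).
model = MultitaskSkillModel(feat_dim=8, hidden_dim=4, num_levels=3, num_attributes=6)
probs, scores = model.forward([0.5] * 8)
loss = joint_loss(probs, scores, true_level=2, true_scores=[3.0] * 6)
```

Because both loss terms depend on the output of the shared layer, a gradient step on `loss` would update that layer with signal from both tasks, which is the mechanism the abstract credits for richer and more generalised features.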

