
PRIME@MICCAI: Latest Papers

Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson's Disease
Pub Date : 2023-08-28 DOI: 10.48550/arXiv.2308.14679
Gabriela T. Acevedo Trebbau, A. Bandini, D. Guarin
There is a growing interest in using pose estimation algorithms for video-based assessment of Bradykinesia in Parkinson's Disease (PD) to facilitate remote disease assessment and monitoring. However, the accuracy of pose estimation algorithms in videos from video streaming services during Telehealth appointments has not been studied. In this study, we used seven off-the-shelf hand pose estimation models to estimate the movement of the thumb and index fingers in videos of the finger-tapping (FT) test recorded from Healthy Controls (HC) and participants with PD and under two different conditions: streaming (videos recorded during a live Zoom meeting) and on-device (videos recorded locally with high-quality cameras). The accuracy and reliability of the models were estimated by comparing the models' output with manual results. Three of the seven models demonstrated good accuracy for on-device recordings, and the accuracy decreased significantly for streaming recordings. We observed a negative correlation between movement speed and the model's accuracy for the streaming recordings. Additionally, we evaluated the reliability of ten movement features related to bradykinesia extracted from video recordings of PD patients performing the FT test. While most of the features demonstrated excellent reliability for on-device recordings, most of the features demonstrated poor to moderate reliability for streaming recordings. Our findings highlight the limitations of pose estimation algorithms when applied to video recordings obtained during Telehealth visits, and demonstrate that on-device recordings can be used for automatic video-assessment of bradykinesia in PD.
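As a rough illustration of what "movement features" from a finger-tapping video can look like, the sketch below derives a few quantities from thumb and index fingertip tracks. The abstract does not list its ten features, so the feature names, the peak rule, and the function `tapping_features` are all assumptions made for this example, and the input is synthetic.

```python
import numpy as np

def tapping_features(thumb_xy, index_xy, fps):
    """Illustrative (hypothetical) features from fingertip tracks of shape (n_frames, 2)."""
    thumb_xy = np.asarray(thumb_xy, dtype=float)
    index_xy = np.asarray(index_xy, dtype=float)
    # Thumb-index aperture per frame: Euclidean distance between the two fingertips.
    aperture = np.linalg.norm(thumb_xy - index_xy, axis=1)
    # A frame is a peak if it is larger than the previous frame and not smaller
    # than the next one; each peak is treated as one fully opened tap cycle.
    interior = aperture[1:-1]
    peaks = np.where((interior > aperture[:-2]) & (interior >= aperture[2:]))[0] + 1
    # Frame-to-frame change in aperture, converted to units per second.
    speed = np.abs(np.diff(aperture)) * fps
    duration_s = len(aperture) / fps
    return {
        "n_taps": int(len(peaks)),
        "mean_amplitude": float(aperture[peaks].mean()) if len(peaks) else 0.0,
        "tap_rate_hz": float(len(peaks) / duration_s),
        "mean_speed": float(speed.mean()),
    }

# Synthetic recording (an assumption, not study data): 96 frames at 30 fps,
# one tap cycle every 12 frames, aperture oscillating between 0 and 50 pixels.
fps = 30
frames = np.arange(96)
synthetic_aperture = 25.0 * (1.0 - np.cos(2.0 * np.pi * frames / 12.0))
thumb = np.zeros((96, 2))
index = np.stack([synthetic_aperture, np.zeros(96)], axis=1)
features = tapping_features(thumb, index, fps)
print(features)
```

With real pose-estimation output, the fingertip tracks would first need smoothing and handling of dropped frames, which this sketch leaves out.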
Citations: 0
Pose2Gait: Extracting Gait Features from Monocular Video of Individuals with Dementia
Pub Date : 2023-08-22 DOI: 10.48550/arXiv.2308.11484
Caroline Malin-Mayor, Vida Adeli, Andrea Sabo, S. Noritsyn, C. Gorodetsky, A. Fasano, A. Iaboni, B. Taati
Video-based ambient monitoring of gait for older adults with dementia has the potential to detect negative changes in health and allow clinicians and caregivers to intervene early to prevent falls or hospitalizations. Computer vision-based pose tracking models can process video data automatically and extract joint locations; however, publicly available models are not optimized for gait analysis on older adults or clinical populations. In this work we train a deep neural network to map from a two-dimensional pose sequence, extracted from a video of an individual walking down a hallway toward a wall-mounted camera, to a set of three-dimensional spatiotemporal gait features averaged over the walking sequence. The data of individuals with dementia used in this work was captured at two sites using a wall-mounted system to collect the video and depth information used to train and evaluate our model. Our Pose2Gait model is able to extract velocity and step length values from the video that are correlated with the features from the depth camera, with Spearman's correlation coefficients of 0.83 and 0.60, respectively, showing that three-dimensional spatiotemporal features can be predicted from monocular video. Future work remains to improve the accuracy of other features, such as step time and step width, and test the utility of the predicted values for detecting meaningful changes in gait during longitudinal ambient monitoring.
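The agreement metric reported in this abstract, Spearman's correlation, is the Pearson correlation of the ranks of two series. The sketch below computes it with NumPy only; the helper names and the five value pairs are made up for illustration and are not data from the paper.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for value in np.unique(values):
        tied = values == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical walking speeds in m/s: video-based predictions vs. depth-camera reference.
predicted = [0.61, 0.75, 0.58, 0.92, 0.80]
reference = [0.60, 0.70, 0.65, 0.95, 0.78]
rho = spearman_rho(predicted, reference)
print(round(rho, 3))  # 0.9
```

Because only ranks matter, the coefficient rewards getting the ordering of walks right even when the predicted magnitudes are biased.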
Citations: 0
Imputing Brain Measurements Across Data Sets via Graph Neural Networks
Pub Date : 2023-08-19 DOI: 10.48550/arXiv.2308.09907
Yixin Wang, Wei Peng, S. Tapert, Qingyu Zhao, K. Pohl
Publicly available data sets of structural MRIs might not contain specific measurements of brain Regions of Interest (ROIs) that are important for training machine learning models. For example, the curvature scores computed by Freesurfer are not released by the Adolescent Brain Cognitive Development (ABCD) Study. One can address this issue by simply reapplying Freesurfer to the data set. However, this approach is generally computationally and labor intensive (e.g., requiring quality control). An alternative is to impute the missing measurements via a deep learning approach. However, the state of the art is designed to estimate randomly missing values rather than entire measurements. We therefore propose to re-frame the imputation problem as a prediction task on another (public) data set that contains the missing measurements and shares some ROI measurements with the data sets of interest. A deep learning model is then trained to predict the missing measurements from the shared ones and afterwards is applied to the other data sets. Our proposed algorithm models the dependencies between ROI measurements via a graph neural network (GNN) and accounts for demographic differences in brain measurements (e.g., sex) by feeding the graph encoding into a parallel architecture. The architecture simultaneously optimizes a graph decoder to impute values and a classifier to predict demographic factors. We test the approach, called Demographic Aware Graph-based Imputation (DAGI), on imputing those missing Freesurfer measurements of ABCD (N=3760) by training the predictor on those publicly released by the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=540)...
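The reframing described above (fit a predictor on a data set that has both the shared and the missing measurements, then apply it to a data set that has only the shared ones) can be sketched with a much simpler model. The example swaps the paper's graph neural network for closed-form ridge regression and uses synthetic cohorts; every name, size, and number is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shared, n_missing = 6, 2          # hypothetical counts of ROI measurements
true_map = rng.normal(size=(n_shared, n_missing))

def make_cohort(n_subjects):
    """Synthetic cohort: 'missing' measurements depend linearly on shared ones plus noise."""
    shared = rng.normal(size=(n_subjects, n_shared))
    missing = shared @ true_map + 0.05 * rng.normal(size=(n_subjects, n_missing))
    return shared, missing

# Source cohort releases both kinds of measurements; the target cohort releases
# only the shared ones (its true values are kept here just to score the result).
source_shared, source_missing = make_cohort(500)
target_shared, target_missing_hidden = make_cohort(200)

def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = features.T @ features + alpha * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

weights = fit_ridge(source_shared, source_missing)   # train on the source cohort
imputed = target_shared @ weights                    # impute for the target cohort
rmse = float(np.sqrt(np.mean((imputed - target_missing_hidden) ** 2)))
print(imputed.shape, round(rmse, 3))
```

A linear model ignores both the dependencies between ROIs and the demographic differences that the abstract's method is designed to capture; it only shows the train-on-one-cohort, apply-to-another structure.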
Citations: 0
Self-supervised Landmark Learning with Deformation Reconstruction and Cross-subject Consistency Objectives
Pub Date : 2023-08-09 DOI: 10.48550/arXiv.2308.04987
Chun-Hung Chao, M. Niethammer
A Point Distribution Model (PDM) is the basis of a Statistical Shape Model (SSM) that relies on a set of landmark points to represent a shape and characterize the shape variation. In this work, we present a self-supervised approach to extract landmark points from a given registration model for the PDMs. Based on the assumption that the landmarks are the points that have the most influence on registration, existing works learn a point-based registration model with a small number of points to estimate the landmark points that influence the deformation the most. However, such approaches assume that the deformation can be captured by point-based registration and quality landmarks can be learned solely with the deformation capturing objective. We argue that data with complicated deformations cannot easily be modeled with point-based registration when only a limited number of points is used to extract influential landmark points. Further, landmark consistency is not assured in existing approaches. In contrast, we propose to extract landmarks based on a given registration model, which is tailored for the target data, so we can obtain more accurate correspondences. Secondly, to establish the anatomical consistency of the predicted landmarks, we introduce a landmark discovery loss to explicitly encourage the model to predict the landmarks that are anatomically consistent across subjects. We conduct experiments on an osteoarthritis progression prediction task and show our method outperforms existing image-based and point-based approaches.
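For readers unfamiliar with Point Distribution Models, a classical PDM stores a mean landmark configuration plus a few principal modes of variation. The sketch below builds one with an SVD, assuming the shapes are already aligned and in landmark correspondence (the hard part, which the paper's method addresses and this example does not). Function names and the toy square shapes are assumptions.

```python
import numpy as np

def fit_pdm(shapes, n_modes):
    """Fit a PDM to aligned shapes of shape (n_shapes, n_landmarks, dim)."""
    shapes = np.asarray(shapes, dtype=float)
    flat = shapes.reshape(len(shapes), -1)        # one row vector per shape
    mean_shape = flat.mean(axis=0)
    centered = flat - mean_shape
    # Rows of `modes` are the principal directions of landmark variation.
    _, singular_values, modes = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(shapes) - 1)
    return mean_shape, modes[:n_modes], variances[:n_modes]

def synthesize(mean_shape, modes, coefficients, n_landmarks):
    """Generate a shape as mean + sum_k coefficients[k] * modes[k]."""
    flat = mean_shape + np.asarray(coefficients, dtype=float) @ modes
    return flat.reshape(n_landmarks, -1)

# Toy training set (an assumption): a unit square whose only variation is its size.
base = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
shapes = np.stack([scale * base for scale in (0.8, 0.9, 1.0, 1.1, 1.2)])
mean_shape, modes, variances = fit_pdm(shapes, n_modes=2)
mean_square = synthesize(mean_shape, modes, [0.0, 0.0], n_landmarks=4)
print(variances)  # first mode carries the size variation, second is numerically zero
```

Because the toy shapes differ only in scale, a single mode explains all the variance; real anatomical data would spread variance over several modes.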
Citations: 0
Self-supervised Few-shot Learning for Semantic Segmentation: An Annotation-free Approach
Pub Date : 2023-07-26 DOI: 10.48550/arXiv.2307.14446
Sanaz Karimijafarbigloo, Reza Azad, D. Merhof
Few-shot semantic segmentation (FSS) offers immense potential in the field of medical image analysis, enabling accurate object segmentation with limited training data. However, existing FSS techniques heavily rely on annotated semantic classes, rendering them unsuitable for medical images due to the scarcity of annotations. To address this challenge, multiple contributions are proposed: First, inspired by spectral decomposition methods, the problem of image decomposition is reframed as a graph partitioning task. The eigenvectors of the Laplacian matrix, derived from the feature affinity matrix of self-supervised networks, are analyzed to estimate the distribution of the objects of interest from the support images. Secondly, we propose a novel self-supervised FSS framework that does not rely on any annotation. Instead, it adaptively estimates the query mask by leveraging the eigenvectors obtained from the support images. This approach eliminates the need for manual annotation, making it particularly suitable for medical images with limited annotated data. Thirdly, to further enhance the decoding of the query image based on the information provided by the support image, we introduce a multi-scale large kernel attention module. By selectively emphasizing relevant features and details, this module improves the segmentation process and contributes to better object delineation. Evaluations on both natural and medical image datasets demonstrate the efficiency and effectiveness of our method. Moreover, the proposed approach is characterized by its generality and model-agnostic nature, allowing for seamless integration with various deep architectures. The code is publicly available on GitHub at https://github.com/mindflow-institue/annotation_free_fewshot
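The graph-partitioning idea in the first contribution can be illustrated on a toy scale: build an affinity matrix from feature vectors, form the graph Laplacian, and split the nodes by the sign of the eigenvector with the second-smallest eigenvalue (the Fiedler vector). This is a generic spectral bipartition, not the authors' pipeline; the cosine affinity, the unnormalized Laplacian, and the six hand-written feature vectors are all assumptions.

```python
import numpy as np

def spectral_bipartition(features):
    """Split feature vectors (rows, assumed non-zero) into two groups via the Fiedler vector."""
    features = np.asarray(features, dtype=float)
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Cosine-similarity affinity, clipped so that edge weights are non-negative.
    affinity = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)               # no self-loops
    degree = np.diag(affinity.sum(axis=1))
    laplacian = degree - affinity                 # unnormalized graph Laplacian L = D - W
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return fiedler >= 0                           # boolean group label per row

# Toy "patch features" (hypothetical): three resemble one direction, three another.
patch_features = [
    [1.00, 0.05], [0.90, 0.10], [1.00, 0.00],
    [0.05, 1.00], [0.10, 0.90], [0.00, 1.00],
]
labels = spectral_bipartition(patch_features)
print(labels)
```

Which group receives `True` is arbitrary, since an eigenvector's sign is not determined; only the grouping itself is meaningful.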
Citations: 4
TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation
Pub Date : 2022-08-01 DOI: 10.48550/arXiv.2208.00713
Reza Azad, Moein Heidari, M. Shariatnia, Ehsan Khodapanah Aghdam, Sanaz Karimijafarbigloo, E. Adeli, D. Merhof
Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. In particular, deep neural networks based on seminal architectures such as U-shaped models with skip-connections or atrous convolution with pyramid pooling have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are well suited to capturing versatile local features. However, as a general consensus, CNNs fail to capture long-range dependencies and spatial correlations due to the intrinsic property of confined receptive field size of convolution operations. Alternatively, Transformer, profiting from global information modelling that stems from the self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies show that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. To this end, this paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit hierarchical Swin-Transformer with shifted windows to extend DeepLabv3 and model the Atrous Spatial Pyramid Pooling (ASPP) module. A thorough search of the relevant literature indicates that we are the first to model the seminal DeepLab model with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs better than or on par with most contemporary works on an amalgamation of Vision Transformer and CNN-based methods, along with a significant reduction of model complexity. The codes and trained models are publicly available at https://github.com/rezazad68/transdeeplab
Citations: 28
Intervertebral Disc Labeling With Learning Shape Information, A Look Once Approach
Pub Date : 2022-04-06 DOI: 10.48550/arXiv.2204.02943
Reza Azad, Moein Heidari, J. Cohen-Adad, E. Adeli, D. Merhof
Accurate and automatic segmentation of intervertebral discs from medical images is a critical task for the assessment of spine-related diseases such as osteoporosis, vertebral fractures, and intervertebral disc herniation. To date, various approaches have been developed in the literature that routinely rely on detecting the discs as the primary step. A disadvantage of many cohort studies is that the localization algorithm also yields false-positive detections. In this study, we aim to alleviate this problem by proposing a novel U-Net-based structure to predict a set of candidates for intervertebral disc locations. In our design, we integrate the image shape information (image gradients) to encourage the model to learn rich and generic geometrical information. This additional signal guides the model to selectively emphasize the contextual representation and suppress the less discriminative features. On the post-processing side, to further decrease the false positive rate, we propose a permutation invariant 'look once' model, which accelerates the candidate recovery procedure. In comparison with previous studies, our proposed approach does not need to perform the selection in an iterative fashion. The proposed method was evaluated on the spine generic public multi-center dataset and demonstrated superior performance compared to previous work. We have provided the implementation code at https://github.com/rezazad68/intervertebral-lookonce
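The abstract mentions image gradients as the shape signal but not how they enter the network. One simple possibility, shown purely as an assumption, is to compute a gradient-magnitude map and stack it with the image as an extra input channel. The function name and the toy step-edge image are hypothetical.

```python
import numpy as np

def add_gradient_channel(image):
    """Return a (2, H, W) array: the image and its gradient magnitude."""
    image = np.asarray(image, dtype=float)
    # np.gradient uses central differences inside the image and one-sided
    # differences at the borders; it returns one array per axis.
    grad_rows, grad_cols = np.gradient(image)
    magnitude = np.hypot(grad_rows, grad_cols)
    return np.stack([image, magnitude], axis=0)

# Toy image (an assumption): dark left half, bright right half, so the only
# structure is a vertical edge between columns 2 and 3.
toy_image = np.zeros((4, 6))
toy_image[:, 3:] = 1.0
stacked = add_gradient_channel(toy_image)
print(stacked[1][0])  # gradient magnitude along the first row
```

The gradient channel is non-zero only next to the edge, which is the kind of boundary cue the abstract describes as encouraging the model to learn geometrical information.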
Citations: 4
A Few-Shot Learning Graph Multi-trajectory Evolution Network for Forecasting Multimodal Baby Connectivity Development from a Baseline Timepoint
Pub Date : 2021-10-06 DOI: 10.1007/978-3-030-87602-9_2
Alaa Bessadok, Ahmed Nebli, M. Mahjoub, Gang Li, Weili Lin, D. Shen, I. Rekik
Citations: 2
One Representative-Shot Learning Using a Population-Driven Template with Application to Brain Connectivity Classification and Evolution Prediction
Pub Date : 2021-10-06 DOI: 10.1007/978-3-030-87602-9_3
Umut Guvercin, Mohammed Amine Gharsallaoui, I. Rekik
Citations: 6
Opportunistic Screening of Osteoporosis Using Plain Film Chest X-ray
Pub Date : 2021-04-05 DOI: 10.1007/978-3-030-87602-9_13
Fakai Wang, K. Zheng, Yirui Wang, Xiaoyun Zhou, Le Lu, Jing Xiao, Min Wu, C. Kuo, S. Miao
Citations: 3