Pixel-wise annotation for clear and contaminated regions segmentation in wireless capsule endoscopy images: A multicentre database

Data in Brief · IF 1.0 · Q3 (Multidisciplinary Sciences) · Pub Date: 2024-09-10 · DOI: 10.1016/j.dib.2024.110927
Vahid Sadeghi , Yasaman Sanahmadi , Maryam Behdad , Alireza Vard , Mohsen Sharifi , Ahmad Raeisi , Mehdi Nikkhah , Alireza Mehridehnavi
Citations: 0

Abstract

Pixel-wise annotation for clear and contaminated regions segmentation in wireless capsule endoscopy images: A multicentre database

Wireless capsule endoscopy (WCE) can non-invasively visualize the small intestine, the most complicated segment of the gastrointestinal tract, to detect different types of abnormalities. However, its main drawback is the need to review the vast number of captured images (more than 50,000 frames). The recorded images are not always clear, and different contaminating agents, such as turbid materials and air bubbles, degrade the visualization quality of WCE images. This can cause serious problems such as reduced mucosal visualization, prolonged video review time, and an increased risk of missed pathology. On the other hand, accurately quantifying the amount of turbid fluids and bubbles can indicate potential motility malfunction. To assist in developing computer vision-based techniques, we have constructed the first multicentre publicly available dataset with clear and contaminated annotations by precisely segmenting 17,593 capsule endoscopy images from three different databases.

In contrast to existing datasets, our dataset has been annotated at the pixel level, discriminating clear from contaminated regions and subsequently differentiating bubbles and turbid fluids from normal tissue. To create the dataset, we first selected all of the images (2906 frames) in the reduced mucosal view class, covering different levels of contamination, and randomly selected 12,237 images from the normal class of the CC BY 4.0-licensed small bowel capsule endoscopy (SBCE) images in the Kvasir capsule endoscopy database. To mitigate possible bias in that dataset and to increase the sample size, 2077 and 373 images were randomly selected from the SEE-AI project and CECleanliness datasets, respectively, for subsequent annotation. The selected images were annotated with the aid of ImageJ and ITK-SNAP software under the supervision of an expert SBCE reader with extensive experience in gastroenterology and endoscopy. For each image, two ground truth (GT) masks have been created: a binary mask indexing each pixel into two classes (clear and contaminated) and a tri-colour mask indexing each pixel into three classes (bubble, turbid fluids, and normal).
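The relationship between the two masks can be sketched as follows. This is a minimal illustration, not code from the dataset: the label encodings (0 = normal, 1 = bubble, 2 = turbid) are assumptions, and the actual pixel values in the published masks may differ.

```python
import numpy as np

# Hypothetical label encodings for the tri-colour GT mask; the
# dataset's real encodings may differ.
NORMAL, BUBBLE, TURBID = 0, 1, 2

def binary_from_tricolour(tri_mask: np.ndarray) -> np.ndarray:
    """Derive the binary clear/contaminated mask from the tri-colour mask:
    a pixel is 'contaminated' (1) if it is bubble or turbid fluid."""
    return (tri_mask != NORMAL).astype(np.uint8)

def class_fractions(tri_mask: np.ndarray) -> dict:
    """Fraction of the frame covered by each class, e.g. to quantify
    the amount of bubbles and turbid fluid in a frame."""
    total = tri_mask.size
    return {
        "normal": float((tri_mask == NORMAL).sum()) / total,
        "bubble": float((tri_mask == BUBBLE).sum()) / total,
        "turbid": float((tri_mask == TURBID).sum()) / total,
    }

# Toy 4x4 tri-colour mask: 2 bubble pixels, 4 turbid pixels, 10 normal.
tri = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
], dtype=np.uint8)

binary = binary_from_tricolour(tri)   # 6 contaminated pixels
fractions = class_fractions(tri)      # turbid fraction: 4/16
```

The per-class fractions are exactly the kind of per-frame statistic the abstract mentions for flagging reduced mucosal view and potential motility malfunction.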

To the best of the authors' knowledge, no capsule endoscopy reading software currently implements clear and contaminated region segmentation. The curated multicentre dataset can be used to develop segmentation algorithms that identify clear and contaminated regions and discriminate bubbles, as well as turbid fluids, from normal tissue in the small intestine.
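A segmentation algorithm trained on the dataset would typically be scored against the binary GT masks with an overlap metric. The sketch below shows one common choice, the Dice coefficient; it is a generic evaluation routine, not part of the dataset or any tool the authors name.

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two binary masks (1 = contaminated).
    Returns 1.0 when both masks are empty (perfect trivial agreement)."""
    inter = np.logical_and(pred == 1, gt == 1).sum()
    denom = int((pred == 1).sum() + (gt == 1).sum())
    return 2.0 * inter / denom if denom else 1.0

# Toy example: the ground truth marks 4 contaminated pixels,
# the prediction recovers 3 of them.
gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0]], dtype=np.uint8)
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 0, 0]], dtype=np.uint8)

score = dice(pred, gt)   # 2*3 / (3+4) = 6/7
```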

Since the annotated images come from three different sources, they provide a diverse representation of the clear and contaminated patterns in WCE images. This diversity is valuable for training models that are more robust to variations in data characteristics and generalize well across different subjects and settings. The inclusion of images from three different centres enables robust cross-validation, where computer vision-based models can be trained on one centre's annotated images and evaluated on the others.
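The cross-centre evaluation described above is a leave-one-centre-out split. The sketch below illustrates the idea with hypothetical image IDs; the centre names mirror the three source datasets, but the IDs and per-centre counts are placeholders, not the real file names.

```python
from itertools import chain

# Hypothetical per-centre image IDs (placeholders, not real file names).
centres = {
    "kvasir": ["kv_001", "kv_002", "kv_003"],
    "see_ai": ["sa_001", "sa_002"],
    "cecleanliness": ["cc_001"],
}

def leave_one_centre_out(centres):
    """Yield (held-out centre name, train IDs, test IDs) for each centre:
    train on the other centres' images, test on the held-out centre."""
    for held_out in centres:
        test = list(centres[held_out])
        train = list(chain.from_iterable(
            imgs for name, imgs in centres.items() if name != held_out))
        yield held_out, train, test

splits = list(leave_one_centre_out(centres))   # one split per centre
```

Each split keeps the held-out centre entirely unseen during training, which is what makes the reported performance a measure of cross-centre generalization rather than within-centre fit.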
