Medical datasets for machine learning: fundamental principles of standardization and systematization

Y. Vasilev, T. Bobrovskaya, K. Arzamasov, S. Chetverikov, A. Vladzymyrskyy, O. Omelyanskaya, A. Andreychenko, N. Pavlov, L. N. Anishchenko
Manager Zdravookhranenia (journal article), published 2023-06-07. DOI: 10.21045/1811-0185-2023-4-28-41
Citations: 0

Abstract

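Background: In recent years, the active adoption of artificial intelligence technologies in healthcare has increased the amount of medical data used to develop machine learning models, including radiology and instrumental diagnostics data. New datasets for machine learning algorithms are being created to address various tasks in digital medical technologies, so the systematization and standardization of these datasets, as well as their storage, access, and rational and safe use, have become pressing problems.

Aim: To develop an approach to systematizing and standardizing information about datasets in order to represent, store, apply, and optimize the use of datasets, and to ensure the safety and transparency of the development and testing of AI-based medical devices.

Materials and methods: Analysis of the authors' own and international experience in creating and using medical datasets; search and analysis of medical reference books; development and justification of the registry structure; and a search of scientific publications indexed in the RSCI, Scopus, and Web of Science databases using the keywords "datasets" and "registry of medical data".

Results: A structure for a registry of medical instrumental diagnostics datasets was developed according to the stages of the dataset lifecycle: 7 parameters at the initiation stage, 8 at the planning stage, 70 in the dataset card, 1 for version change, and 14 at the use stage, for a total of 100 parameters. We propose a classification of datasets by the purpose of their creation, a classification of data verification methods, and principles for naming datasets that support standardization and clear presentation. In addition, the main aspects of organizing the maintenance of this registry are outlined: management, data quality, confidentiality, and security.

Conclusions: For the first time, an original technology for structuring and systematizing medical instrumental diagnostics datasets is proposed. It is based on the developed terminology and principles of information classification. This makes it possible to standardize the structure of information about machine learning datasets and to centralize their storage. It also provides quick access to all information about a dataset and ensures the transparency, reliability, and reproducibility of artificial intelligence developments. Creating a registry makes it possible to quickly build visual data libraries, allowing a wide range of researchers, developers, and companies to choose datasets for their tasks. This approach promotes the widespread use of datasets, optimizes resources, and contributes to the rapid development and implementation of artificial intelligence.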
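As a small illustration of the registry structure reported in the Results, the following Python sketch records the parameter count for each part of the dataset lifecycle and checks that the parts sum to the stated total of 100. It is a minimal sketch only: the class and function names (`RegistrySection`, `REGISTRY_SECTIONS`, `total_parameters`, `section_shares`) are hypothetical and do not come from the paper, and the individual parameters themselves are not described in the abstract, so they are not modeled here.

```python
from dataclasses import dataclass


# Hypothetical data model: one entry per registry section. The section names and
# counts follow the abstract; the dataset card and version change are listed
# alongside the lifecycle stages there, so they are treated as sections too.
@dataclass(frozen=True)
class RegistrySection:
    name: str
    parameter_count: int


# Parameter counts per section as reported in the abstract.
REGISTRY_SECTIONS = (
    RegistrySection("initiation", 7),
    RegistrySection("planning", 8),
    RegistrySection("dataset card", 70),
    RegistrySection("version change", 1),
    RegistrySection("use", 14),
)


def total_parameters(sections) -> int:
    """Sum the number of registry parameters across all sections."""
    return sum(section.parameter_count for section in sections)


def section_shares(sections) -> dict[str, float]:
    """Return each section's share of the total parameter count, in percent."""
    total = total_parameters(sections)
    return {s.name: 100 * s.parameter_count / total for s in sections}


if __name__ == "__main__":
    print(total_parameters(REGISTRY_SECTIONS))  # 100
    for name, share in section_shares(REGISTRY_SECTIONS).items():
        print(f"{name}: {share:.0f}%")
```

Because the reported counts total exactly 100, each section's percentage equals its parameter count, which makes it easy to see that the dataset card accounts for most of the registry's description of a dataset.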