Establishing of big data clinical dataset in brain vessel aneurysm research

Journal: Sibirskii nauchnyi meditsinskii zhurnal (Q4, Biochemistry, Genetics and Molecular Biology) | Pub Date: 2023-06-23 | DOI: 10.18699/ssmj20230311
Ju. V. Kivelev, I. Saarenpää, A. Krivoshapkin
Citations: 0

Abstract

The variability and heterogeneity of digital medical data call for modern algorithms that process such data appropriately. The aim of the study was to delineate the main steps in forming a clinical dataset of patients with brain aneurysms, from producing the primary mining specifications to forming the final version.

Material and methods. Data collection, crosschecking of cases, and analysis of the dataset were carried out at Turku University Hospital. Over the last two decades, the available medical data at our hospital have been stored in a digital data lake, which allows automated data mining. In our study, data mining was performed by a data scientist using R software. Inclusion criteria were based on a set of diagnoses coded in the medical charts according to the International Classification of Diseases (ICD-10).

Results and discussion. Primary data mining identified 3850 patients with brain aneurysms treated at our hospital from January 2000 to May 2018. After independent manual crosschecking of these patients' medical charts, we found 1218 cases (32 %) that had no aneurysm (false positives). Data on the remaining true aneurysm cases were divided into clinical and intensive care unit subsets, in which every event linked to a particular date of treatment was defined as an info-unit. All data in both subsets were structured into separate Excel files and presented in chronological order for each patient. Altogether, the dataset included 70 000 000 rows of info-units from 2632 patients.

Conclusions. Data mining allowed the establishment of a detailed clinical dataset of patients with brain aneurysms. The mining algorithm was limited by false-positive cases (32 % of patients). We therefore recommend manual crosschecking of an automatically collected dataset before statistical analysis.
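The workflow described in the abstract (ICD-10 based selection, manual exclusion of false positives, chronological ordering of info-units per patient and subset) can be illustrated with a minimal Python sketch. This is not the authors' code: the study used R, and the abstract does not list the diagnosis codes it mined. The prefixes `I60` (nontraumatic subarachnoid haemorrhage) and `I67.1` (cerebral aneurysm, nonruptured) are assumed here purely for illustration, and every name in the block (`InfoUnit`, `select_candidates`, `confirm_cases`, `build_timelines`, `false_positive_rate`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# ASSUMPTION: illustrative ICD-10 prefixes only; the abstract does not
# state which codes the study's mining specification actually used.
ASSUMED_ICD10_PREFIXES = ("I60", "I67.1")


@dataclass(frozen=True)
class InfoUnit:
    """One dated event in a patient's record (hypothetical structure)."""
    patient_id: str
    event_date: date
    subset: str        # "clinical" or "icu"
    description: str


def select_candidates(diagnoses):
    """Return ids of patients with at least one matching diagnosis code.

    `diagnoses` is an iterable of (patient_id, icd10_code) pairs.
    """
    return {
        patient_id
        for patient_id, code in diagnoses
        if code.startswith(ASSUMED_ICD10_PREFIXES)
    }


def confirm_cases(candidates, false_positives):
    """Drop patients that manual chart review flagged as having no aneurysm."""
    return candidates - false_positives


def false_positive_rate(n_candidates, n_false_positive):
    """Fraction of automatically mined patients rejected on manual review."""
    return n_false_positive / n_candidates


def build_timelines(units, confirmed):
    """Group info-units by (patient, subset), each group in date order."""
    timelines = {}
    for unit in sorted(units, key=lambda u: (u.patient_id, u.event_date)):
        if unit.patient_id in confirmed:
            key = (unit.patient_id, unit.subset)
            timelines.setdefault(key, []).append(unit)
    return timelines
```

With the figures reported in the abstract, `false_positive_rate(3850, 1218)` is about 0.316, which rounds to the stated 32 %, and 3850 − 1218 = 2632 matches the number of patients in the final dataset.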
Source journal
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 54
Review time: 12 weeks