The method of primary processing of poorly structured medical data

Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì. Pub Date: 2020-12-05. DOI: 10.23939/sisn2020.08.001

Dmytro Bychko, V. Shendryk, Yuliia Parfenenko

Abstract

This article presents an approach to the primary processing of poorly structured textual data from medical protocols that are stored and distributed as PDF files. The work is motivated by the lack of a universal structure for presenting medical protocols and of methods for processing them. The problem of primary processing of clinical protocol data is solved using the example of a unified clinical protocol of primary, secondary (specialized) and tertiary (highly specialized) medical care. The primary data processing method was developed to create a clear structure of the symptoms of a disease. The first step in structuring clinical protocol data is to divide the protocol information into four basic parts, which allows it to be quickly converted to other formats. This process is implemented by an algorithm written in C#. The algorithm parses the information in a PDF file and converts it to a TXT file. The extracted text is then analyzed syntactically, and the structural parts of the protocol are selected according to the section headings: title page; introduction; list of abbreviations used in the protocol; main part of the protocol; list of literature sources. The disease name in the medical protocol is identified by comparing the protocol data with the list of disease names in the International Classification of Diseases (ICD-10, given as MKH-10 in the original). The sections “Introduction”, “List of abbreviations used in the protocol” and the main part of the protocol were analyzed, and an algorithm was proposed for removing uninformative sections, for example literature sources, from the beginning of the protocol. An algorithm is also proposed for finding information in the main part of the medical protocol by processing the input data according to tables, diagrams, headings, words, phrases and special symbols. The processing algorithms produce a new clinical protocol file that is one third the size of the original file. It contains only the meaningful information from the clinical protocol, which will speed up further work with the file, namely its use in medical decision support. A disease card based on a medical protocol is presented in JSON format.
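The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the pipeline it describes: split the extracted text by section headings, drop the literature list, match the disease name against a classification list, and emit a JSON card. It is written in Python rather than the authors' C#, and it assumes the PDF has already been converted to plain text. The heading patterns, the three-entry `ICD10_SAMPLE` table, the function names and the card layout are all hypothetical choices made for illustration.

```python
import json
import re

# Hypothetical heading patterns (assumption): real protocols vary in wording.
SECTION_HEADINGS = {
    "introduction": r"^introduction\b",
    "abbreviations": r"^list of abbreviations\b",
    "main_part": r"^main part\b",
    "references": r"^(list of )?(literary sources|references)\b",
}

# Tiny illustrative excerpt of ICD-10 codes and names, not the full classification.
ICD10_SAMPLE = {
    "I10": "Essential (primary) hypertension",
    "J18": "Pneumonia, organism unspecified",
    "J45": "Asthma",
}


def split_sections(text):
    """Group non-empty lines under the most recent heading seen."""
    sections = {"title_page": []}  # text before the first heading
    current = "title_page"
    for line in text.splitlines():
        stripped = line.strip()
        for name, pattern in SECTION_HEADINGS.items():
            if re.match(pattern, stripped, flags=re.IGNORECASE):
                current = name  # heading line starts a new section
                sections.setdefault(current, [])
                break
        else:
            if stripped:
                sections[current].append(stripped)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def find_disease(text, classification):
    """Return the first classification entry whose name occurs in the text."""
    lowered = text.lower()
    for code, name in classification.items():
        if name.lower() in lowered:
            return {"code": code, "name": name}
    return None


def build_disease_card(text):
    """Build a JSON-serializable card without the literature list."""
    sections = split_sections(text)
    sections.pop("references", None)  # treated as uninformative here
    disease = find_disease(sections.get("title_page", ""), ICD10_SAMPLE)
    return {"disease": disease, "sections": sections}


# Invented sample text standing in for the TXT output of the PDF step.
SAMPLE_PROTOCOL = """Unified clinical protocol
Asthma
Introduction
This protocol covers primary care.
List of abbreviations used in the protocol
GP - general practitioner
Main part
Symptoms: wheezing, cough.
List of literary sources
1. Some reference.
"""

print(json.dumps(build_disease_card(SAMPLE_PROTOCOL), indent=2))
```

Matching headings at the start of a line is deliberately naive: a body sentence that begins with "Introduction" would be mistaken for a heading, and tables, diagrams and special symbols, which the abstract says the main-part algorithm also uses, are not handled at all.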
