Standard NER Tagging Scheme for Big Data Healthcare Analytics Built on Unified Medical Corpora

Journal of Artificial Intelligence and Technology  Pub Date: 2022-08-22  DOI: 10.37965/jait.2022.0127

Sarah Shafqat, Hammad Majeed, Qaisar Javaid, H. F. Ahmad

Citations: 11

Abstract

This research is motivated by a gap: there is no common ground for medical context learning through analytics that serves the different purposes of diagnosing, recommending, prescribing, or treating patients based on uniform phenotype features from patients' profiles. While searching for possible solutions, the authors found that no unified corpora tagged with medical nomenclature were available to train analytics for medical context learning. We therefore demonstrate a mechanism for producing uniform NER (Named Entity Recognition) tagged medical corpora. One corpus is built from a dataset, in CSV format, of 14,407 endocrine patients diagnosed with diabetes mellitus (DM) and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that the medical context can be understood uniformly, and to this end we are conducting experiments with common NLP techniques and frameworks such as TensorFlow, Keras, LSTM, and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM NLP algorithms. The maximum accuracy achieved in model validation was 0.8846.
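
To make the (instance, label) pair format concrete, the following is a minimal sketch of how one line of an ICD-10-CM text corpus might be turned into token-level pairs. It is an illustration only, not the authors' pipeline: the abstract does not specify the tag set, so the tag names (`B-ICD_CODE`, `B-DISEASE`, `I-DISEASE`), the BIO-style convention, the whitespace tokenization, and the function name `tag_icd_line` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical tag names: the abstract does not state the actual tag set.
CODE_TAG = "B-ICD_CODE"
BEGIN_TAG = "B-DISEASE"
INSIDE_TAG = "I-DISEASE"


def tag_icd_line(line: str) -> List[Tuple[str, str]]:
    """Turn one 'CODE description' line into (instance, label) pairs.

    Assumes the first whitespace-separated token is the ICD-10-CM code
    and the remaining tokens form the disease description.
    """
    tokens = line.split()
    if not tokens:
        return []
    pairs = [(tokens[0], CODE_TAG)]
    for i, token in enumerate(tokens[1:]):
        # First description token opens the entity, the rest continue it.
        pairs.append((token, BEGIN_TAG if i == 0 else INSIDE_TAG))
    return pairs


if __name__ == "__main__":
    example = "E11.9 Type 2 diabetes mellitus without complications"
    for instance, label in tag_icd_line(example):
        print(f"{instance}\t{label}")
```

Pairs of this shape could then be integer-encoded and padded before being passed to a sequence model such as the Bi-LSTM mentioned above; that step is omitted here because the abstract gives no details about it.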
