Standard NER Tagging Scheme for Big Data Healthcare Analytics Built on Unified Medical Corpora

Sarah Shafqat, Hammad Majeed, Qaisar Javaid, H. F. Ahmad
DOI: 10.37965/jait.2022.0127 (https://doi.org/10.37965/jait.2022.0127)
Journal: 人工智能技术学报(英文) (Journal of Artificial Intelligence and Technology)
Publication date: 2022-08-22
Publication type: Journal Article
Citations: 11

Abstract

This research is motivated by the lack of common ground for medical context learning through analytics that serve different purposes (diagnosing, recommending, prescribing, or treating patients) based on uniform phenotype features from patients' profiles. While searching for possible solutions, the authors found that no unified corpora tagged with medical nomenclature were available to train analytics for medical context learning. We therefore demonstrate a mechanism for producing uniform NER (Named Entity Recognition) tagged medical corpora. One corpus is built from a dataset of 14,407 endocrine patients, in CSV format, diagnosed with DM (diabetes mellitus) and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that medical context can be understood uniformly, and to this end we are conducting experiments with common NLP techniques and frameworks such as TensorFlow, Keras, LSTM, and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM NLP algorithms. The maximum validation accuracy achieved was 0.8846.
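
The abstract does not spell out how the (instance, label) pairs are formed, so the following is only a minimal sketch of one plausible preparation step. It assumes each ICD-10-CM line has the form "CODE description", and it uses a BIO scheme with hypothetical tag names (CODE, DISEASE); the helper names tag_icd_line and build_vocab are likewise illustrative and are not taken from the paper.

```python
# Sketch only: the line format, the "CODE"/"DISEASE" tag names, and the BIO
# scheme are illustrative assumptions, not the authors' tagging scheme.

def tag_icd_line(line):
    """Turn one 'CODE description' line into (instance, label) pairs."""
    code, _, description = line.strip().partition(" ")
    pairs = [(code, "B-CODE")]
    for i, token in enumerate(description.split()):
        # First description token opens the entity, the rest continue it.
        pairs.append((token, "B-DISEASE" if i == 0 else "I-DISEASE"))
    return pairs


def build_vocab(items):
    """Map each distinct item to an integer id, reserving 0 for padding."""
    vocab = {"<PAD>": 0}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


sample_lines = [
    "E11.9 Type 2 diabetes mellitus without complications",
    "E66.9 Obesity, unspecified",
]
tagged = [tag_icd_line(line) for line in sample_lines]
token_vocab = build_vocab(tok for sent in tagged for tok, _ in sent)
label_vocab = build_vocab(lab for sent in tagged for _, lab in sent)

# Integer-encoded (tokens, labels) sequences, the form a sequence model
# such as a Bi-LSTM tagger would typically consume after padding.
encoded = [
    ([token_vocab[t] for t, _ in sent], [label_vocab[l] for _, l in sent])
    for sent in tagged
]
```

Reserving id 0 for padding keeps the encoded sequences compatible with masked embedding layers, should they later be padded to a common length for a Sequential() model.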