A Short Survey of LSTM Models for De-identification of Medical Free Text

Joffrey L. Leevy, T. Khoshgoftaar
2020 IEEE 6th International Conference on Collaboration and Internet Computing (CIC), published 2020-12-01
DOI: 10.1109/CIC50333.2020.00023 (https://doi.org/10.1109/CIC50333.2020.00023)
Citations: 1

Abstract

The confidentiality of patient information is governed by government regulations in various countries, such as the Health Insurance Portability and Accountability Act (HIPAA) standards in the USA. Under these laws, adequate protections must be in place to safeguard patients' health records, which are often big data composed of free text. Machine learning approaches are extensively used for the automated de-identification of medical free text, with outstanding results obtained from several studies that incorporate long short-term memory (LSTM) networks. These networks are a variant of the recurrent neural network (RNN) architecture. Our survey covers LSTM models from the past five years, and the contribution of the findings is appreciable. Performance-wise, LSTMs generally surpassed the other types of models used in automated de-identification of free text, namely conditional random field (CRF) algorithms and rule-based algorithms. In addition, hybrid or ensemble LSTM models did not outperform LSTM-only models. Finally, we note that the customization of gold-standard de-identification datasets may result in overfitted models.
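
For readers unfamiliar with how an LSTM differs from a plain RNN, the following is a minimal sketch of one time step of the standard LSTM cell, written with the Python standard library only. It is not taken from any model covered by the survey; the function name `lstm_step`, the gate keys `"i"`, `"f"`, `"o"`, `"g"`, and the toy weights are all assumptions made for illustration. The point it shows is the gated cell state `c`, which is carried alongside the hidden state `h` and is what distinguishes the LSTM from a simple recurrent unit.

```python
import math


def sigmoid(x):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative, hypothetical helper).

    x, h_prev, c_prev: lists of floats (input, previous hidden, previous cell).
    W[name]: one weight row per hidden unit, applied to the concatenation
             [h_prev; x]. b[name]: one bias per hidden unit.
    """
    z = h_prev + x  # list concatenation: [h_prev; x]

    def affine(name):
        return [
            sum(w * v for w, v in zip(row, z)) + bias
            for row, bias in zip(W[name], b[name])
        ]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell update

    # New cell state: keep part of the old state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


if __name__ == "__main__":
    # Toy setup (assumed values): 1 hidden unit, 1 input feature.
    W = {name: [[0.1, 0.2]] for name in ("i", "f", "o", "g")}
    b = {name: [0.0] for name in ("i", "f", "o", "g")}
    h, c = [0.0], [0.0]
    for x_t in ([1.0], [0.5], [-1.0]):  # a short made-up input sequence
        h, c = lstm_step(x_t, h, c, W, b)
    print(h, c)
```

In a de-identification setting, each `x` would typically be an embedding of one token and the hidden state would feed a per-token classifier that labels protected health information, but those surrounding components are outside this sketch.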