Record Linkage in Healthcare: Applications, Opportunities, and Challenges for Public Health

G. Shah, K. Lertwachara, Anteneth Ayanso
International Journal of Healthcare Delivery Reform Initiatives, 2(3), 29-47, July-September 2010. DOI: 10.4018/JHDRI.2010070104. Copyright © 2010, IGI Global.
Citations: 11

Abstract

Recent years have witnessed the development of new record linkage technologies that are increasingly used for data integration in a variety of application settings. The authors' objective in this article is to review recent developments in medical record linkage and their implications for healthcare research and public health policy. In particular, the authors assess the key advantages and possible limitations of record linkage techniques and technologies in health care scenarios where different pieces of patient records are collected and managed by different agencies. First, the authors give a brief overview of deterministic, probabilistic, and unsupervised record linkage techniques and their advantages and limitations. Then, the authors describe current probabilistic record linkage software and its functionality, and present specific cases where probabilistic linkage has been used successfully to enhance decision-making in healthcare delivery as well as in healthcare-related public policy making. Finally, the authors outline some of the critical issues and challenges of integrating medical records across distributed databases, including technical considerations as well as concerns about patient privacy and confidentiality.

[...] databases tend to be fragmented and incomplete. Thus, the ability to compare and match data records from multiple sources in order to determine which sets of records belong to the same person, object, or event has become a critical task for many organizations. However, the possibility of extensive analysis using these databases relies on the ability to integrate heterogeneous databases across organizations and functional units. Such data integration requires the presence of an error-free unique identifier or key attribute common among the data sets being matched. Unfortunately, in most real-world situations, such a common key attribute is rarely available.

Consequently, instead of relying on a deterministic approach using unique identifiers, past research studies have proposed probabilistic algorithms for matching records across heterogeneous databases. Among these early studies, the seminal work of Newcombe, Kennedy, Axford, and James (1959) and Fellegi and Sunter (1969) provides theoretical frameworks for computer-aided record linkage operations. More recent scholarly studies on this topic include Dey, Sarkar, and De (1998); Bell and Sethi (2001); Dey, Sarkar, and De (2002); Verykios, Moustakides, and Elfeky (2002); Sarathy and Muralidhar (2006); and Jiang, Sarkar, De, and Dey (2007). Although the algorithmic procedures suggested in these studies vary, they share the common objective of linking records that belong to the same entity while minimizing the likelihood of erroneous matching (i.e., ensuring sensitivity and specificity).

The statistical theory used in record linkage was developed in the 1950s and further refined in the 1970s and 1980s (Jaro, 1989; Newcombe et al., 1959). Until the early 1980s, no commercial record linkage software was marketed, and those who needed record linkage had to develop their own software (e.g., the Generalized Record Linkage System (GRLS) developed at Statistics Canada). They often faced the choice of using less accurate methods or expending a considerable amount of resources to create proprietary systems. For example, in the late 1970s, the U.S. National Agricultural Statistics Service spent what is conservatively estimated as 50 staff-years to develop a state-of-the-art system (Day, 1997).
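The probabilistic approach mentioned above can be illustrated with a minimal sketch in the spirit of the Fellegi and Sunter (1969) framework. It is not the authors' method or any particular software package. Each compared field contributes a log-likelihood-ratio weight, and the summed weight is compared against two thresholds. The field names, the m- and u-probabilities, and the thresholds are all hypothetical values chosen for illustration. In practice they would be estimated from the data.

```python
import math

# Hypothetical per-field parameters (illustrative values, not from the article):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "zip_code": (0.90, 0.10),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 likelihood ratios over all fields present in both records."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision; the thresholds are assumptions for this sketch."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # typically sent for clerical review


if __name__ == "__main__":
    hospital = {"last_name": "Garcia", "birth_year": 1970, "zip_code": "30458"}
    registry = {"last_name": "Garcia", "birth_year": 1970, "zip_code": "99999"}
    w = match_weight(hospital, registry)
    print(round(w, 2), classify(w))
```

Raising the upper threshold reduces false links at the cost of sending more pairs to manual review, which is one way to view the trade-off between sensitivity and specificity noted in the text.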
In addition to the studies mentioned above, scholarly work in this area spans several other academic disciplines (e.g., statistics, information systems, management sciences) as well as communities of practitioners (e.g., in electronic commerce, public health, vital records, welfare fraud detection, e-government). In this article, we present a review of recent developments in record linkage technologies relevant to healthcare research and public health policy. The remainder of the article is organized as follows. The next section summarizes the existing literature on record linkage and the importance of record linkage in healthcare and public health. A brief introduction to different record linkage techniques is presented. Examples of successful applications of record linkage in healthcare and public health are also offered. We then discuss potential opportunities and challenges in using record linkage. The last section concludes our discussion on this topic.

Past Research in Record Linkage

Record linkage can be applied both within and across data sources. Typically, record linkage is defined as a computer-based process of matching two or more records from different and often heterogeneous sources of data that refer to the same entities, such as persons, events, or other objects of interest. However, record linkage is sometimes performed within a single data set when a single database contains multiple records for a person or other entity (e.g., records for multiple hospitalizations in a hospital discharge data set for a 12-month period). Record linkage within a single data set is also performed to remove duplicate records, a process referred to as "deduplication" (Winkler, 1999). There are many applications of record linkage in both the public and private sectors, and its use has become even more significant with advances in the underlying techniques and implementation tools.
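Linkage within a single data set, as described above, can be sketched in a few lines. The example below is a simplified, deterministic illustration only: it groups records that share an exact normalized key, whereas real deduplication usually also has to tolerate typographical errors. The field names (`first_name`, `last_name`, `birth_date`, `admit_date`) are hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(value).lower().split())


def dedup_key(record):
    """Hypothetical matching key: normalized name plus date of birth."""
    return (
        normalize(record["last_name"]),
        normalize(record["first_name"]),
        normalize(record["birth_date"]),
    )


def group_records(records):
    """Group records in one data set that appear to refer to the same person."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec)].append(rec)
    return list(groups.values())


if __name__ == "__main__":
    discharges = [
        {"first_name": "Ana", "last_name": "Garcia",
         "birth_date": "1970-03-02", "admit_date": "2009-01-15"},
        {"first_name": " ana ", "last_name": "GARCIA",
         "birth_date": "1970-03-02", "admit_date": "2009-08-30"},
        {"first_name": "Ben", "last_name": "Lee",
         "birth_date": "1985-11-20", "admit_date": "2009-05-04"},
    ]
    for group in group_records(discharges):
        print(len(group), [r["admit_date"] for r in group])
```

Whether a group represents true duplicates to remove or legitimate repeat events to keep, such as several hospitalizations of one patient, depends on the analysis. The sketch only identifies which records belong together.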
Detailed technical descriptions of record linkage are available elsewhere (Fair, 1995, 1997; Newcombe, 1994). In addition to applications in health care and public health, record linkage is widely employed in other fields. For example, Probert, Semenciw, Mao, and Gentleman (1997) described how record linkage was used to integrate immigration and mortality databases in Canada. Quass and Starkey (2003), White (1997), and [...]

(The excerpt ends here; 17 more pages are available in the full version of the article.)