
International Journal of Population Data Science: Latest Articles

De-identification of Free Text Data containing Personal Health Information: A Scoping Review of Reviews
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-12-12 DOI: 10.23889/ijpds.v8i1.2153
Bekelu Negash, Alan Katz, Christine J. Neilson, Moniruzzaman Moni, Marc Nesca, Alexander Singer, J. Enns
Introduction: Using data in research often requires that the data first be de-identified, particularly in the case of health data, which often include Personal Identifiable Information (PII) and/or Personal Health Identifying Information (PHII). There are established procedures for de-identifying structured data, but de-identifying clinical notes, electronic health records, and other records that include free text data is more complex. Several different ways to achieve this are documented in the literature. This scoping review identifies categories of de-identification methods that can be used for free text data.

Methods: We adopted an established scoping review methodology to examine review articles published up to May 9, 2022, in Ovid MEDLINE; Ovid Embase; Scopus; the ACM Digital Library; IEEE Xplore; and Compendex. Our research question was: What methods are used to de-identify free text data? Two independent reviewers conducted title and abstract screening and full-text article screening using the online review management tool Covidence.

Results: The initial literature search retrieved 3,312 articles, most of which focused primarily on structured data. Eighteen publications describing methods of de-identification of free text data met the inclusion criteria for our review. The majority of the included articles focused on removing categories of personal health information identified by the Health Insurance Portability and Accountability Act (HIPAA). The de-identification methods they described combined rule-based methods or machine learning with other strategies such as deep learning.

Conclusion: Our review identifies and categorises de-identification methods for free text data as rule-based methods, machine learning, deep learning, and a combination of these and other approaches. Most of the articles we found in our search refer to de-identification methods that target some or all categories of PHII. Our review also highlights how de-identification systems for free text data have evolved over time and points to hybrid approaches as the most promising direction for the future.
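To make the "rule-based" category above concrete, here is a minimal sketch of pattern-based redaction. It is not taken from the review or from any system it covers: the four patterns, the tag names, and the `deidentify` helper are hypothetical, and a real system would need far broader coverage of the HIPAA identifier categories, typically combined with machine learning as the abstract notes.

```python
import re

# Hypothetical patterns for illustration only; they cover a tiny fraction
# of the identifier categories a real de-identification system must handle.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("NAME", re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.?\s+[A-Z][a-z]+\b")),
]


def deidentify(text: str) -> str:
    """Replace every pattern match with a bracketed category tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Dr. Smith saw the patient on 03/14/2021; call 204-555-0199."
print(deidentify(note))
# prints: [NAME] saw the patient on [DATE]; call [PHONE].
```

Rules like these are transparent and easy to audit, but they miss identifiers that do not follow a fixed surface form (for example, a surname with no title), which is one reason hybrid approaches are attractive.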
Citations: 0
Four questions to guide decision-making for data sharing and integration.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-10-04 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i2.2159
Amy Hawn Nelson, Sharon Zanti

Introduction: This paper presents a Four Question Framework to guide data integration partners in building a strong governance and legal foundation to support ethical data use.

Objectives: While this framework was developed based on work in the United States that routinely integrates public data, it is meant to be a simple, digestible tool that can be adapted to any context.

Methods: The framework was developed through a series of public deliberation workgroups and 15 years of field experience working with a diversity of data integration efforts across the United States.

Results: The Four Questions ("Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)?") should be considered within an established data governance framework and alongside core partners to determine whether and how to move forward when building an Integrated Data System (IDS) and also at each stage of a specific data project. We discuss these questions in depth, with a particular focus on the role of governance in establishing legal and ethical data use. In addition, we provide example data governance structures from two IDS sites and hypothetical scenarios that illustrate key considerations for the Four Question Framework.

Conclusions: A robust governance process is essential for determining whether data sharing and integration is legal, ethical, and a good idea within the local context. This process is iterative and as relational as it is technical, which means authentic collaboration across partners should be prioritized at each stage of a data use project. The Four Questions serve as a guide for determining whether to undertake data sharing and integration and should be regularly revisited throughout the life of a project.

Highlights: Strong data governance has five qualities: it is purpose-, value-, and principle-driven; strategically located; collaborative; iterative; and transparent. Through a series of public deliberation workgroups and 15 years of field experience, we developed a Four Question Framework to determine whether and how to move forward with building an IDS and at each stage of a data sharing and integration project. The Four Questions ("Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)?") should be carefully considered within established data governance processes and among core partners.

Citations: 0
Seasonal purchase of antihistamines and ovarian cancer risk in the Cancer Loyalty Card Study (CLOCS): results from an observational case-control study
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-06-04 DOI: 10.1101/2023.05.30.23290729
H. Brewer, Q. Jiang, S. Sundar, Y. Hirst, J. Flanagan
Objective: Antihistamine use has previously been associated with a reduction in incidence of ovarian cancer, particularly in premenopausal women. Herein, we investigate antihistamine exposure in relation to ovarian cancer risk using a novel data resource: purchase histories from retailer loyalty card data.

Study Design: A subset of participants from the Cancer Loyalty Card Study (CLOCS) for whom purchase histories were available was analysed in this study. Cases (n=153) were women in the UK with a first diagnosis of ovarian cancer between Jan 2018 and Jan 2022. Controls (n=120) were women in the UK without a diagnosis of ovarian cancer. Up to 6 years of purchase history, covering 2014 to 2022, was retrieved from two participating high street retailers.

Main outcome measures: Logistic regression was used to estimate the odds ratio (OR) and 95% confidence intervals (CIs) for ovarian cancer associated with antihistamine purchases, ever versus never, adjusting for age and oral contraceptive use. The association was stratified by season of purchase, age over and under 50 years, ovarian cancer histology, and family history.

Results: Ever purchasing antihistamines was not significantly associated with ovarian cancer overall in this small study (OR: 0.68, 95% CI: 0.39, 1.19). However, antihistamine purchases were significantly associated with reduced ovarian cancer risk when purchased only in spring and/or summer (OR: 0.37, 95% CI: 0.17, 0.82) compared with purchasing all year (OR: 0.99, 95% CI: 0.51, 1.92). In the stratified analysis, the association was strongest in non-serous ovarian cancer (OR: 0.41, 95% CI: 0.18, 0.93).

Conclusions: Antihistamine purchase is associated with reduced ovarian cancer risk when purchased seasonally in spring and summer. However, larger studies and more research are required to understand the mechanisms of reduced ovarian cancer risk related to seasonal purchases of antihistamines and allergies.
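For readers unfamiliar with how an odds ratio and its 95% confidence interval are formed, the following is a minimal sketch of the unadjusted calculation from a 2x2 table, using the standard Wald (Woolf) interval on the log scale. It is a simplification: the study above used logistic regression adjusted for age and oral contraceptive use, so this would not reproduce its estimates. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: exposed cases,    b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts for illustration, NOT data from the study.
or_, lower, upper = odds_ratio_ci(a=30, b=70, c=40, d=60)
print(f"OR: {or_:.2f}, 95% CI: {lower:.2f}, {upper:.2f}")
```

An interval that includes 1, as with the overall estimate reported above (0.39 to 1.19), is conventionally read as no statistically significant association at the 5% level.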
Citations: 0
Lessons learned from using linked administrative data to evaluate the Family Nurse Partnership in England and Scotland.
IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-05-11 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i1.2113
Francesca L Cavallaro, Rebecca Cannings-John, Fiona Lugg-Widger, Ruth Gilbert, Eilis Kennedy, Sally Kendall, Michael Robling, Katie L Harron

Introduction: "Big data" - including linked administrative data - can be exploited to evaluate interventions for maternal and child health, providing time- and cost-effective alternatives to randomised controlled trials. However, using these data to evaluate population-level interventions can be challenging.

Objectives: We aimed to inform future evaluations of complex interventions by describing sources of bias, lessons learned, and suggestions for improvements, based on two observational studies using linked administrative data from health, education and social care sectors to evaluate the Family Nurse Partnership (FNP) in England and Scotland.

Methods: We first considered how different sources of potential bias within the administrative data could affect results of the evaluations. We explored how each study design addressed these sources of bias using maternal confounders captured in the data. We then determined what additional information could be captured at each step of the complex intervention to enable analysts to minimise bias and maximise comparability between intervention and usual care groups, so that any observed differences can be attributed to the intervention.

Results: Lessons learned include the need for i) detailed data on intervention activity (dates/geography) and usual care; ii) improved information on data linkage quality to accurately characterise control groups; iii) more efficient provision of linked data to ensure timeliness of results; iv) better measurement of confounding characteristics affecting who is eligible, approached and enrolled.

Conclusions: Linked administrative data are a valuable resource for evaluations of the FNP national programme and other complex population-level interventions. However, information on local programme delivery and usual care is required to account for biases that characterise those who receive the intervention, and to inform understanding of mechanisms of effect. National, ongoing, robust evaluations of complex public health interventions would be more achievable if programme implementation were integrated with improved national and local data collection, and robust quasi-experimental designs.

Citations: 0
Sociodemographic inequalities of suicide: a population-based cohort study of adults in England and Wales 2011-2021
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-04-06 DOI: 10.1101/2023.04.05.23288190
I. Ward, Katie Finning, D. Ayoubkhani, Katie Hendry, E. Sharland, Louis Appleby, V. Nafilyan
Background: Risk of suicide is complex and often a result of multiple interacting factors. It is vital that research identifies predictors of suicide to provide a strong evidence base for targeted interventions.

Methods: Using linked Census and population-level mortality data, we estimated rates of suicide across different groups in England and Wales and examined which factors are independently associated with the risk of suicide.

Findings: The highest rates of suicide were amongst those who reported an impairment affecting their day-to-day activities, those who were long-term unemployed or had never worked, and those who were single or separated. Rates of suicide were highest in the White and Mixed/multiple ethnic groups compared to other ethnicities, and in people who reported a religious affiliation compared with those who had no religion. Comparison of minimally adjusted models (predictor, sex and age) with fully adjusted models (sex, age, ethnicity, region, partnership status, religious affiliation, day-to-day impairments, armed forces membership and socioeconomic status) identified key predictors which remain important risk factors after accounting for other characteristics; day-to-day impairments were still found to increase the incidence of suicide relative to those whose activities were not impaired after adjusting for employment status. Overall, rates of suicide were higher in males than in females across all ages, with the highest rates in 40-to-50-year-olds.

Interpretation: The findings of this work provide novel population-level insights into the risk of suicide by sociodemographic characteristics. Understanding the interaction between key risk factors for suicide has important implications for national suicide prevention strategies.
Citations: 0
Association between neighbourhood composition, kindergarten educator-reported distance learning barriers, and return to school concerns during the first wave of the COVID-19 pandemic in Ontario, Canada.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-04-04 eCollection Date: 2022-01-01 DOI: 10.23889/ijpds.v7i4.1761
Natalie Spadafora, Jade Wang, Caroline Reid-Westoby, Magdalena Janus

Introduction: Research to date has established that the COVID-19 pandemic has not affected everyone equitably. It is less clear whether this inequitable impact extended to education, specifically to educator-reported barriers to distance learning, concerns, and mental health.

Objective: The objective of this study was to explore the association between the neighbourhood composition of the school and kindergarten educator-reported barriers and concerns regarding children's learning during the first wave of COVID-19 related school closures in Ontario, Canada.

Methods: In the spring of 2020, we collected data from Ontario kindergarten educators (n = 2569; 74.2% kindergarten teachers, 25.8% early childhood educators; 97.6% female) using an online survey asking them about their experiences and challenges with online learning during the first round of school closures. We linked the educator responses to 2016 Canadian Census variables based on schools' postal codes. Bivariate correlations and Poisson regression analyses were used to determine if there was an association between neighbourhood composition and educator mental health, and the number of barriers and concerns reported by kindergarten educators.

Results: There were no significant associations between educator mental health and school neighbourhood characteristics. Educators who taught at schools in neighbourhoods with lower median income reported a greater number of barriers to online learning (e.g., parents/guardians not submitting assignments/providing updates on their child's learning) and concerns regarding the return to school in the fall of 2020 (e.g., students' readjustment to routines). There were no significant associations between educator-reported barriers or concerns and any of the other Census neighbourhood variables (proportion of lone parent families, average household size, proportion of the population that does not speak an official language, proportion of the population that are recent immigrants, or proportion of the population aged 0-4).

Conclusions: Overall, our study suggests that the neighbourhood composition of the children's school location did not exacerbate the potential negative learning experiences of kindergarten students and educators during the COVID-19 pandemic, although we did find that educators teaching in schools in lower-SES neighbourhoods reported more barriers to online learning during this time. Taken together, our study suggests that remediation efforts should be focused on individual kindergarten children and their families as opposed to school location.

Citations: 0
Microsimulation of an educational attainment register to predict future record linkage quality.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-04-03 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i1.2122
Rainer Schnell, Severin Weiand

Introduction: Population-wide educational attainment registers are necessary for educational planning and research. Regular linking of databases is needed to build and update such a register. Where unique national identification numbers are not available, record linkage must be based on quasi-identifiers such as name, date of birth and sex. However, the data protection principle of data minimization aims to minimize the set of identifiers in databases.

Objectives: Therefore, the German Federal Ministry of Research and Education commissioned a study to inform legislation on the minimum set of identifiers required for a national educational register.

Methods: To justify our recommendations empirically, we implemented a microsimulation of about 20 million people. The simulated register accumulates changes and errors in identifiers due to migration, regional mobility, marriage, school career and mortality, thereby allowing the study of errors on longitudinal datasets. Updated records were linked yearly to the simulated register using several linkage methods. Clear-text methods as well as privacy-preserving (PPRL) methods were compared.

Results: The results indicate linkage bias if only the primary identifiers are available in the register. More detailed identifiers, including place of birth, are required to minimize linkage bias. The amount of information available to identify a person for matching is more critical for linkage quality than the record linkage method applied. Differences in linkage quality between the best procedures (probabilistic linkage and multiple matchkeys) are minor.

Conclusions: Microsimulation is a valuable tool for designing record linkage procedures. By modelling the processes resulting in changes or errors in quasi-identifiers, predicting data quality to be expected after the implementation of a register seems possible.
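
The idea behind the microsimulation can be illustrated with a toy sketch. This is not the authors' model: the population size, the yearly surname-change probability, the integer-coded identifiers and the two matchkey sets are all assumptions chosen for illustration, and only surname changes are simulated. Records are linked each year by exact matchkeys, and a register entry is updated only when its link succeeds, so errors accumulate over time.

```python
import random


def link(current, register, matchkeys):
    """Link current records to the register, trying each matchkey in order."""
    links = {}
    for fields in matchkeys:
        # Index the register by this matchkey.
        index = {}
        for rid, rec in register.items():
            index.setdefault(tuple(rec[f] for f in fields), []).append(rid)
        for pid, rec in current.items():
            if pid in links:
                continue  # already linked by an earlier matchkey
            hits = index.get(tuple(rec[f] for f in fields), [])
            if len(hits) == 1:  # accept unambiguous matches only
                links[pid] = hits[0]
    return links


def simulate(matchkeys, n_people=2000, years=10, p_change=0.03, seed=1):
    """Return the share of people correctly linked in each simulated year.

    All parameter defaults are hypothetical. Register ids equal person ids,
    which is how the simulation knows whether a link is correct.
    """
    rng = random.Random(seed)
    current = {
        pid: {
            "surname": rng.randrange(500),
            "dob": rng.randrange(10000),
            "birthplace": rng.randrange(50),
        }
        for pid in range(n_people)
    }
    register = {pid: dict(rec) for pid, rec in current.items()}
    rates = []
    for _ in range(years):
        # Surname changes (for example through marriage) accumulate yearly.
        for rec in current.values():
            if rng.random() < p_change:
                rec["surname"] = rng.randrange(500)
        links = link(current, register, matchkeys)
        rates.append(sum(pid == rid for pid, rid in links.items()) / n_people)
        # A linked register entry takes over the updated identifiers;
        # an unlinked entry stays stale.
        for pid, rid in links.items():
            register[rid] = dict(current[pid])
    return rates


# Same seed, so both runs see the same simulated population and changes.
primary = simulate([("surname", "dob")])
extended = simulate([("surname", "dob", "birthplace"), ("dob", "birthplace")])
print(f"final-year correct links, primary identifiers only: {primary[-1]:.2f}")
print(f"final-year correct links, with place of birth:      {extended[-1]:.2f}")
```

In this sketch the extra identifier matters because it supports a second matchkey that survives a surname change, which loosely mirrors the abstract's point that the amount of identifying information matters more than the linkage method.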

Citations: 1
Record linkage for routinely collected health data in an African health information exchange
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-28 DOI: 10.23889/ijpds.v8i1.1771
T. Mutemaringa, A. Heekes, Mariette Smith, A. Boulle, Nicki Tiffin

Introduction: The Patient Master Index (PMI) plays an important role in management of patient information and epidemiological research, and the availability of unique patient identifiers improves the accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi identifiers are used to attempt to link patient records but sometimes present higher risk of over-linking. Data quality and completeness thus affect the ability to make correct linkages.

Aim: This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Methods: We apply a stepwise deterministic record linkage approach to link patient data that are routinely collected from health information systems in the Western Cape province of South Africa. Variables used in the linkage process include South African National Identity number (RSA ID), date of birth, year of birth, month of birth, day of birth, residential address and contact information. Descriptive analyses are used to estimate the level and extent of duplication in the provincial PMI.

Results: The percentage of duplicates in the provincial PMI lies between 10% and 20%. Duplicates mainly arise from spelling errors, and surname and first names carry most of the errors, with the first names and surname being different for the same individual in approximately 22% of duplicates. The RSA ID is the variable mostly affected by poor completeness with less than 30% of the records having an RSA ID. The current linkage algorithm requires refinement as it makes use of algorithms that have been developed and validated on anglicised names which might not work well for local names. Linkage is also affected by data quality-related issues that are associated with the routine nature of the data which often make it difficult to validate and enforce integrity at the point of data capture.

Citations: 0
Record linkage for routinely collected health data in an African health information exchange.
IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-28 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v6i1.1771
Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle, Nicki Tiffin

Introduction: The Patient Master Index (PMI) plays an important role in management of patient information and epidemiological research, and the availability of unique patient identifiers improves the accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi identifiers are used to attempt to link patient records but sometimes present higher risk of over-linking. Data quality and completeness thus affect the ability to make correct linkages.

Aim: This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Methods: We apply a stepwise deterministic record linkage approach to link patient data that are routinely collected from health information systems in the Western Cape province of South Africa. Variables used in the linkage process include South African National Identity number (RSA ID), date of birth, year of birth, month of birth, day of birth, residential address and contact information. Descriptive analyses are used to estimate the level and extent of duplication in the provincial PMI.

Results: The percentage of duplicates in the provincial PMI lies between 10% and 20%. Duplicates mainly arise from spelling errors, and surname and first names carry most of the errors, with the first names and surname being different for the same individual in approximately 22% of duplicates. The RSA ID is the variable most affected by poor completeness, with less than 30% of the records having an RSA ID. The current linkage algorithm requires refinement as it makes use of algorithms that have been developed and validated on anglicised names, which might not work well for local names. Linkage is also affected by data quality-related issues that are associated with the routine nature of the data, which often make it difficult to validate and enforce integrity at the point of data capture.
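
A stepwise deterministic approach of the kind named in the Methods can be sketched as a series of exact-match passes, run from strictest to loosest, with each pass applied only to records that earlier passes left unmatched. The abstract does not describe the PHDC's actual rules, so the three passes, the `Patient` fields, the normalisation and the sample records below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patient:
    record_id: int
    national_id: Optional[str]  # e.g. an RSA ID; frequently missing
    surname: str
    first_name: str
    date_of_birth: str  # ISO format, YYYY-MM-DD


def _norm(text: str) -> str:
    """Lower-case and strip all whitespace so trivial variants compare equal."""
    return "".join(text.lower().split())


# Hypothetical passes, ordered from strictest to loosest.
# A key function returns None when the record lacks the required field.
PASSES = [
    ("national_id", lambda p: (p.national_id,) if p.national_id else None),
    ("name_dob", lambda p: (_norm(p.surname), _norm(p.first_name), p.date_of_birth)),
    ("surname_initial_dob",
     lambda p: (_norm(p.surname), _norm(p.first_name)[:1], p.date_of_birth)),
]


def link(incoming, index):
    """Return {incoming record_id: (index record_id, pass name)} for matches."""
    links = {}
    unmatched = list(incoming)
    for pass_name, key_fn in PASSES:
        lookup = {}
        for p in index:
            key = key_fn(p)
            if key is not None:
                lookup.setdefault(key, []).append(p)
        still_unmatched = []
        for rec in unmatched:
            key = key_fn(rec)
            candidates = lookup.get(key, []) if key is not None else []
            if len(candidates) == 1:  # unique matches only, to limit over-linking
                links[rec.record_id] = (candidates[0].record_id, pass_name)
            else:
                still_unmatched.append(rec)
        unmatched = still_unmatched
    return links


# Made-up records; the identifier is a placeholder, not a valid RSA ID.
index = [
    Patient(1, "0000000000001", "Dlamini", "Thandi", "1980-01-01"),
    Patient(2, None, "Van der Merwe", "Johan", "1975-06-30"),
]
incoming = [
    Patient(101, "0000000000001", "Dlamni", "Thandi", "1980-01-01"),  # misspelt surname
    Patient(102, None, "van der merwe", "J", "1975-06-30"),  # first name abbreviated
    Patient(103, None, "Smith", "Anna", "1990-02-02"),  # no counterpart in the index
]
result = link(incoming, index)
print(result)
```

Here record 101 links on the identifier despite the misspelt surname, while record 102, which has no identifier, only links in the loosest pass. That is the trade-off the abstract points to: when the unique identifier is missing, linkage falls back on quasi-identifiers, which are more exposed to spelling errors and over-linking.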

Citations: 0
Everybody's talking about equity, but is anyone really listening?: The case for better data-driven learning in health systems.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-22 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v5i4.2125
Nakia K Lee-Foon, Robert J Reid

Data collection, analysis, and data-driven action cycles have been viewed as vital components of healthcare for decades. Throughout the COVID-19 pandemic, case incidence and mortality data have consistently been used by various levels of government and health institutions to inform pandemic strategies and service distribution. However, these responses are often inequitable, underscoring pre-existing healthcare disparities faced by marginalized populations. This has prompted governments to finally face these disparities and find ways to quickly deliver more equitable pandemic support. These rapid, data-informed supports proved that learning health systems (LHS) could be quickly mobilized and used effectively to deliver healthcare interventions that matched diverse populations' needs in equitable and affordable ways. Within LHS, data are viewed as a starting point researchers can use to inform practice and subsequent research. Despite this innovative approach, the quality and depth of data collection and robust analyses vary throughout healthcare, with data lacking across the quadruple aims. Often, large data gaps exist pertaining to community socio-demographics, patient perceptions of healthcare quality and the social determinants of health. This prevents a robust understanding of the healthcare landscape, leaving marginalized populations uncounted and at the sidelines of improvement efforts. These gaps are often viewed by researchers as an indication that more data are needed rather than as an opportunity to critically analyze and iteratively learn from multiple sources of pre-existing data. This continued cycle of data collection and analysis leaves one to wonder if healthcare has a data problem or a learning problem. In this commentary, we discuss ways healthcare data are often used and how LHS disrupts this cycle, turning data into learning opportunities that inform healthcare practice and future research in real time. We conclude by proposing several ways to make learning from data just as important as the data itself.

Citations: 0