
International Journal of Population Data Science: Latest Articles

Lessons learned from using linked administrative data to evaluate the Family Nurse Partnership in England and Scotland.
IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-05-11 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i1.2113
Francesca L Cavallaro, Rebecca Cannings-John, Fiona Lugg-Widger, Ruth Gilbert, Eilis Kennedy, Sally Kendall, Michael Robling, Katie L Harron

Introduction: "Big data", including linked administrative data, can be exploited to evaluate interventions for maternal and child health, providing time- and cost-effective alternatives to randomised controlled trials. However, using these data to evaluate population-level interventions can be challenging.

Objectives: We aimed to inform future evaluations of complex interventions by describing sources of bias, lessons learned, and suggestions for improvements, based on two observational studies using linked administrative data from health, education and social care sectors to evaluate the Family Nurse Partnership (FNP) in England and Scotland.

Methods: We first considered how different sources of potential bias within the administrative data could affect results of the evaluations. We explored how each study design addressed these sources of bias using maternal confounders captured in the data. We then determined what additional information could be captured at each step of the complex intervention to enable analysts to minimise bias and maximise comparability between intervention and usual care groups, so that any observed differences can be attributed to the intervention.

Results: Lessons learned include the need for i) detailed data on intervention activity (dates/geography) and usual care; ii) improved information on data linkage quality to accurately characterise control groups; iii) more efficient provision of linked data to ensure timeliness of results; iv) better measurement of confounding characteristics affecting who is eligible, approached and enrolled.

Conclusions: Linked administrative data are a valuable resource for evaluations of the FNP national programme and other complex population-level interventions. However, information on local programme delivery and usual care is required to account for biases that characterise those who receive the intervention, and to inform understanding of mechanisms of effect. National, ongoing, robust evaluations of complex public health interventions would be more achievable if programme implementation were integrated with improved national and local data collection and robust quasi-experimental designs.

Citations: 0
Sociodemographic inequalities of suicide: a population-based cohort study of adults in England and Wales 2011-2021
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-04-06 DOI: 10.1101/2023.04.05.23288190
I. Ward, Katie Finning, D. Ayoubkhani, Katie Hendry, E. Sharland, Louis Appleby, V. Nafilyan

Background: The risk of suicide is complex and often results from multiple interacting factors. It is vital that research identifies predictors of suicide to provide a strong evidence base for targeted interventions.

Methods: Using linked Census and population-level mortality data, we estimated rates of suicide across different groups in England and Wales and examined which factors are independently associated with the risk of suicide.

Findings: The highest rates of suicide were among those who reported an impairment affecting their day-to-day activities, those who were long-term unemployed or had never worked, and those who were single or separated. Rates of suicide were highest in the White and Mixed/multiple ethnic groups compared with other ethnic groups, and in people who reported a religious affiliation compared with those who had no religion. Comparing minimally adjusted models (predictor, sex and age) with fully adjusted models (sex, age, ethnicity, region, partnership status, religious affiliation, day-to-day impairments, armed forces membership and socioeconomic status) identified key predictors that remain important risk factors after accounting for other characteristics; for example, after adjusting for employment status, people with day-to-day impairments still had a higher incidence of suicide than those whose activities were not impaired. Overall, rates of suicide were higher in men than in women across all ages, with the highest rates in 40-to-50-year-olds.

Interpretation: The findings of this work provide novel population-level insights into the risk of suicide by sociodemographic characteristics. Understanding the interaction between key risk factors for suicide has important implications for national suicide prevention strategies.
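The group comparisons in this abstract rest on event rates and rate ratios. As a rough illustration only, the sketch below computes a crude rate per 100,000 person-years and an unadjusted rate ratio with a Wald confidence interval on the log scale. The counts and function names are hypothetical, and the study's own estimates came from adjusted models that this sketch does not reproduce.

```python
import math

def rate_per_100k(events, person_years):
    """Crude event rate per 100,000 person-years."""
    return 1e5 * events / person_years

def rate_ratio(e1, py1, e0, py0, z=1.96):
    """Unadjusted rate ratio (group 1 vs group 0) with a Wald CI on the log scale."""
    rr = (e1 / py1) / (e0 / py0)
    # Standard error of log(RR), treating both event counts as Poisson.
    se = math.sqrt(1 / e1 + 1 / e0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Hypothetical counts, for illustration only.
print(rate_per_100k(50, 1_000_000))
rr, lower, upper = rate_ratio(40, 100_000, 20, 100_000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Adjusted comparisons such as the minimally and fully adjusted models described above are commonly obtained from a regression model (for example, Poisson regression with log person-time as an offset) that includes the listed covariates; the abstract does not state which model the authors used.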
Citations: 0
Association between neighbourhood composition, kindergarten educator-reported distance learning barriers, and return to school concerns during the first wave of the COVID-19 pandemic in Ontario, Canada.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-04-04 eCollection Date: 2022-01-01 DOI: 10.23889/ijpds.v7i4.1761
Natalie Spadafora, Jade Wang, Caroline Reid-Westoby, Magdalena Janus

Introduction: Research to date has established that the COVID-19 pandemic has not affected everyone equitably. Whether this inequitable impact extended to education, in terms of educator-reported barriers to distance learning, concerns, and mental health, is less clear.

Objective: The objective of this study was to explore the association between the neighbourhood composition of the school and kindergarten educator-reported barriers and concerns regarding children's learning during the first wave of COVID-19-related school closures in Ontario, Canada.

Methods: In the spring of 2020, we collected data from Ontario kindergarten educators (n = 2569; 74.2% kindergarten teachers, 25.8% early childhood educators; 97.6% female) using an online survey that asked about their experiences and challenges with online learning during the first round of school closures. We linked the educator responses to 2016 Canadian Census variables based on schools' postal codes. Bivariate correlations and Poisson regression analyses were used to determine whether neighbourhood composition was associated with educator mental health and with the number of barriers and concerns reported by kindergarten educators.
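Poisson regression models a count (here, the number of barriers an educator reports) as log-linear in a neighbourhood characteristic, so the exponentiated slope is a rate ratio. The sketch below fits a single-covariate Poisson model by Newton-Raphson in plain Python to show what that coefficient means. The data and the function name `poisson_fit` are hypothetical; the abstract does not describe the authors' exact covariates or software.

```python
import math

def poisson_fit(x, y, iters=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by maximum likelihood (Newton-Raphson)."""
    # Start with the intercept at the log of the mean count and a zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian) for the 2x2 system.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: x = 1 for a lower-income neighbourhood, 0 otherwise;
# y = number of barriers reported by each educator.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 3, 4, 5]
b0, b1 = poisson_fit(x, y)
print(f"rate ratio: {math.exp(b1):.3f}")  # → rate ratio: 2.000
```

With a binary covariate the fitted rate ratio equals the ratio of the two group means (4 / 2 here), which makes a convenient sanity check before moving to a continuous covariate such as median income.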

Results: There were no significant associations between educator mental health and school neighbourhood characteristics. Educators who taught at schools in neighbourhoods with lower median income reported a greater number of barriers to online learning (e.g., parents/guardians not submitting assignments or providing updates on their child's learning) and more concerns regarding the return to school in the fall of 2020 (e.g., students' readjustment to routines). There were no significant associations between educator-reported barriers or concerns and any of the other Census neighbourhood variables (proportion of lone-parent families, average household size, proportion of the population that does not speak an official language, proportion of the population that are recent immigrants, or proportion of the population aged 0-4).

Conclusions: Overall, our study suggests that the neighbourhood composition of the children's school location did not exacerbate the potential negative learning experiences of kindergarten students and educators during the COVID-19 pandemic, although educators teaching in schools in lower socioeconomic status (SES) neighbourhoods did report more barriers to online learning during this time. Taken together, our findings suggest that remediation efforts should focus on individual kindergarten children and their families rather than on school location.

Citations: 0
Microsimulation of an educational attainment register to predict future record linkage quality.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-04-03 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i1.2122
Rainer Schnell, Severin Weiand

Introduction: Population-wide educational attainment registers are necessary for educational planning and research. Building and updating such a register requires regular linking of databases. Without unique national identification numbers, record linkage must be based on quasi-identifiers such as name, date of birth and sex. However, the data protection principle of data minimization aims to minimize the set of identifiers held in databases.

Objectives: Therefore, the German Federal Ministry of Research and Education commissioned a study to inform legislation on the minimum set of identifiers required for a national educational register.

Methods: To justify our recommendations empirically, we implemented a microsimulation of about 20 million people. The simulated register accumulates changes and errors in identifiers due to migration, regional mobility, marriage, school career and mortality, thereby allowing the study of errors on longitudinal datasets. Updated records were linked yearly to the simulated register using several linkage methods. Clear-text methods as well as privacy-preserving (PPRL) methods were compared.
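Privacy-preserving record linkage (PPRL) is commonly done by encoding quasi-identifiers into Bloom filters and comparing the encodings instead of the clear text. The sketch below shows that general technique: character bigrams are hashed into a bit set with keyed HMAC double hashing, and encodings are compared with the Dice coefficient. The filter length `m`, the number of hash functions `k`, and the secret keys are illustrative assumptions, not the configuration used in this study.

```python
import hashlib
import hmac

def bigrams(value):
    """Character bigrams of a lower-cased value, padded with spaces at both ends."""
    s = f" {value.strip().lower()} "
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(value, key1=b"hypothetical-key-1", key2=b"hypothetical-key-2",
                 m=1000, k=20):
    """Return the set of bit positions switched on in an m-bit Bloom filter."""
    positions = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key1, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key2, data, hashlib.sha1).digest(), "big")
        # Double hashing: derive k bit positions from two keyed hashes.
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions

def dice(a, b):
    """Dice coefficient of two encodings: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Similar spellings share most bigrams, so their encodings overlap strongly.
print(round(dice(bloom_encode("Schmidt"), bloom_encode("Schmitt")), 2))
print(round(dice(bloom_encode("Schmidt"), bloom_encode("Weber")), 2))
```

Only parties holding the same keys can produce comparable encodings, while approximate agreement between names survives the encoding, which is what allows error-tolerant matching without sharing clear-text identifiers.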

Results: The results indicate linkage bias if only the primary identifiers are available in the register. More detailed identifiers, including place of birth, are required to minimize linkage bias. The amount of information available to identify a person for matching is more critical for linkage quality than the record linkage method applied. Differences in linkage quality between the best procedures (probabilistic linkage and multiple matchkeys) are minor.
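Multiple-matchkey linkage, one of the two best-performing procedures named above, tries several exact keys built from different identifier combinations and accepts a link only when a key points to exactly one register record. The sketch below uses hypothetical field names, keys, and records. It also illustrates why a detailed identifier such as place of birth matters: without it, two people who share name, date of birth and sex cannot be told apart.

```python
from collections import defaultdict

# Hypothetical matchkeys, tried in order from most to least specific.
MATCHKEYS = [
    ("first", "last", "dob", "sex", "birthplace"),
    ("first", "dob", "sex", "birthplace"),  # tolerates a surname change
    ("last", "dob", "sex", "birthplace"),   # tolerates a first-name variant
    ("first", "last", "dob", "sex"),        # fallback without place of birth
]

def key_of(record, fields):
    """Normalised key tuple, or None if any field is missing."""
    vals = tuple(str(record.get(f, "")).strip().lower() for f in fields)
    return None if "" in vals else vals

def link(updates, register):
    """Map update ids to register ids using the first unambiguous matchkey."""
    links, used = {}, set()
    for fields in MATCHKEYS:
        index = defaultdict(list)
        for rec in register:
            k = key_of(rec, fields)
            if k is not None:
                index[k].append(rec["id"])
        for upd in updates:
            if upd["id"] in links:
                continue
            k = key_of(upd, fields)
            hits = index.get(k, []) if k is not None else []
            # Accept only a unique hit on a register record not yet linked.
            if len(hits) == 1 and hits[0] not in used:
                links[upd["id"]] = hits[0]
                used.add(hits[0])
    return links

register = [
    {"id": 1, "first": "Anna", "last": "Meyer", "dob": "2001-03-04", "sex": "f", "birthplace": "Köln"},
    {"id": 2, "first": "Anna", "last": "Meyer", "dob": "2001-03-04", "sex": "f", "birthplace": "Bonn"},
    {"id": 3, "first": "Jonas", "last": "Weber", "dob": "1999-12-01", "sex": "m", "birthplace": "Kiel"},
]
updates = [
    {"id": "a", "first": "Anna", "last": "Schulz", "dob": "2001-03-04", "sex": "f", "birthplace": "Bonn"},
    {"id": "b", "first": "Jonas", "last": "Weber", "dob": "1999-12-01", "sex": "m", "birthplace": "Kiel"},
    {"id": "c", "first": "Anna", "last": "Meyer", "dob": "2001-03-04", "sex": "f"},
]
print(link(updates, register))
```

Update "a" is still linked after a surname change because the second key omits the surname, whereas update "c", which lacks a place of birth, is left unlinked because its only usable key matches two register records.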

Conclusions: Microsimulation is a valuable tool for designing record linkage procedures. By modelling the processes resulting in changes or errors in quasi-identifiers, predicting data quality to be expected after the implementation of a register seems possible.
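The idea of modelling the processes that change quasi-identifiers can be illustrated with a deliberately tiny microsimulation. The sketch below is a toy model, not the authors' simulation of about 20 million people. It applies only two hypothetical processes, surname change on marriage and a first-name recording error, with assumed yearly probabilities, and then measures how many records still agree with the baseline register on a chosen exact key.

```python
import random

def simulate_agreement(key=(0, 1, 2), n_people=10_000, years=10,
                       p_marry=0.03, p_typo=0.01, seed=1):
    """Share of people whose identifiers at positions `key` still match the baseline.

    Identifier positions: 0 = first name, 1 = surname, 2 = date of birth.
    """
    rng = random.Random(seed)
    baseline = [(f"first{i}", f"last{i}", f"dob{i}") for i in range(n_people)]
    current = [list(rec) for rec in baseline]
    for _ in range(years):
        for rec in current:
            if rng.random() < p_marry:
                rec[1] = f"married{rng.randrange(10**9)}"  # surname replaced
            if rng.random() < p_typo:
                rec[0] += "x"  # persistent recording error in the first name
    agree = sum(all(cur[j] == base[j] for j in key)
                for cur, base in zip(current, baseline))
    return agree / n_people

print(simulate_agreement())            # key: first name, surname, date of birth
print(simulate_agreement(key=(0, 2)))  # key without surname
```

Because the random draws do not depend on the key, the two runs see the same simulated population, so the difference between them isolates the effect of the identifier set on exact-key agreement.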

Citations: 1
Record linkage for routinely collected health data in an African health information exchange
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-28 DOI: 10.23889/ijpds.v8i1.1771
T. Mutemaringa, A. Heekes, Mariette Smith, A. Boulle, Nicki Tiffin

Introduction: The Patient Master Index (PMI) plays an important role in the management of patient information and in epidemiological research, and the availability of unique patient identifiers improves accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi-identifiers are used to attempt to link patient records but sometimes carry a higher risk of over-linking. Data quality and completeness therefore affect the ability to make correct linkages.

Aim: This paper describes the record linkage system currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Methods: We apply a stepwise deterministic record linkage approach to link patient data that are routinely collected from health information systems in the Western Cape province of South Africa. Variables used in the linkage process include the South African National Identity number (RSA ID), date of birth, year of birth, month of birth, day of birth, residential address and contact information. Descriptive analyses are used to estimate the level and extent of duplication in the provincial PMI.

Results: The percentage of duplicates in the provincial PMI lies between 10% and 20%. Duplicates mainly arise from spelling errors, and surnames and first names carry most of the errors: in approximately 22% of duplicates, the first names and surname differ for the same individual. The RSA ID is the variable most affected by poor completeness, with fewer than 30% of records having an RSA ID. The current linkage algorithm requires refinement because it uses algorithms developed and validated on anglicised names, which might not work well for local names. Linkage is also affected by data quality issues associated with the routine nature of the data, which often make it difficult to validate data and enforce integrity at the point of capture.
Citations: 0
Record linkage for routinely collected health data in an African health information exchange.
IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-28 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v6i1.1771
Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle, Nicki Tiffin

Introduction: The Patient Master Index (PMI) plays an important role in the management of patient information and in epidemiological research, and the availability of unique patient identifiers improves accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi-identifiers are used to attempt to link patient records but sometimes carry a higher risk of over-linking. Data quality and completeness therefore affect the ability to make correct linkages.

Aim: This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Methods: We apply a stepwise deterministic record linkage approach to link patient data that are routinely collected from health information systems in the Western Cape province of South Africa. Variables used in the linkage process include South African National Identity number (RSA ID), date of birth, year of birth, month of birth, day of birth, residential address and contact information. Descriptive analyses are used to estimate the level and extent of duplication in the provincial PMI.
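A stepwise deterministic approach can be sketched as an ordered series of exact-match passes, strictest first, whose links accumulate into patient clusters. This is a minimal illustration under stated assumptions: the pass definitions (RSA ID; surname + first name + date of birth; date of birth + phone number), the normalisation rule, and the `Record` fields are hypothetical, not the PHDC's actual rules, which the abstract does not list. A union-find structure carries links forward so that records matched in different passes end up in one cluster. Note that a pass such as date of birth + phone could also merge, say, relatives sharing a phone, which is the over-linking risk of quasi-identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical PMI record layout, for illustration only.
    rec_id: str
    rsa_id: Optional[str]
    first_name: str
    surname: str
    dob: Optional[str]  # ISO date, e.g. "1980-01-01"
    phone: Optional[str] = None


def norm(value: Optional[str]) -> Optional[str]:
    """Trim and upper-case a field; empty strings count as missing."""
    if value is None or not value.strip():
        return None
    return value.strip().upper()


def key_rsa_id(r: Record):
    return norm(r.rsa_id)


def key_name_dob(r: Record):
    parts = (norm(r.surname), norm(r.first_name), norm(r.dob))
    return parts if all(parts) else None


def key_dob_phone(r: Record):
    parts = (norm(r.dob), norm(r.phone))
    return parts if all(parts) else None


# Hypothetical passes, applied strictest first.
PASSES: List[Tuple[str, Callable[[Record], object]]] = [
    ("rsa_id", key_rsa_id),
    ("name+dob", key_name_dob),
    ("dob+phone", key_dob_phone),
]


def link(records: List[Record]) -> Dict[str, str]:
    """Return a mapping from record id to a cluster (patient) id."""
    parent = {r.rec_id: r.rec_id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _pass_name, key_fn in PASSES:
        groups: Dict[object, List[str]] = defaultdict(list)
        for r in records:
            k = key_fn(r)
            if k is not None:  # records lacking this pass's fields are skipped
                groups[k].append(r.rec_id)
        for ids in groups.values():
            root = find(ids[0])
            for other in ids[1:]:
                parent[find(other)] = root
    return {r.rec_id: find(r.rec_id) for r in records}


records = [
    Record("a", "ID-0001", "Thandi", "Nkosi", "1980-01-01", "555-0100"),
    Record("b", None, "THANDI", "Nkosi ", "1980-01-01"),
    Record("c", None, "Tandi", "Nkosi", "1980-01-01", "555-0100"),
    Record("d", None, "Sipho", "Dlamini", "1975-06-30"),
]
clusters = link(records)  # a, b, c share one cluster; d stays alone
```

Here record `c` joins the cluster only through the phone pass; without a phone number, its misspelt first name would leave it as an unlinked duplicate, which is exactly the spelling-error pattern reported below.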

Results: The percentage of duplicates in the provincial PMI lies between 10% and 20%. Duplicates mainly arise from spelling errors, with surnames and first names carrying most of the errors: in approximately 22% of duplicates, the first names and surname differ for the same individual. The RSA ID is the variable most affected by poor completeness, with fewer than 30% of records having an RSA ID. The current linkage algorithm requires refinement because it relies on algorithms that were developed and validated on anglicised names and might not work well for local names. Linkage is also affected by data quality issues associated with the routine nature of the data, which often make it difficult to validate data and enforce integrity at the point of capture.
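Duplication figures like these can be computed from any linkage output with simple counts, and spelling-variant duplicates can be surfaced with an approximate string comparison. The sketch below assumes, hypothetically, that the duplicate percentage means surplus records divided by all records (the abstract does not define it). It uses Python's standard-library `difflib.SequenceMatcher` as a generic similarity measure; this is a swapped-in technique, not the PHDC algorithm, and its threshold (0.8 here, chosen arbitrarily) would itself need validation on local names.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Iterable, Iterator, Tuple


def duplicate_rate(cluster_of: Dict[str, str]) -> float:
    """Share of records that are surplus copies of an already-represented person."""
    n_records = len(cluster_of)
    n_people = len(set(cluster_of.values()))
    return (n_records - n_people) / n_records


def name_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical after upper-casing."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def candidate_misspellings(
    records: Iterable[Tuple[str, str, str]], threshold: float = 0.8
) -> Iterator[Tuple[str, str, float]]:
    """Yield (id1, id2, score) for records sharing a date of birth whose
    full names differ but are similar enough to be possible duplicates."""
    by_dob = defaultdict(list)
    for rec_id, full_name, dob in records:
        by_dob[dob].append((rec_id, full_name))
    for group in by_dob.values():
        for (id1, n1), (id2, n2) in combinations(group, 2):
            score = name_similarity(n1, n2)
            if n1.upper() != n2.upper() and score >= threshold:
                yield id1, id2, score


print(duplicate_rate({"a": "a", "b": "a", "c": "c", "d": "d"}))  # 0.25
pairs = list(candidate_misspellings([
    ("a", "Thandi Nkosi", "1980-01-01"),
    ("c", "Tandi Nkosi", "1980-01-01"),
    ("d", "Sipho Dlamini", "1975-06-30"),
]))
```

Blocking on date of birth keeps the number of comparisons small, but it also means a duplicate whose birth date was mistyped would never be compared; a real system would need to weigh that trade-off.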

Citations: 0
Everybody's talking about equity, but is anyone really listening?: The case for better data-driven learning in health systems.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-22 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v5i4.2125
Nakia K Lee-Foon, Robert J Reid

Data collection, analysis, and data-driven action cycles have been viewed as vital components of healthcare for decades. Throughout the COVID-19 pandemic, case incidence and mortality data have consistently been used by various levels of government and health institutions to inform pandemic strategies and service distribution. However, these responses have often been inequitable, underscoring pre-existing healthcare disparities faced by marginalized populations. This has prompted governments to finally confront these disparities and find ways to quickly deliver more equitable pandemic support. These rapid, data-informed supports showed that learning health systems (LHS) can be mobilized quickly and used effectively to deliver healthcare interventions that meet diverse populations' needs in equitable and affordable ways. Within an LHS, data are viewed as a starting point that researchers can use to inform practice and subsequent research. Despite this innovative approach, the quality and depth of data collection and analysis vary throughout healthcare, and data are lacking across all four aims of the quadruple aim. Large data gaps often exist for community socio-demographics, patient perceptions of healthcare quality, and the social determinants of health. These gaps prevent a robust understanding of the healthcare landscape, leaving marginalized populations uncounted and on the sidelines of improvement efforts. Researchers often view these gaps as an indication that more data are needed rather than as an opportunity to critically analyze, and iteratively learn from, multiple sources of pre-existing data. This continued cycle of data collection and analysis leaves one to wonder whether healthcare has a data problem or a learning problem. In this commentary, we discuss ways healthcare data are often used and how LHS disrupts this cycle, turning data into learning opportunities that inform healthcare practice and future research in real time. We conclude by proposing several ways to make learning from data just as important as the data itself.

Citations: 0
Student achievement trajectories in Ontario: Creating and validating a province-wide, multi-cohort and longitudinal database.
IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-02-02 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i1.1843
Jeanne Sinclair, Scott Davies, Magdalena Janus

Introduction: Longitudinal data that track student achievement over many years are crucial for understanding children's learning and for guiding effective policies and interventions. Despite being Canada's most populous province, Ontario lacks such large-scale, longitudinal data on student learning. Linking datasets across cohorts requires rigorous linkage protocols, flexible handling of complex cohort structures, methods to validate linked datasets, and viable organizational partnerships. We linked administrative data on early child development and educational achievement and merged two datasets on the characteristics of students' neighborhoods and schools. We developed a linkage protocol and assessed how well the resulting database generalizes to Ontario's student population.

Methods and analysis: Two main individual-level data sources were linked: 1) the Early Development Instrument (EDI), a school readiness assessment of all Ontario public school kindergartners that is administered in three-year cycles, and 2) the math and reading assessments of Ontario's Education Quality and Accountability Office (EQAO) in grades 3, 6, 9, and 10. Because the two sources lack a common personal identification number, a deterministic linkage process was developed using several administrative variables. A school-level dataset and a neighborhood-level dataset were later linked as well. We examined differences between unlinked and linked cases across several variables.
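Linking across cohorts involves predicting when each kindergarten cohort should appear in each EQAO grade, then matching on shared administrative variables. The sketch below is hypothetical throughout: it assumes the EDI year is the final kindergarten year (treated as grade 0) with on-time progression and no retention, and it uses (date of birth, sex, assessment year) as the matching key because the abstract does not name the variables actually used. Links are accepted only when a key is unique on both sides, a common deterministic safeguard against ambiguous matches such as same-sex twins.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

EQAO_GRADES = (3, 6, 9, 10)


def expected_assessment_years(edi_year: int) -> Dict[int, int]:
    """School year in which an on-time student would sit each EQAO grade,
    assuming the EDI year is grade 0 and no retention or acceleration."""
    return {grade: edi_year + grade for grade in EQAO_GRADES}


def unique_key_links(
    left: List[Tuple[str, Hashable]], right: List[Tuple[str, Hashable]]
) -> Dict[str, str]:
    """Deterministic 1:1 linkage: link ids whose key occurs exactly once on each side."""
    left_counts = Counter(key for _, key in left)
    right_counts = Counter(key for _, key in right)
    right_by_key = {key: rid for rid, key in right if right_counts[key] == 1}
    return {
        lid: right_by_key[key]
        for lid, key in left
        if left_counts[key] == 1 and key in right_by_key
    }


# Hypothetical EDI records: (id, date of birth, sex, EDI year).
edi = [
    ("e1", "2000-03-15", "F", 2005),
    ("e2", "2000-07-01", "M", 2005),
    ("e3", "2000-07-01", "M", 2005),  # collides with e2 on every key field
]
# Hypothetical grade 3 EQAO records: (id, date of birth, sex, assessment year).
eqao_g3 = [
    ("q1", "2000-03-15", "F", 2008),
    ("q2", "2000-07-01", "M", 2008),
    ("q3", "2000-07-01", "M", 2008),
]

left = [(i, (dob, sex, expected_assessment_years(y)[3])) for i, dob, sex, y in edi]
right = [(i, (dob, sex, year)) for i, dob, sex, year in eqao_g3]
links = unique_key_links(left, right)  # e2 and e3 stay unlinked
```

Under these assumptions, a 2005 kindergartner would appear in grade 3 in 2008 and grade 10 in 2015, so a fully linked record would span five data points (the EDI plus four EQAO grades), which is presumably what "all five datapoints" refers to.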

Results and implications: We successfully linked 50% of the EDI's 374,239 cases, 86,778 of which contained all five data points, creating a database that tracks achievement for multiple cohorts from kindergarten through grade 10, with covariates for their development, demographics, affect, neighborhoods, and schools. Analyses revealed only negligible differences between linked and unlinked cases across several demographic measures, while small differences were detected in a neighborhood socioeconomic index and some measures of child development. In conclusion, we recommend filling key gaps in sustainable research capacity by creating representative data through linkage protocols and data verification.
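One common way to quantify differences between linked and unlinked cases, not necessarily the authors' method, is the standardised mean difference (SMD), often read against a rule-of-thumb cut-off of |SMD| < 0.1 as negligible. The sketch below computes SMDs for continuous and binary variables with hypothetical example inputs, and checks the share of the 374,239 EDI cases that carried all five data points using only the figures reported above.

```python
import math
from statistics import mean, variance
from typing import Sequence


def smd_continuous(linked: Sequence[float], unlinked: Sequence[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(linked) + variance(unlinked)) / 2)
    return (mean(linked) - mean(unlinked)) / pooled_sd


def smd_binary(p_linked: float, p_unlinked: float) -> float:
    """SMD for a proportion, using the Bernoulli variance p(1 - p)."""
    pooled_sd = math.sqrt(
        (p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2
    )
    return (p_linked - p_unlinked) / pooled_sd


# Hypothetical inputs: a continuous index and a binary demographic share.
print(smd_continuous([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # -1.0
print(round(smd_binary(0.55, 0.50), 3))                   # 0.1

# Reported figures: cases with all five data points out of all EDI cases.
complete_share = 86_778 / 374_239
print(f"{complete_share:.1%}")  # 23.2%
```

SMDs are used here instead of significance tests because, with hundreds of thousands of cases, even trivial differences would be statistically significant; the SMD expresses the size of a difference independently of sample size.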

Citations: 0
Using administrative records to support the linkage of census data: protocol for building a longitudinal infrastructure of U.S. census records.
IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-01-11 eCollection Date: 2022-01-01 DOI: 10.23889/ijpds.v7i4.1764
J Trent Alexander, Katie R Genadek

This article describes the linkage methods that will be used in the Decennial Census Digitization and Linkage project (DCDL), which is completing the final four decades of a longitudinal census infrastructure covering the past 170 years of United States history. DCDL is digitizing and creating linkages between nearly a billion records across the 1960 through 1990 U.S. censuses, as well as to already-linked records from the censuses of 1940, 2000, 2010, and 2020. Our main goals in this article are to (1) describe the development of the DCDL and the protocol we will follow to build the linkages between the census files, (2) outline the techniques we will use to evaluate the quality of the links, and (3) show how the assignment and evaluation of these linkages leverages the joint use of routinely collected administrative data and non-routine survey data.
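Evaluating link quality with administrative data can be sketched as follows, assuming, hypothetically, that a subset of records in two censuses also carries an administrative person identifier that can serve as ground truth. Precision is computed only over predicted links where both records carry an identifier, and recall over identifier-based true pairs. The identifier field and data are illustrative, not the DCDL's actual variables; if identifier coverage differs across groups, these estimates describe the identifier-bearing subset rather than all records.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]  # (record id in census A, record id in census B)


def truth_links(ids_a: Dict[str, Optional[str]], ids_b: Dict[str, Optional[str]]) -> Set[Link]:
    """Pairs of records that share an administrative identifier."""
    by_id: Dict[str, List[str]] = {}
    for rec_b, pid in ids_b.items():
        if pid is not None:
            by_id.setdefault(pid, []).append(rec_b)
    return {
        (rec_a, rec_b)
        for rec_a, pid in ids_a.items()
        if pid is not None
        for rec_b in by_id.get(pid, [])
    }


def precision_recall(
    predicted: Set[Link], ids_a: Dict[str, Optional[str]], ids_b: Dict[str, Optional[str]]
) -> Tuple[float, float]:
    """Score predicted links against identifier-based truth.
    Links involving a record without an identifier cannot be judged and are ignored."""
    truth = truth_links(ids_a, ids_b)
    judged = {
        (a, b) for a, b in predicted
        if ids_a.get(a) is not None and ids_b.get(b) is not None
    }
    true_pos = len(judged & truth)
    precision = true_pos / len(judged) if judged else math.nan
    recall = true_pos / len(truth) if truth else math.nan
    return precision, recall


# Hypothetical identifiers for two censuses (None = no identifier available).
ids_1960 = {"a1": "P1", "a2": "P2", "a3": None, "a4": "P4"}
ids_1970 = {"b1": "P1", "b2": "P9", "b3": "P4", "b4": None}
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b4")}
print(precision_recall(predicted, ids_1960, ids_1970))  # (0.5, 0.5)
```

In this toy example, the link a3 to b4 is simply unscored, and the missed pair a4 to b3 lowers recall; both effects illustrate why the evaluable subset matters.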

Citations: 0
Thirty-three myths and misconceptions about population data: from data capture and processing to linkage.
Q3 HEALTH CARE SCIENCES & SERVICES Pub Date : 2023-01-01 DOI: 10.23889/ijpds.v8i1.2115
Peter Christen, Rainer Schnell

Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken for a guarantee of valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions about population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population data can often only be unlocked when such data are linked to other databases. Record linkage often involves subtle technical problems that are easily missed. We discuss a diverse range of myths and misconceptions relevant to anyone capturing, processing, linking, or analysing population data. Remarkably, many of these myths and misconceptions stem from the social nature of data collection and are therefore missed by purely technical accounts of data processing. Many are also not well documented in scientific publications. We conclude with a set of recommendations for using population data.
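The misconception that sheer size guarantees valid inference can be checked with a short worked calculation. All numbers here are hypothetical: a population of 1,000,000 with a true prevalence of 50%, where the database captures 80% of people with the characteristic but only 70% of those without it. The resulting bias does not depend on how many records are captured, whereas the sampling error of a simple random sample shrinks as the sample grows.

```python
import math


def observed_prevalence(true_prev: float, cover_if_yes: float, cover_if_no: float) -> float:
    """Prevalence seen in a database whose coverage depends on the characteristic."""
    captured_yes = true_prev * cover_if_yes
    captured_no = (1 - true_prev) * cover_if_no
    return captured_yes / (captured_yes + captured_no)


def srs_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


population = 1_000_000
true_prev = 0.5
captured = round(population * (true_prev * 0.8 + (1 - true_prev) * 0.7))
obs = observed_prevalence(true_prev, 0.8, 0.7)  # 0.4 / 0.75
bias = obs - true_prev
se_1000 = srs_standard_error(true_prev, 1_000)

print(captured, round(obs, 4), round(bias, 4), round(se_1000, 4))
# 750000 0.5333 0.0333 0.0158
```

In this illustration, the 750,000-record database misses the true prevalence by about 3.3 percentage points, more than twice the standard error of a random sample of only 1,000 people.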

Citations: 3