International Journal of Population Data Science: Latest Articles, Page 5

The application of population data linkage to capture sibling health outcomes among children and young adults with neurodevelopmental conditions. A scoping review.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-03-18 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2413

Caitlin Gray, Helen Leonard, Matthew N Cooper, Dheeraj Rai, Emma J Glasson

Introduction: Siblings of children with neurodevelopmental conditions have unique experiences and challenges related to their sibling role. Some develop mental health concerns as measured by self-reported surveys or parent report. Few data are available at the population level, owing to difficulties capturing wide-scale health data for siblings. Data linkage is a technique that can facilitate such research.

Objective: To explore the application of population data linkage as a research method to capture health outcomes of siblings of children with neurodevelopmental conditions.

Inclusion criteria: Peer-reviewed papers that captured health outcomes for siblings of children and young adults with neurodevelopmental conditions using population data linkage.

Methods: JBI scoping review methods were followed. Papers were searched within CINAHL, Ovid, Scopus, and Web of Science from 2000 to 2024 using search terms relating to 'data linkage', 'neurodevelopmental conditions', 'siblings', and 'health outcomes'.

Results: The final data extraction included 31 papers. The neurodevelopmental conditions of index children were autism, attention deficit hyperactivity disorder, intellectual disability, cerebral palsy and developmental delay. The mean follow-up time was 31 years, and the majority of studies originated from Scandinavia. Sibling health outcomes observed were psychiatric diagnoses, self-harm and suicide, other neurodevelopmental conditions, and medical conditions such as atopic disease, cancer and obesity.

Conclusion: Data linkage can help capture sibling health outcomes quickly across large cohorts with a range of neurodevelopmental conditions. Future research could be enhanced by focusing on siblings as the primary group of interest, increased integration of genealogical data, and comparisons between diagnostic groups and severity levels. Adoption of established rigorous reporting methods will increase the replicability of this type of research, and provide a stronger evidence-base from which to inform sibling supports.


Vol. 10, No. 1, article 2413. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11923734/pdf/

Citations: 0

Data Note: Alternative Name Encodings - Using Jyutping or Pinyin as tonal representations of Chinese names for data linkage.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-03-11 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i5.2935

Joseph Lam, Mario Cortina-Borja, Robert Aldridge, Ruth Blackburn, Katie Harron

Accurate data linkage across large administrative databases is crucial for addressing complex research and policy questions, yet linkage errors stemming from inconsistent name representations can introduce biases, predominantly for names not given in English. This data note examines the impact of romanisation on linkage accuracy, focusing on Chinese names and comparing standardised systems (Jyutping and Pinyin) with the non-standardised Hong Kong Government Cantonese Romanisation (HKG-romanisation). We identify three primary issues: language-specific variations in romanisation, the loss of tonal information inherent to tonal languages, and discrepancies in name order conventions. Using a dataset of 771 Hong Kong student names, our analysis reveals that standardised romanisation systems enhance the uniqueness and consistency of name representations, thereby improving linkage precision and recall compared to HKG-romanisation. Specifically, Jyutping and Pinyin achieved over 95% recall in blocking strategies, whereas HKG-romanisation reached only 68.8%. Incorporating tonal information further improved recall. These findings underscore the necessity of adopting standardised, tone-sensitive romanisation systems and flexible database designs to reduce linkage errors and promote data equity for under-represented groups. We advocate for the implementation of phonetic encodings in databases, alongside language-specific pre-processing protocols, to ensure more inclusive and accurate data linkage processes.
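The blocking comparison reported in this abstract can be illustrated with a minimal sketch. The name pairs, the romanised strings, and the helper `blocking_recall` are hypothetical illustrations and are not taken from the study's 771-name dataset or its code. The sketch only shows how the recall of a blocking key might be computed, that is, the share of true matching pairs whose two records receive the same key.

```python
def blocking_recall(true_pairs, key):
    """Share of true-match pairs whose two records get the same blocking key."""
    pairs = list(true_pairs)
    if not pairs:
        raise ValueError("need at least one true-match pair")
    kept = sum(1 for left, right in pairs if key(left) == key(right))
    return kept / len(pairs)


# Hypothetical true-match pairs: the same surname as recorded in two databases.
# "hkg" is a non-standardised romanisation that may vary between sources;
# "jyutping" is a standardised, tone-marked encoding of the same name.
# All strings are illustrative placeholders, not study data.
TRUE_PAIRS = [
    ({"hkg": "chow", "jyutping": "zau1"}, {"hkg": "chau", "jyutping": "zau1"}),
    ({"hkg": "choi", "jyutping": "coi3"}, {"hkg": "tsoi", "jyutping": "coi3"}),
    ({"hkg": "chan", "jyutping": "can4"}, {"hkg": "chan", "jyutping": "can4"}),
    ({"hkg": "wong", "jyutping": "wong4"}, {"hkg": "wong", "jyutping": "wong4"}),
]

# Block on each encoding in turn and compare how many true matches survive.
recall_hkg = blocking_recall(TRUE_PAIRS, key=lambda r: r["hkg"])
recall_jyutping = blocking_recall(TRUE_PAIRS, key=lambda r: r["jyutping"])

print(f"HKG-romanisation blocking recall: {recall_hkg:.2f}")
print(f"Jyutping blocking recall: {recall_jyutping:.2f}")
```

In this toy data, two of the four pairs are spelled differently under the non-standardised romanisation, so exact-key blocking on that field loses them, while the standardised key retains every pair. The figures it prints are properties of the toy data only and say nothing about the study's reported recall.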


Vol. 8, No. 5, article 2935. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11897931/pdf/

Citations: 0

Cohort Profile Update: Reflecting back and looking ahead: Updating the Comparative Outcomes and Service Utilization Trends (COAST) Study to include 28 years of linked data from people with and without HIV in British Columbia, Canada.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-03-06 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2496

Michael O Budu, Katherine W Kooij, Kate Heath, Taylor McLinden, Claudette Cardinal, Scott D Emerson, Paul Sereda, Jason Trigg, Jenny Li, Erin Ding, Mark W Hull, Kate Salters, Viviane D Lima, Rolando Barrios, Julio S G Montaner, Robert S Hogg

Introduction: The Comparative Outcomes and Service Utilization Trends (COAST) study compares health outcomes among People With HIV (PWH) and People Without HIV (PWoH) in British Columbia (BC), Canada. The cohort was recently updated to include persons diagnosed with HIV after March 31, 2013, and expanded to broaden research applications.

Methods: COAST includes PWH and a 10% random sample of the general population without HIV, all aged ≥19 years. Our study links an HIV registry to healthcare practitioner billing, hospital and emergency department attendance data, prescription drug dispensations, and a cancer registry. Our cohort update included new sampling strategies, the addition of data on emergency department visits not previously captured, and an extension of our follow-up period to 28 years (from 1992 to 2020). COAST now includes 17,119 PWH and 615,264 PWoH.

Findings to date: COAST has contributed to our understanding of combination antiretroviral therapy (ART) use, health service utilization, chronic diseases, mental health and substance use disorders, and mortality among PWH in BC. Key findings include earlier age at diagnosis of certain chronic conditions, a higher incidence of mood disorders among PWH, and noteworthy shifts in causes of death among PWH on ART. The updated cohort will provide insights into the changing nature of the population living with HIV in BC and serves as a novel foundation for further research.

Future plans: To explore and extend knowledge of the evolving trends among people living and aging with HIV in BC, regular data linkage updates and the inclusion of additional datasets are scheduled every two years.


Vol. 10, No. 1, article 2496. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11922098/pdf/

Citations: 0

Using a deterministic matching computer routine to identify hospital episodes in a Brazilian de-identified administrative database for the analysis of obstetrics hospitalisations.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-03-03 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2467

Claudia Medina Coeli, Rosa Maria Soares Madeira Domingues, Lana Meijinhos, Daniela Medina Coeli Bastos, Rejane Sobrino Pinheiro, Valeria Saraceni, Marcos Augusto Bastos Dias, Natália Santana Paiva, Kenneth Rochel de Camargo

Introduction: The absence of a unique patient identifier in the Brazilian hospital administrative database prevents the identification of hospital episodes with multiple hospitalisations of the same patient.

Objectives: This study aims to evaluate the information gained by using a computer routine to identify acute obstetric hospital episodes and its impact on assessing markers of case severity.

Methods: The data source was a de-identified Brazilian hospital administrative database from 2017 to 2020, including hospitalisation records of women of reproductive age (10 to 49 years old) treated for acute conditions (N = 16,087,490). We processed this database by combining C++ and Python routines to create a hospital episodes database. From the latter, we selected obstetric hospital episodes from 2018 to 2019 (N = 4,926,877). We compared selected characteristics of the hospital episodes according to their type (multiple vs single records per episode), testing for differences using effect size measures. We compared relative differences in case severity markers when using the hospital episode as the unit of analysis with those obtained from isolated hospitalisations (N = 5,018,350).

Results: Compared to single-record episodes, multiple-record episodes had a longer length of stay, a higher amount reimbursed, and a lower proportion of patients discharged alive. When comparing the analysis of isolated hospitalisations with that of hospital episodes, we observed an increase in all case severity indicators, especially for hospital deaths, with an increment of 13.15%. The computer routine decreased the proportion of hospital admissions with a reason for hospital discharge that did not indicate the outcome (hospital stay or inter-hospital transfer) from 2.29% to 0.73%.

Conclusions: The deterministic matching computer routine proved valuable for identifying records that refer to the same hospital episode, which improved the assessment of severe cases.
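The idea of assembling hospitalisation records into episodes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C++/Python routine: the matching key (`birth_date`, `municipality`), the one-day gap rule, and the function `build_episodes` are all hypothetical, since the abstract does not describe which fields or rules were used.

```python
from datetime import date, timedelta
from itertools import groupby

KEY_FIELDS = ("birth_date", "municipality")  # hypothetical deterministic key


def match_key(record):
    """Exact-match key built from de-identified attributes (assumed fields)."""
    return tuple(record[field] for field in KEY_FIELDS)


def build_episodes(records, max_gap_days=1):
    """Group records that share a key and follow each other closely in time.

    A record joins the current episode when its admission is no more than
    `max_gap_days` after the latest discharge already in that episode.
    """
    ordered = sorted(
        records, key=lambda r: (match_key(r), r["admitted"], r["discharged"])
    )
    episodes = []
    for _, group in groupby(ordered, key=match_key):
        current, last_discharge = [], None
        for rec in group:
            gap_ok = (
                last_discharge is not None
                and rec["admitted"] - last_discharge <= timedelta(days=max_gap_days)
            )
            if gap_ok:
                current.append(rec)
                last_discharge = max(last_discharge, rec["discharged"])
            else:
                if current:
                    episodes.append(current)
                current, last_discharge = [rec], rec["discharged"]
        episodes.append(current)
    return episodes


# Toy records (invented): one woman is transferred and later readmitted.
records = [
    {"birth_date": date(1990, 5, 1), "municipality": "A",
     "admitted": date(2018, 3, 1), "discharged": date(2018, 3, 3), "outcome": "transfer"},
    {"birth_date": date(1990, 5, 1), "municipality": "A",
     "admitted": date(2018, 3, 3), "discharged": date(2018, 3, 10), "outcome": "death"},
    {"birth_date": date(1990, 5, 1), "municipality": "A",
     "admitted": date(2018, 9, 1), "discharged": date(2018, 9, 2), "outcome": "discharged alive"},
    {"birth_date": date(1985, 1, 15), "municipality": "B",
     "admitted": date(2018, 3, 1), "discharged": date(2018, 3, 2), "outcome": "discharged alive"},
]

episodes = build_episodes(records)
# Take the outcome of the last record as the outcome of the whole episode.
final_outcomes = [episode[-1]["outcome"] for episode in episodes]
print(len(episodes), final_outcomes)
```

In the toy data the transfer record and the subsequent admission collapse into one episode whose final outcome is known, which mirrors the kind of gain the abstract reports for discharge reasons that do not indicate the outcome. A real routine would need a far more discriminating key to avoid merging different patients.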


Vol. 10, No. 1, article 2467. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874899/pdf/

Citations: 0

Early child development in England: cross-sectional analysis of ASQ®-3 records from the 2-2½-year universal health visiting review using national administrative data (Community Service Dataset, CSDS).

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-02-27 eCollection Date: 2024-01-01 DOI: 10.23889/ijpds.v9i2.2459

Jayu Jung, Sarah Cattan, Claire Powell, Jane Barlow, Mengyun Liu, Amanda Clery, Louise Mc Grath-Lone, Catherine Bunting, Jenny Woodman

Introduction: The Ages & Stages Questionnaire 3rd Edition (ASQ®-3) is a tool, originally developed in the United States, to measure developmental delay in children aged between 1 and 66 months. This measure has been collected in England since 2015 as part of mandated 2-2½-year health visiting reviews and collated nationally in the Community Services Dataset (CSDS). CSDS is known to be incomplete, and to date there have not been any published analyses of the ASQ®-3 data held within CSDS.

Objectives: This study aimed to a) identify a subset of complete child development data for children aged two in England using ASQ®-3 data in CSDS between 2018/19 and 2020/21; and b) use this subset of data to analyse child development at age 2-2½ years in England.

Methods: This study compared counts of ASQ®-3 records in CSDS by local authority and financial quarter against national, publicly available Health Visitor Service Delivery Metrics (HVSDM) to identify local authorities with complete ASQ®-3 records in CSDS. This study described child development in this subset of the data using both a binary cut-off of whether a child reached the expected level of development and the continuous ASQ®-3 score.

Results: Among the 226,505 children from 64 local authorities in the sample with complete ASQ®-3 data, 86.2% met the expected level of development. Children from the most deprived neighbourhoods (82.6%), children recorded as Black (78.9%), and boys (81.7%) were less likely to meet the expected level of development.

Conclusions: First, to fully understand early child development across England, the completeness of ASQ®-3 data in the CSDS requires improvement. Second, in order to interpret the national CSDS data on child development, ASQ®-3 should be standardised and validated in an English context, with attention paid to implementation and subsequent referral and support pathways. Our study provides a minimum estimate of children needing developmental support (13.8%), with many more children likely to be experiencing moderate or mild delay but not identified by the ASQ®-3 cut-offs for expected development.
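The completeness check described in the Methods, comparing record counts per local authority and financial quarter against a published reference, might look like the sketch below. The tolerance band (0.85 to 1.15), the rule that every quarter must pass, the toy counts, and the function `complete_authorities` are assumptions made for illustration; the abstract does not state the criteria the authors applied.

```python
def complete_authorities(csds_counts, hvsdm_counts, lower=0.85, upper=1.15):
    """Return authorities whose CSDS/HVSDM count ratio is in range every quarter.

    Both inputs map (authority, quarter) to a record count. The thresholds
    are assumed values, not the study's definition of completeness.
    """
    flags_by_authority = {}
    for (authority, quarter), reference in hvsdm_counts.items():
        observed = csds_counts.get((authority, quarter), 0)
        in_range = reference > 0 and lower <= observed / reference <= upper
        flags_by_authority.setdefault(authority, []).append(in_range)
    return sorted(a for a, flags in flags_by_authority.items() if all(flags))


# Invented counts for two hypothetical local authorities over two quarters.
hvsdm = {
    ("LA1", "2018Q2"): 100, ("LA1", "2018Q3"): 120,
    ("LA2", "2018Q2"): 200, ("LA2", "2018Q3"): 210,
}
csds = {
    ("LA1", "2018Q2"): 98, ("LA1", "2018Q3"): 118,
    ("LA2", "2018Q2"): 60, ("LA2", "2018Q3"): 205,
}

complete = complete_authorities(csds, hvsdm)
print(complete)
```

Here the second authority is excluded because one of its quarters has far fewer records in the administrative dataset than the reference count, even though its other quarter agrees closely. Requiring agreement in every quarter is one possible design choice; a looser rule would retain more authorities at the cost of including partially reported periods.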


Vol. 9, No. 2, article 2459. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934300/pdf/

Citations: 0

Addressing uncertainty in identifying pregnancies in the English CPRD GOLD Pregnancy Register: a methodological study using a worked example.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-02-25 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2471

Yangmei Li, Jennifer J Kurinczuk, Fiona Alderdice, Maria A Quigley, Oliver Rivero-Arias, Julia Sanders, Sara Kenyon, Dimitrios Siassakos, Nikesh Parekh, Suresha De Almeida, Claire Carson

Introduction: Electronic health records are invaluable for pregnancy-related studies. The Clinical Practice Research Datalink (CPRD) Pregnancy Register (PR) identifies pregnancies in primary care records, including uncertain cases.

Objectives: This paper outlines a method to reduce uncertainty in identifying pregnancies within CPRD GOLD PR data, exemplified through a study investigating the provision of pre-pregnancy care.

Methods: We used CPRD Mother Baby Link (MBL) and Maternity Hospital Episode Statistics (HES) to clean and augment the CPRD PR data. The study included all women aged 18-48 years, registered at an English GP practice within CPRD on 01/01/2017, with a year of prior registration and eligibility for hospital data linkage. We developed a cleaning and combining algorithm and further applied strict data quality criteria to form three populations: 'as provided', 'derived' (using our algorithm) and 'strictly derived' (with stricter data quality criteria). We compared characteristics and outcomes across these populations, examining potential biases in effect estimates using the 'as provided' population.

Results: Our algorithm added 22,270 (~7%) pregnancies from hospital data to the CPRD PR (1997-2021), eliminated conflicting pregnancies and pregnancies with unknown outcomes, and minimised potentially non-contemporaneous records of past pregnancies or partial records of pregnancies. For all pregnancies across women's reproductive history, a higher prevalence of pre-existing medical conditions and increased pre-pregnancy care were observed in the 'strictly derived' population, which is characterised by better data quality. In this dataset, recording of both exposure and outcome was better, and the magnitude of the association between exposure and outcome was reduced compared to the 'as provided' population.

Conclusion: PR data requires cleaning before use. This study presents a pragmatic and practical method to identify pregnancies using existing CPRD data and linked records, without needing additional data. Researchers should carefully consider their studies' specific requirements and may adapt our proposed methodology accordingly to align with their research questions.
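The three populations described in the Methods can be illustrated with a minimal sketch. This is not the authors' algorithm: the record fields (source, outcome, conflicting, high_quality) and the function name build_populations are hypothetical, and the real cleaning rules are only summarised in the abstract. The sketch only shows the general shape of the idea: augment the register with hospital-identified pregnancies, drop conflicting or unknown-outcome episodes, then apply a stricter quality filter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Pregnancy:
    # All field names are illustrative assumptions, not CPRD variable names.
    woman_id: str
    source: str               # "register" or "hospital"
    outcome: Optional[str]    # None when the outcome is unknown
    conflicting: bool         # True if it clashes with another episode
    high_quality: bool        # True if it passes the stricter quality checks


def build_populations(
    register: List[Pregnancy], hospital_only: List[Pregnancy]
) -> Tuple[List[Pregnancy], List[Pregnancy], List[Pregnancy]]:
    """Return ('as provided', 'derived', 'strictly derived') populations."""
    # 'As provided': the register exactly as supplied.
    as_provided = list(register)
    # 'Derived': add hospital-only pregnancies, then drop episodes that
    # conflict with another episode or have an unknown outcome.
    combined = as_provided + list(hospital_only)
    derived = [p for p in combined if p.outcome is not None and not p.conflicting]
    # 'Strictly derived': keep only episodes passing the stricter criteria.
    strictly_derived = [p for p in derived if p.high_quality]
    return as_provided, derived, strictly_derived
```

Comparing an effect estimate across the three returned lists is then one way to see how sensitive a result is to the cleaning choices.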


Citations: 0

Adapting historical clinical genetic test records for anonymised data linkage: obstacles and opportunities.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-02-20 eCollection Date: 2023-01-01 DOI: 10.23889/ijpds.v8i5.2924

Robert T Maddison, Karen R Reed, Rebecca Cannings-John, Fiona Lugg-Widger, Thomas Stoneman, Sarah Anderson, Andrew E Fry

Introduction: Cystic fibrosis (CF) heterozygotes (also known as 'carriers') are people who have one mutated copy of the CFTR gene. Research into the health risks of CF carriers has been limited by a lack of large cohorts tested for CF carrier status, but routine clinical testing identifies CF carriers in the population. Such test records additionally contain large amounts of clinical information, making them a valuable research resource to not only identify CF carriers in the population but also to provide additional data not found elsewhere.

Methods: Following governance approvals, we adapted 30 years' worth of CF genetic testing records generated by the All-Wales Medical Genomics Service (AWMGS) and submitted them to the SAIL Databank for anonymised linkage.

Results: Unexpected obstacles meant that only a minimal amount of clinical information could be annotated ahead of linkage. The raw data were highly heterogeneous due to the records' longitudinal collection and clinical origins, making standardisation difficult. Moreover, the presence of unique identifiers in the clinical data violated the separation principle, requiring manual annotation to produce a cleaned dataset. Explicit identification of patients or their relatives throughout the records complicated split-file anonymisation.

Conclusion: Extracting useful information from historical clinical genetic test records is a significant challenge with technical and governance aspects. The mixing of unique identifiers with clinical data in heterogeneous, unstructured free text combined with a lack of automated tools meant that manual annotation was required to adhere to the separation principle. As such, only a minimum of the available clinical data was annotatable within the project timeline and mutually exclusive access to the identifiable and pseudonymised data meant that annotations could not later be validated. Future efforts to link clinical genetic test records for research must consider these challenges in their approach.
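The separation principle and split-file approach mentioned above can be sketched roughly as follows. This is an illustrative assumption, not the AWMGS or SAIL workflow: the field names (name, dob, notes), the linking key, and the simple pattern check for a ten-digit identifier are all hypothetical. The point is only that identifiers and clinical content go to separate files joined by a meaningless key, and that any free text still containing an identifier cannot be released automatically.

```python
import re
import secrets
from typing import Dict, List, Tuple

# Hypothetical pattern for a ten-digit identifier embedded in free text.
ID_LIKE = re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b")


def split_file(
    records: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]], List[str]]:
    """Split records into an identifier file and a clinical file.

    Returns (identifiers, clinical, needs_review). Records whose free text
    still contains the patient's name or an identifier-like number are held
    back for manual annotation instead of entering the clinical file.
    """
    identifiers, clinical, needs_review = [], [], []
    for rec in records:
        key = secrets.token_hex(8)  # random key, carries no information
        identifiers.append({"key": key, "name": rec["name"], "dob": rec["dob"]})
        notes = rec["notes"]
        name_in_notes = bool(rec["name"]) and rec["name"].lower() in notes.lower()
        if name_in_notes or ID_LIKE.search(notes):
            needs_review.append(key)
        else:
            clinical.append({"key": key, "notes": notes})
    return identifiers, clinical, needs_review
```

A check this simple would miss relatives' names and most indirect identifiers, which is consistent with the abstract's point that manual annotation was needed.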


Citations: 0

Improving opportunities for data linkage within Children Looked After administrative records in Wales.

IF 2.2 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-02-19 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2383

Grace A Bailey, Alex Lee, Saira Ahmed, Ieuan Scanlon, Laura E Cowley, Amy Stuart, Ian Farr, Caroline Brooks, Laura North, Lucy J Griffiths

Introduction: Linkage of population-based administrative data is a powerful tool for studying important public issues. To overcome confidentiality and disclosure issues, records are de-identified and allocated a unique identifier. Within the Secure Anonymised Information Linkage (SAIL) Databank, these are known as Anonymised Linking Fields (ALFs). Assignment of an ALF enables linkage of individuals across multiple routinely collected datasets. Within the Children Looked After (CLA) Wales dataset, only 37% of the children have an ALF, limiting linkage to other datasets and, as a result, potential research. There are also other known data issues, including discrepancies in week of birth, duplicate identifiers and year-on-year changes in identifiers.

Objectives: To improve the accuracy and availability of ALFs in the CLA dataset, and overall research quality.

Methods: Using several datasets within the SAIL Databank, we developed a six-step CLA matching algorithm to improve the ALF matching rate and correct for data errors. To assess the performance of our algorithm, we benchmarked against routine ALFs already identified via the algorithm currently used by SAIL.

Results: Our algorithm increased ALF matching by 25%, assigning 61% of individuals an ALF. Inconsistent weeks of birth, and incorrect and duplicate identifiers were resolved. When benchmarking against the current ALF-assigning algorithm used by SAIL, our algorithm had an overall sensitivity of 90%.

Conclusion: We have developed an algorithm which demonstrates comparable ALF matching performance to the current algorithm used within SAIL, and which greatly improves the ALF matching in the CLA dataset. This algorithm may help to overcome potential bias due to missing data, and increases the potential for linkage to other datasets. Further development and refinement could result in the algorithm being applied to other datasets in SAIL.
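The benchmarking step can be illustrated with a short sketch. The abstract does not say exactly how sensitivity was calculated, so the definition here is an assumption: among records that already have a routine ALF, the share for which the new algorithm assigns the same ALF. The function name and the dictionary-based inputs are hypothetical.

```python
from typing import Dict, Optional


def sensitivity(
    reference: Dict[str, Optional[str]], candidate: Dict[str, Optional[str]]
) -> float:
    """Share of reference-matched records that the candidate matches identically.

    Both arguments map a record id to an assigned ALF, or None if unmatched.
    """
    # Only records that the reference algorithm matched form the benchmark.
    benchmark = {rid: alf for rid, alf in reference.items() if alf is not None}
    if not benchmark:
        raise ValueError("reference contains no matched records")
    agree = sum(1 for rid, alf in benchmark.items() if candidate.get(rid) == alf)
    return agree / len(benchmark)
```

Records that only the new algorithm matches do not enter this figure, so a measure like this says nothing about whether the additional matches are correct.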


Citations: 0

UK Longitudinal Linkage Collaboration (UK LLC): The National Trusted Research Environment for Longitudinal Research.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-02-17 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2468

Andy Boyd, Katharine M Evans, Emma L Turner, Robin Flaig, Jacqui Oakley, Kirsteen C Campbell, Richard Thomas, Stela McLachlan, Matthew Crane, Rebecca Whitehorn, Rachel Calkin, Abigail Hill, Samantha Berman, David Ford, Martin Tobin, David Porteous, Danielle F Gomes, Maria-Paz Garcia, Andrew Wong, Aida Sanchez, Chris Orton, Simon Thompson, John Gulliver, Kathryn Adams, Ellena Badrick, Chiara Batini, Michaela Benzeval, Susie Boatman, Gerome Breen, Shannon Bristow, Abigail Britten, Luke Bryant, Adam Butterworth, Archie Campbell, Sarah Chave, John Danesh, Jayati Das-Munshi, Karen Dennison, Emanuele Di Angelantonio, Thalia C Eley, Helen Fisher, Emla Fitzsimons, Alissa Goodman, Michael Gregg, Anna L Guyatt, Anna Hansell, Rebecca Harmston, Andy Heard, Morag Henderson, Rosie Hill, Szu-Chia Huang, Catherine John, Frank Kee, Nathalie Kingston, Jack Kneeshaw, Rashmi Kumar, Genevieve Lachance, Celestine Lockhart, Hazel Lockhart-Jones, Sarah Markham, Dan Mason, Bernadette McGuinness, Maisie McKenzie, Amy McMahon, Chelsea Mika Malouf, Mark Mumme, Charlotte Neville, Kate Northstone, Zoe Oldfield, Dara O'Neill, Manish Pareek, John Pickavance, Yasmin Rahman, Holly Reilly, Angela Scott, Deb Smith, Andrew Steptoe, Claire Steves, Cathie Sudlow, Gerald Sze, Nicholas L Timpson, Tapiwa Tungamirai, Laura Venn, Matthew Walker, Neil Walker, Nicolas Wareham, Aidan Watmuff, Tony Webb, Karen Williams, John Wright, Darioush Yarand, George B Ploubidis, John Macleod, Jonathan Ac Sterne, Nishi Chaturvedi

Introduction: The UK Longitudinal Linkage Collaboration (UK LLC) is the national Trusted Research Environment (TRE) for the UK's longitudinal research community, supporting the UK's unparalleled collection of Longitudinal Population Studies (LPS). Initially set up as a COVID-19 research resource, UK LLC is now a generic database for any research for the public good.

Objectives: UK LLC supports longitudinal research by providing record linkage and TRE services.

Methods: The UK LLC partnership provides a secure analytics environment, a trusted third-party linkage processor and a comprehensive governance framework to minimise risks to participant confidentiality. UK LLC is ISO 27001 certified and accredited by the UK Statistics Authority as a processor under the Digital Economy Act. The active involvement by members of UK LLC's public involvement programme ensures UK LLC is acceptable to LPS participants and the wider public. All UK LPS are eligible for inclusion. Researchers can apply to access the TRE via an approach that fulfils the needs of the LPS, the linked data owners and includes a review by public contributors.

Results: Twenty-two LPS have so far joined UK LLC. Where permissions allow, participants are linked to their National Health Service (NHS) England, NHS Wales and place-based records, with work ongoing to link to NHS Scotland and non-health administrative records, including Department for Work and Pensions and His Majesty's (HM) Revenue and Customs. UK LLC Explore allows potential researchers to discover the breadth of data available in the TRE. All applications are listed on UK LLC's publicly accessible Data Access Register.

Conclusions: UK LLC enables researchers to interrogate pooled LPS participant data that are systematically linked to diverse records. UK LLC remains open to additional LPS joining the partnership and will increase the breadth of data to support the longitudinal research community and attract increasing numbers of researchers across multiple disciplines, government departments and industry.


Citations: 0

Kids' Environment and Health Cohort: Database Protocol: supplementary appendix.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science

Pub Date : 2025-02-13 eCollection Date: 2025-01-01 DOI: 10.23889/ijpds.v10i1.2475

Selin Akaraci, Alison Macfarlane, Amal Rammah, Emilie Courtin, Esther Lewis, Faith Miller, Jason Powell-Bavester, Jessica Mitchell, Joana Cruz, Matthew Lilliman, Niloofar Shoari, Samantha Hajna, Steven Cummins, Tolu Adedire, Vahe Nafilyan, Pia Hardelid

Introduction: Environmental exposures are known to affect the health and well-being of populations throughout the life course. Children are particularly susceptible to environmental impacts on educational and health outcomes as they spend more time in their local environments compared to adults. In England, no national, longitudinal dataset linking information about the physical and social environment in and around homes and schools to children's health and education outcomes currently exists. This limits our understanding of how environments might impact the health and well-being of children as they grow up.

Objective: To establish the Kids' Environment and Health Cohort, a research-ready, de-identified and annually updated national birth cohort of all children born in England from 2006 onwards.

Methods: The Kids' Environment and Health Cohort will link birth and mortality records, health and educational attainment datasets, to maternal health (up to 12 months prior to their child's birth), and environmental data for all children born in England from 2006 - approximately 11 million children at first build. A subset of children born between 2010 and 2012, and between 2020 and 2022 will be linked to their mothers' 2011 or 2021 Census records, respectively. The cohort database will be held in, and accessed via, a trusted research environment (TRE) at the Office for National Statistics (ONS). All geographical identifiers in the cohort, allowing for linkage to further environmental data, will be securely held by the ONS, separately to the main cohort, and will be encrypted before being shared with researchers.

Conclusion: The Kids' Environment and Health Cohort will, for the first time, link administrative health and education data to longitudinal environmental exposures for children at national level in England. It will serve as a data resource to support research about the health and well-being of children via improved home and school environments.
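The eligibility and Census-linkage rules stated in the Methods can be written down as a small sketch. The function names are hypothetical, and the sketch assumes that "between 2010 and 2012" and "between 2020 and 2022" are inclusive year ranges, which the abstract does not state explicitly.

```python
from typing import Optional

COHORT_START_YEAR = 2006  # cohort covers children born in England from 2006 onwards


def in_cohort(birth_year: int) -> bool:
    """True if a child born in this year falls within the cohort."""
    return birth_year >= COHORT_START_YEAR


def census_year_for(birth_year: int) -> Optional[int]:
    """Census whose maternal record would be linked, or None if not in the subset.

    Assumes inclusive birth-year ranges around each Census.
    """
    if 2010 <= birth_year <= 2012:
        return 2011
    if 2020 <= birth_year <= 2022:
        return 2021
    return None
```

Under these assumptions, a child born in 2015 is in the cohort but has no linked maternal Census record.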


Citations: 0