Pub Date: 2025-02-27; eCollection Date: 2024-01-01; DOI: 10.23889/ijpds.v9i2.2459
Jayu Jung, Sarah Cattan, Claire Powell, Jane Barlow, Mengyun Liu, Amanda Clery, Louise Mc Grath-Lone, Catherine Bunting, Jenny Woodman
Introduction: The Ages & Stages Questionnaire 3rd Edition (ASQ®-3) is a tool, originally developed in the United States, for measuring developmental delay in children aged 1 to 66 months. It has been collected in England since 2015 as part of the mandated 2-2½-year health visiting reviews and collated nationally in the Community Services Dataset (CSDS). CSDS is known to be incomplete, and to date there have been no published analyses of the ASQ®-3 data held within CSDS.
Objectives: This study aimed to a) identify a subset of complete child development data for children aged two in England using ASQ®-3 data in CSDS between 2018/19 and 2020/21; and b) use this subset to analyse child development at age 2-2½ years in England.
Methods: This study compared counts of ASQ®-3 records in CSDS by local authority and financial quarter against national, publicly available Health Visitor Service Delivery Metrics (HVSDM) to identify local authorities with complete ASQ®-3 records in CSDS. This study described child development in this subset of the data using both a binary cut-off of whether a child reached expected level of development and the continuous ASQ®-3 score.
Results: Among the 226,505 children from 64 local authorities in the sample with complete ASQ®-3 data, 86.2% met the expected level of development. Children from the most deprived neighbourhoods (82.6%), children recorded as Black (78.9%), and boys (81.7%) were less likely to meet the expected level of development.
Conclusions: First, to fully understand early child development across England, the completeness of ASQ®-3 data in the CSDS requires improvement. Second, in order to interpret the national CSDS data on child development, ASQ®-3 should be standardised and validated in an English context, with attention paid to implementation and to subsequent referral and support pathways. Our study provides a minimum estimate of children needing developmental support (13.8%), with many more children likely to be experiencing moderate or mild delay but not identified by the ASQ®-3 cut-offs for expected development.
Title: Early child development in England: cross-sectional analysis of ASQ®-3 records from the 2-2½-year universal health visiting review using national administrative data (Community Service Dataset, CSDS).
Journal: International Journal of Population Data Science, volume 9, issue 2, article 2459. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934300/pdf/
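The completeness check described in the Methods of this record (comparing ASQ®-3 record counts in CSDS with HVSDM counts by local authority and financial quarter) could be sketched roughly as follows. This is an illustration only, not the authors' procedure: the abstract does not state the criterion used, so the function name, the input layout and the `tolerance` threshold are all assumptions.

```python
from collections import defaultdict


def complete_authorities(csds_counts, hvsdm_counts, tolerance=0.15):
    """Return local authorities whose CSDS counts track HVSDM in every quarter.

    Both inputs map (local_authority, quarter) -> number of ASQ-3 records.
    `tolerance` is a hypothetical relative-difference threshold; the study's
    actual completeness criterion is not given in the abstract.
    """
    quarters_ok = defaultdict(list)
    for (authority, quarter), reference in hvsdm_counts.items():
        observed = csds_counts.get((authority, quarter), 0)
        if reference == 0:
            ok = observed == 0
        else:
            ok = abs(observed - reference) / reference <= tolerance
        quarters_ok[authority].append(ok)
    # An authority counts as complete only if every quarter passes.
    return sorted(a for a, flags in quarters_ok.items() if all(flags))
```

Requiring every quarter to pass is the strictest reading of "complete"; a looser rule (for example, most quarters) would be an equally plausible variant.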
Pub Date: 2025-02-25; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i1.2471
Yangmei Li, Jennifer J Kurinczuk, Fiona Alderdice, Maria A Quigley, Oliver Rivero-Arias, Julia Sanders, Sara Kenyon, Dimitrios Siassakos, Nikesh Parekh, Suresha De Almeida, Claire Carson
Introduction: Electronic health records are invaluable for pregnancy-related studies. The Clinical Practice Research Datalink (CPRD) Pregnancy Register (PR) identifies pregnancies in primary care records, including uncertain cases.
Objectives: This paper outlines a method to reduce uncertainty in identifying pregnancies within CPRD GOLD PR data, exemplified through a study investigating the provision of pre-pregnancy care.
Methods: We used CPRD Mother Baby Link (MBL) and Maternity Hospital Episode Statistics (HES) to clean and augment the CPRD PR data. The study included all women aged 18-48 years, registered at an English GP practice within CPRD on 01/01/2017, with a year of prior registration and eligibility for hospital data linkage. We developed a cleaning and combining algorithm and further applied strict data quality criteria to form three populations: 'as provided', 'derived' (using our algorithm) and 'strictly derived' (with stricter data quality criteria). We compared characteristics and outcomes across these populations, examining potential biases in effect estimates using the 'as provided' population.
Results: Our algorithm added 22,270 (~7%) pregnancies from hospital data to the CPRD PR (1997-2021), eliminated conflicting pregnancies and pregnancies with unknown outcomes, and minimised potentially non-contemporaneous records of past pregnancies or partial records of pregnancies. For all pregnancies across women's reproductive history, the 'strictly derived' population, characterised by better data quality, showed a higher prevalence of pre-existing medical conditions and increased pre-pregnancy care. In this dataset, recording of both exposure and outcome was better, and the magnitude of the association between exposure and outcome was reduced compared to the 'as provided' population.
Conclusion: PR data requires cleaning before use. This study presents a pragmatic and practical method to identify pregnancies using existing CPRD data and linked records, without needing additional data. Researchers should carefully consider their studies' specific requirements and may adapt our proposed methodology accordingly to align with their research questions.
Title: Addressing uncertainty in identifying pregnancies in the English CPRD GOLD Pregnancy Register: a methodological study using a worked example.
Journal: International Journal of Population Data Science, volume 10, issue 1, article 2471. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874892/pdf/
Pub Date: 2025-02-20; eCollection Date: 2023-01-01; DOI: 10.23889/ijpds.v8i5.2924
Robert T Maddison, Karen R Reed, Rebecca Cannings-John, Fiona Lugg-Widger, Thomas Stoneman, Sarah Anderson, Andrew E Fry
Introduction: Cystic fibrosis (CF) heterozygotes (also known as 'carriers') are people who have one mutated copy of the CFTR gene. Research into the health risks of CF carriers has been limited by a lack of large cohorts tested for CF carrier status, but routine clinical testing identifies CF carriers in the population. Such test records additionally contain large amounts of clinical information, making them a valuable research resource to not only identify CF carriers in the population but also to provide additional data not found elsewhere.
Methods: Following governance approvals, we adapted 30 years' worth of CF genetic testing records generated by the All-Wales Medical Genomics Service (AWMGS) and submitted them to the SAIL Databank for anonymised linkage.
Results: Unexpected obstacles meant that only a minimal amount of clinical information could be annotated ahead of linkage. The raw data were highly heterogeneous due to the records' longitudinal collection and clinical origins, making standardisation difficult. Moreover, the presence of unique identifiers in the clinical data violated the separation principle, requiring manual annotation to produce a cleaned dataset. Explicit identification of patients or their relatives throughout the records complicated split-file anonymisation.
Conclusion: Extracting useful information from historical clinical genetic test records is a significant challenge with technical and governance aspects. The mixing of unique identifiers with clinical data in heterogeneous, unstructured free text combined with a lack of automated tools meant that manual annotation was required to adhere to the separation principle. As such, only a minimum of the available clinical data was annotatable within the project timeline and mutually exclusive access to the identifiable and pseudonymised data meant that annotations could not later be validated. Future efforts to link clinical genetic test records for research must consider these challenges in their approach.
Title: Adapting historical clinical genetic test records for anonymised data linkage: obstacles and opportunities.
Journal: International Journal of Population Data Science, volume 8, issue 5, article 2924. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11922013/pdf/
Pub Date: 2025-02-19; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i1.2383
Grace A Bailey, Alex Lee, Saira Ahmed, Ieuan Scanlon, Laura E Cowley, Amy Stuart, Ian Farr, Caroline Brooks, Laura North, Lucy J Griffiths
Introduction: Linkage of population-based administrative data is a powerful tool for studying important public issues. To overcome confidentiality and disclosure issues, records are de-identified and allocated a unique identifier. Within the Secure Anonymised Information Linkage (SAIL) Databank, these are known as Anonymised Linking Fields (ALFs). Assignment of an ALF enables linkage of individuals across multiple routinely collected datasets. Within the Children Looked After (CLA) Wales dataset, only 37% of the children have an ALF, limiting linkage to other datasets and, as a result, potential research. There are also other known data issues, including discrepancies in week of birth, duplicate identifiers and year-on-year changes in identifiers.
Objectives: To improve accuracy and availability of the ALFs in the CLA dataset, and overall research quality.
Methods: Using several datasets within the SAIL Databank, we developed a six-step CLA matching algorithm to improve the ALF matching rate and correct for data errors. To assess the performance of our algorithm, we benchmarked against routine ALFs already identified via the algorithm currently used by SAIL.
Results: Our algorithm increased ALF matching by 25%, assigning 61% of individuals an ALF. Inconsistent weeks of birth, and incorrect and duplicate identifiers were resolved. When benchmarking against the current ALF-assigning algorithm used by SAIL, our algorithm had an overall sensitivity of 90%.
Conclusion: We have developed an algorithm which demonstrates comparable ALF matching performance to the current algorithm used within SAIL, and which greatly improves the ALF matching in the CLA dataset. This algorithm may help to overcome potential bias due to missing data, and increases the potential for linkage to other datasets. Further development and refinement could result in the algorithm being applied to other datasets in SAIL.
Title: Improving opportunities for data linkage within Children Looked After administrative records in Wales.
Journal: International Journal of Population Data Science, volume 10, issue 1, article 2383. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502067/pdf/
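The benchmarking step in this record (comparing the new algorithm's ALFs with the routine ALFs already assigned by SAIL) can be illustrated with a small sensitivity calculation. The abstract does not define how sensitivity was computed, so the sketch below takes one common reading: among records that already have a routine ALF, the share to which the candidate algorithm assigns the same ALF. The function name and input layout are hypothetical.

```python
def sensitivity(reference_alfs, candidate_alfs):
    """Share of reference-linked records given the same ALF by the candidate.

    Both arguments map a record id to an ALF, or to None when unmatched.
    Records without a reference ALF are excluded from the denominator.
    """
    linked = {rid: alf for rid, alf in reference_alfs.items() if alf is not None}
    if not linked:
        raise ValueError("no reference ALFs to benchmark against")
    agreed = sum(1 for rid, alf in linked.items() if candidate_alfs.get(rid) == alf)
    return agreed / len(linked)
```

Under this reading, a record the candidate links to a different ALF counts against sensitivity in the same way as a record it leaves unlinked.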
Pub Date: 2025-02-19; eCollection Date: 2023-01-01; DOI: 10.23889/ijpds.v8i6.3139
Amanj Kurdi, Laura Stobo, Morven Millar, Will Clayton, Andrew Merrick, Stuart McTaggart, Tanja Mueller, Marion Bennie
Introduction: The Homecare Medicines (HCM) dataset is a national, patient-level dataset developed by Public Health Scotland (PHS) to capture the supply of specialist medicines delivered through homecare services in Scotland. These services are a critical component of outpatient treatment pathways, particularly for long-term conditions requiring specialist care, such as inflammatory arthritis, cancer, and immune-mediated diseases. Prior to 2019, data on homecare prescribing were fragmented and locally held, limiting national analyses.
Methods: The dataset was initially established during the COVID-19 pandemic to identify immunocompromised patients for vaccine prioritisation. Monthly supply-level data are submitted by homecare providers. Each record includes a pseudonymised unique patient identifier, derived through national health person-level data linkage processes and standardised medicine information mapped to the NHS dictionary of medicines and devices (dm+d), including medicine name (brand and/or generic), formulation and supply date, and, where provided, treatment indication. The presence of a unique patient identifier enables deterministic linkage with a range of national datasets, including community and hospital prescribing, hospital admissions, mortality, cancer registry, and demographic indicators.
Results: The HCM dataset is held securely within the PHS national data infrastructure and accessed via the National Safe Haven. As of April 2025, it includes data from five national providers and covers approximately 98% of the Scottish homecare market. The dataset comprises over 1.3 million supply records for more than 41,000 patients since 2019. Data quality is high for core fields: almost all key variables have <1% missing values, and more than 99.9% of records are successfully indexed with the unique patient identifier. Indication data are partially complete and improving. Medicines are coded using standardised drug dictionaries.
Conclusion: Access to the HCM dataset is available through eDRIS subject to Public Benefit and Privacy Panel (HSC-PBPP) approval. The dataset is well suited for studies on medicine utilisation, equity in access, treatment outcomes, and service planning. Ongoing improvements include enhanced indication capture and integration with Scotland's wider digital prescribing infrastructure.
Title: Data Resource Profile: Public Health Scotland (PHS) Homecare Medicines Dataset: A National Resource for Linked Prescribing Data for Specialist Medicines Prescribed in Hospital Outpatient setting and Supplied Via Homecare Services.
Journal: International Journal of Population Data Science, volume 8, issue 6, article 3139. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13001856/pdf/
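The deterministic linkage mentioned in this record's Methods means joining records on an exact match of the pseudonymised patient identifier. A minimal sketch of such a join is shown below; the field names (`patient_id`, `medicine`, `admitted`) and the record layout are invented for illustration and do not reflect the actual HCM or PHS schemas.

```python
from collections import defaultdict


def link_deterministic(supplies, admissions, key="patient_id"):
    """Pair each supply record with every admission sharing the same identifier.

    Matching is exact on `key`; records with no counterpart are dropped,
    which corresponds to an inner join.
    """
    index = defaultdict(list)
    for admission in admissions:
        index[admission[key]].append(admission)
    # index.get avoids creating empty entries for unmatched identifiers.
    return [(s, a) for s in supplies for a in index.get(s[key], [])]
```

Because matching is exact, the quality of the linkage depends entirely on how completely records carry the identifier, which is why the indexing rate reported in the Results matters.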
Pub Date: 2025-02-17; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i1.2468
Andy Boyd, Katharine M Evans, Emma L Turner, Robin Flaig, Jacqui Oakley, Kirsteen C Campbell, Richard Thomas, Stela McLachlan, Matthew Crane, Rebecca Whitehorn, Rachel Calkin, Abigail Hill, Samantha Berman, David Ford, Martin Tobin, David Porteous, Danielle F Gomes, Maria-Paz Garcia, Andrew Wong, Aida Sanchez, Chris Orton, Simon Thompson, John Gulliver, Kathryn Adams, Ellena Badrick, Chiara Batini, Michaela Benzeval, Susie Boatman, Gerome Breen, Shannon Bristow, Abigail Britten, Luke Bryant, Adam Butterworth, Archie Campbell, Sarah Chave, John Danesh, Jayati Das-Munshi, Karen Dennison, Emanuele Di Angelantonio, Thalia C Eley, Helen Fisher, Emla Fitzsimons, Alissa Goodman, Michael Gregg, Anna L Guyatt, Anna Hansell, Rebecca Harmston, Andy Heard, Morag Henderson, Rosie Hill, Szu-Chia Huang, Catherine John, Frank Kee, Nathalie Kingston, Jack Kneeshaw, Rashmi Kumar, Genevieve Lachance, Celestine Lockhart, Hazel Lockhart-Jones, Sarah Markham, Dan Mason, Bernadette McGuinness, Maisie McKenzie, Amy McMahon, Chelsea Mika Malouf, Mark Mumme, Charlotte Neville, Kate Northstone, Zoe Oldfield, Dara O'Neill, Manish Pareek, John Pickavance, Yasmin Rahman, Holly Reilly, Angela Scott, Deb Smith, Andrew Steptoe, Claire Steves, Cathie Sudlow, Gerald Sze, Nicholas L Timpson, Tapiwa Tungamirai, Laura Venn, Matthew Walker, Neil Walker, Nicolas Wareham, Aidan Watmuff, Tony Webb, Karen Williams, John Wright, Darioush Yarand, George B Ploubidis, John Macleod, Jonathan Ac Sterne, Nishi Chaturvedi
Introduction: The UK Longitudinal Linkage Collaboration (UK LLC) is the national Trusted Research Environment (TRE) for the UK's longitudinal research community, supporting the UK's unparalleled collection of Longitudinal Population Studies (LPS). Initially set up as a COVID-19 research resource, UK LLC is now a generic database for any research for the public good.
Objectives: UK LLC supports longitudinal research by providing record linkage and TRE services.
Methods: The UK LLC partnership provides a secure analytics environment, a trusted third-party linkage processor and a comprehensive governance framework to minimise risks to participant confidentiality. UK LLC is ISO 27001 certified and accredited by the UK Statistics Authority as a processor under the Digital Economy Act. The active involvement of members of UK LLC's public involvement programme ensures UK LLC is acceptable to LPS participants and the wider public. All UK LPS are eligible for inclusion. Researchers can apply to access the TRE via an approach that fulfils the needs of the LPS and the linked data owners and includes a review by public contributors.
Results: Twenty-two LPS have so far joined UK LLC. Where permissions allow, participants are linked to their National Health Service (NHS) England, NHS Wales and place-based records, with work ongoing to link to NHS Scotland and non-health administrative records, including Department for Work and Pensions and His Majesty's (HM) Revenue and Customs. UK LLC Explore allows potential researchers to discover the breadth of data available in the TRE. All applications are listed on UK LLC's publicly accessible Data Access Register.
Conclusions: UK LLC enables researchers to interrogate pooled LPS participant data that are systematically linked to diverse records. UK LLC remains open to additional LPS joining the partnership and will increase the breadth of data to support the longitudinal research community and attract increasing numbers of researchers across multiple disciplines, government departments and industry.
Title: UK Longitudinal Linkage Collaboration (UK LLC): The National Trusted Research Environment for Longitudinal Research. International Journal of Population Data Science, volume 10, issue 1, article 2468. Published 2025-02-17. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11931487/pdf/
Pub Date: 2025-02-13; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i1.2475
Selin Akaraci, Alison Macfarlane, Amal Rammah, Emilie Courtin, Esther Lewis, Faith Miller, Jason Powell-Bavester, Jessica Mitchell, Joana Cruz, Matthew Lilliman, Niloofar Shoari, Samantha Hajna, Steven Cummins, Tolu Adedire, Vahe Nafilyan, Pia Hardelid
Introduction: Environmental exposures are known to affect the health and well-being of populations throughout the life course. Children are particularly susceptible to environmental impacts on educational and health outcomes as they spend more time in their local environments compared to adults. In England, no national, longitudinal dataset linking information about the physical and social environment in and around homes and schools to children's health and education outcomes currently exists. This limits our understanding of how environments might impact the health and well-being of children as they grow up.
Objective: To establish the Kids' Environment and Health Cohort, a research-ready, de-identified and annually updated national birth cohort of all children born in England from 2006 onwards.
Methods: The Kids' Environment and Health Cohort will link birth and mortality records and health and educational attainment datasets to maternal health data (up to 12 months before the child's birth) and environmental data for all children born in England from 2006 onwards, approximately 11 million children at first build. A subset of children born between 2010 and 2012, and between 2020 and 2022, will be linked to their mothers' 2011 or 2021 Census records, respectively. The cohort database will be held in, and accessed via, a trusted research environment (TRE) at the Office for National Statistics (ONS). All geographical identifiers in the cohort, which allow linkage to further environmental data, will be securely held by the ONS, separately from the main cohort, and will be encrypted before being shared with researchers.
Conclusion: The Kids' Environment and Health Cohort will, for the first time, link administrative health and education data to longitudinal environmental exposures for children at national level in England. It will serve as a data resource to support research on improving the health and well-being of children through better home and school environments.
Title: Kids' Environment and Health Cohort: Database Protocol: supplementary appendix. International Journal of Population Data Science, volume 10, issue 1, article 2475. Published 2025-02-13. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878347/pdf/
Pub Date: 2025-02-11; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i1.2519
Maria Peppa, Kate M Lewis, Bianca De Stavola, Pia Hardelid, Ruth Gilbert
Background: Children with major congenital anomalies (MCAs) disproportionately experience complex health problems requiring additional health and educational support.
Objectives: To describe survival to the start of school and recorded special educational needs (SEN) provision among children with and without administrative record-identified MCAs in England. We present results for 12 system-specific MCA subgroups and 25 conditions. We also describe the change in the prevalence of recorded SEN provision before and after the SEN reforms in 2014, which were implemented to improve and streamline SEN provision.
Methods: We created a birth cohort of 6,180,400 singleton children born in England between 1 September 2003 and 31 August 2013 using linked administrative records from the ECHILD database. MCAs were identified using hospital admission and mortality records during infancy. SEN provision in primary school was defined as at least one record of SEN provision in state-school records during years 1 to 6 (ages 5/6 years to 10/11 years).
Results: Children with any MCA had a 5-year survival rate of 95.1% (95% confidence interval (CI) 95.0, 95.2) compared with 99.7% (95% CI 99.7, 99.7) among children without an MCA. 41.6% (75,381/181,324) of children with an MCA had any recorded SEN provision in primary school compared with 25.7% (1,285,572/5,008,598) of unaffected children. Of the 12 system-specific MCA subgroups, children with chromosomal, nervous system and eye anomalies had the highest prevalence of recorded SEN provision. The prevalence of recorded SEN provision decreased by 4.8% (99% CI -5.4, -4.3) for children with any MCA compared with a reduction of 4.2% (99% CI -4.3, -4.2) for unaffected children, when comparing pupils in year 1 before and after 2014.
Conclusion: We observed that approximately two-fifths of children with MCAs have some type of SEN provision recorded during primary school. This proportion varied by condition and declined following the 2014 SEN reforms, as it did for children unaffected by MCAs.
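The headline percentages in the Results can be checked directly from the counts quoted there. The short sketch below recomputes them; the prevalence difference and ratio are derived here for illustration only and are not figures reported in the abstract.

```python
# Counts quoted in the Results paragraph of the abstract.
mca_sen, mca_total = 75_381, 181_324
unaffected_sen, unaffected_total = 1_285_572, 5_008_598


def prevalence(cases: int, total: int) -> float:
    """Proportion of children with any recorded SEN provision."""
    return cases / total


p_mca = prevalence(mca_sen, mca_total)
p_unaffected = prevalence(unaffected_sen, unaffected_total)

# Derived comparisons: illustration only, not reported by the authors.
prevalence_difference = p_mca - p_unaffected
prevalence_ratio = p_mca / p_unaffected

print(f"MCA: {p_mca:.1%}; unaffected: {p_unaffected:.1%}")
print(f"difference: {prevalence_difference * 100:.1f} percentage points")
print(f"ratio: {prevalence_ratio:.2f}")
```

These are crude, unadjusted comparisons; they say nothing about the confidence intervals or the before-and-after analysis of the 2014 reforms.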
Title: School-recorded special educational needs provision in children with major congenital anomalies: A linked administrative records study of births in England, 2003-2013. International Journal of Population Data Science, volume 10, issue 1, article 2519. Published 2025-02-11. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878349/pdf/
Pub Date: 2025-02-06; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i2.2464
Katherine O'Sullivan, Milan Markovic, Jaroslaw Dymiter, Bernhard Scheliga, Chinasa Odo, Katie Wilde
Introduction: We present a prototype solution for improving the transparency and quality assurance of the data linkage process through data provenance tracking. It is designed to assist Data Analysts, researchers and information governance teams in authenticating and auditing data workflows within a Trusted Research Environment (TRE).
Methods: Using a participatory design process with Data Analysts, researchers and information governance teams, we undertook a contextual inquiry, user requirements interviews, co-design workshops and low-fidelity prototype evaluations. Public Engagement and Involvement activities underpinned the methods to ensure that the project and its approach to semi-automating data processing maintained public trust. These activities informed the technical implementation: we applied the PROV-O ontology to create a derived ontology following the four-step Linked Open Terms methodology, and developed automated scripts to collect provenance information for the data processing workflow.
Results: The resulting Provenance Explorer for Trusted Research Environments (PE-TRE) interactive tool displays the data linkage information extracted from a knowledge graph described using the derived SHP ontology, together with the results of rule-based validation checks. User evaluations confirmed PE-TRE would contribute to better quality data linkage and reduce data processing errors.
Conclusion: This project demonstrates the next stage in advancing transparency, openness and quality assurance within TREs by semi-automating and systematising data tracking in a single tool throughout the data processing lifecycle.
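As a rough illustration of the kind of provenance capture and rule-based checking this abstract describes, the sketch below stores a few PROV-O style statements as plain triples, traces the inputs of a linked dataset and runs one validation rule. The `prov:` terms come from the W3C PROV-O vocabulary; the `ex:` identifiers, the helper functions and the rule itself are hypothetical and are not taken from PE-TRE or the SHP ontology.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical provenance for one linkage run (all ex: names are made up).
graph = [
    Triple("ex:raw_extract", "rdf:type", "prov:Entity"),
    Triple("ex:linked_dataset", "rdf:type", "prov:Entity"),
    Triple("ex:linkage_run", "rdf:type", "prov:Activity"),
    Triple("ex:analyst_1", "rdf:type", "prov:Agent"),
    Triple("ex:linkage_run", "prov:used", "ex:raw_extract"),
    Triple("ex:linked_dataset", "prov:wasGeneratedBy", "ex:linkage_run"),
    Triple("ex:linkage_run", "prov:wasAssociatedWith", "ex:analyst_1"),
]


def activities_missing_agent(triples):
    """Example rule: every activity should be associated with an agent."""
    activities = {
        t.subject
        for t in triples
        if t.predicate == "rdf:type" and t.obj == "prov:Activity"
    }
    with_agent = {
        t.subject for t in triples if t.predicate == "prov:wasAssociatedWith"
    }
    return sorted(activities - with_agent)


def upstream_entities(entity, triples):
    """Follow wasGeneratedBy then used to list every input entity."""
    found, frontier = set(), [entity]
    while frontier:
        current = frontier.pop()
        for gen in triples:
            if gen.predicate != "prov:wasGeneratedBy" or gen.subject != current:
                continue
            for use in triples:
                is_input = use.predicate == "prov:used" and use.subject == gen.obj
                if is_input and use.obj not in found:
                    found.add(use.obj)
                    frontier.append(use.obj)
    return sorted(found)


print("Activities without an agent:", activities_missing_agent(graph))
print("Inputs to ex:linked_dataset:", upstream_entities("ex:linked_dataset", graph))
```

A production system would more plausibly hold these statements in an RDF store and express the rules in a dedicated constraint language; plain tuples are used here only to keep the example self-contained.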
Title: Semi-automated data provenance tracking for transparent data production and linkage to enhance auditing and quality assurance in Trusted Research Environments. International Journal of Population Data Science, volume 10, issue 2, article 2464. Published 2025-02-06. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11931605/pdf/
Pub Date: 2025-02-04; eCollection Date: 2025-01-01; DOI: 10.23889/ijpds.v10i1.2391
Rosaleen P Cornish, Alison Teyhan, Kate Tilling, John Macleod, Iain Brennan
Introduction: Determining risk factors and consequences of serious violence requires accurate measures of violence. Self-reported and police-recorded offending are subject to different sources of bias.
Objectives: To compare the risk of self-reported and police-recorded serious violence perpetration in late adolescence and early adulthood using linked UK birth cohort and police data, to examine the association between cohort participation and police-recorded violence, and to use police records to impute missing self-reported data on violence.
Methods: We included individuals in the Avon Longitudinal Study of Parents and Children (ALSPAC) who had been informed about the study's use of their linked data and had not opted out of linkage to police records (n = 12,662). We used descriptive statistics and logistic regression to address our objectives. Multiple imputation using chained equations was used to impute self-reported violence data to examine the likely impact of missing data on estimates of prevalence.
Results: Self-reported violence perpetration in the past year ranged from 5.3% (at 25 years) to 12.9% (at 20 years) among males and 3.2% (at 17, 22, 24 and 25 years) to 6.4% (at 18 years) among females. Police-recorded serious violence was lower at all ages, peaking at 17-18 years (1.7% among males, 0.5% among females). Study participation was lower among people who had or went on to have a police record for serious violence; as a result, the prevalence of self-reported violence in the imputed data was higher (compared to observed data) at all ages.
Conclusions: Overall, our study demonstrates the difficulties in measuring violence. While we have shown that a key advantage of linkage to police records is that it enables outcomes to be measured irrespective of study participation, police data undercount serious violence. Further, observational studies may also underestimate violence perpetration, as individuals with police-recorded serious violence are less likely to participate in research. Therefore, while record linkage allows the advantages of both official police records and self-reported measures to be exploited, it does not negate their limitations.
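The direction of the imputation result reported above (higher prevalence after imputation than in the observed data) can be reproduced on synthetic data. The sketch below is a deliberately simplified stand-in for the authors' multiple imputation using chained equations: it has a single incomplete variable, so the chain reduces to one conditional model, and it imputes within police-record strata. Every rate in the simulation is invented for illustration and none comes from ALSPAC.

```python
import random

random.seed(0)  # reproducible synthetic data


def simulate(n=20_000):
    """Synthetic cohort of (police_record, self_report or None) pairs.

    All probabilities are made up. The assumed mechanism is that people
    with a police record are less likely to answer the questionnaire.
    """
    people = []
    for _ in range(n):
        police = random.random() < 0.02
        violent = random.random() < (0.60 if police else 0.08)
        responded = random.random() < (0.30 if police else 0.60)
        people.append((police, violent if responded else None))
    return people


def impute_once(people):
    """Return one completed dataset, imputing within police-record strata."""
    prob = {}
    for stratum in (False, True):
        observed = [v for pol, v in people if pol == stratum and v is not None]
        yes = sum(observed)
        # Draw the stratum probability from its Beta posterior (uniform
        # prior) so that each imputation reflects parameter uncertainty.
        prob[stratum] = random.betavariate(1 + yes, 1 + len(observed) - yes)
    return [
        (random.random() < prob[pol]) if v is None else v for pol, v in people
    ]


def prevalence(values):
    return sum(values) / len(values)


cohort = simulate()
observed = prevalence([v for _, v in cohort if v is not None])

m = 20  # number of imputed datasets
# The pooled point estimate is the mean across imputations (Rubin's rules).
imputed = sum(prevalence(impute_once(cohort)) for _ in range(m)) / m

print(f"observed-only prevalence: {observed:.3f}")
print(f"pooled imputed prevalence: {imputed:.3f}")
```

Because non-responders in this simulation are disproportionately people with a police record, the complete-case estimate is biased downwards and the imputed estimate recovers some of the difference. A full analysis would also pool variances across imputations and would normally include many more auxiliary variables.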
Title: Measuring serious violence perpetration: comparison of police-recorded and self-reported data in a UK cohort. International Journal of Population Data Science, volume 10, issue 1, article 2391. Published 2025-02-04. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153580/pdf/