Pub Date : 2024-02-20, eCollection Date: 2024-01-01, DOI: 10.23889/ijpds.v6i1.2179
Elizabeth Lemmon, Catherine Hanna, Katharina Diernberger, Hugh M Paterson, Sarah H Wild, Holly Ennis, Peter S Hall
Background: Colorectal cancer (CRC) is the fourth most common type of cancer in the United Kingdom and the second leading cause of cancer death. Despite improvements in CRC survival over time, Scotland lags behind its UK and European counterparts. In this study, we carry out an exploratory analysis to provide contemporary, population-level evidence on CRC treatment and survival in Scotland.
Methods: We conducted a retrospective population-based analysis of adults with incident CRC registered on the Scottish Cancer Registry (Scottish Morbidity Record 06 (SMR06)) between January 2006 and December 2018. The CRC cohort was linked to hospital inpatient (SMR01) and National Records of Scotland (NRS) deaths records, allowing a description of their demographic, diagnostic and treatment characteristics. Cox proportional hazards regression models were used to explore the demographic and clinical factors associated with all-cause mortality and CRC-specific mortality, after adjusting for patient and tumour characteristics, among people identified as early-stage and treated with surgery.
Results: Overall, 32,691 (73%) and 12,184 (27%) patients had a diagnosis of colon and rectal cancer respectively, of whom 55% and 53% were early-stage and treated with surgery. Five-year overall survival (CRC-specific survival) within this cohort was 72% (82%) and 76% (84%) for patients with colon and rectal cancer respectively. Cox proportional hazards models revealed significant variation in mortality by sex, area-based deprivation and geographic location.
Conclusions: In a Scottish population of patients with early-stage CRC treated with surgery, there was significant variation in risk of death, even after accounting for clinical factors and patient characteristics.
Title: Variation in colorectal cancer treatment and outcomes in Scotland: real world evidence from national linked administrative health data.
Journal: International Journal of Population Data Science. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10929767/pdf/
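Five-year overall and CRC-specific survival figures like those reported above are the kind of quantity a Kaplan-Meier estimator produces. The abstract does not say which estimator the authors used, so the following is only a minimal sketch under that assumption. The follow-up data are invented, and `kaplan_meier`, `survival_at` and the toy lists are hypothetical names, not taken from the study. For cause-specific survival, deaths from other causes are treated as censored observations.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times  : follow-up time per patient (e.g. years since diagnosis)
    events : 1 if the death of interest was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # deaths and censored patients both leave
    return curve


def survival_at(curve, horizon):
    """Step-function lookup: the last estimate at or before `horizon`."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


# Invented toy cohort (NOT study data): years of follow-up and cause of death.
years = [1, 2, 2, 3, 4, 5, 6, 6, 7, 8]
cause = ["crc", None, "other", "crc", None, "crc", None, "other", None, None]

died_any = [1 if c is not None else 0 for c in cause]   # all-cause mortality
died_crc = [1 if c == "crc" else 0 for c in cause]      # other deaths censored

overall_5y = survival_at(kaplan_meier(years, died_any), 5)
crc_5y = survival_at(kaplan_meier(years, died_crc), 5)
print(f"5-year overall survival:      {overall_5y:.3f}")
print(f"5-year CRC-specific survival: {crc_5y:.3f}")
```

Because other-cause deaths are censored rather than counted, the cause-specific estimate in this toy cohort is higher than the overall one, which is the same direction as the gap between the paired percentages in the Results.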
Pub Date : 2024-01-09, DOI: 10.23889/ijpds.v9i1.2137
Richard Silverwood, Nasir Rajah, Lisa Calderwood, Bianca De Stavola, Katie Harron, George Ploubidis
Introduction: Recent years have seen an increase in linkages between survey and administrative data. It is important to evaluate the quality of such data linkages to discern the likely reliability of ensuing research. Evaluation of linkage quality and bias can be conducted using different approaches, but many of these are not possible when there is a separation of processes for linkage and analysis to help preserve privacy, as is typically the case in the UK (and elsewhere).
Objectives: We aimed to describe a suite of generalisable methods to evaluate linkage quality and population representativeness of linked survey and administrative data which remain tractable when users of the linked data are not party to the linkage process itself. We emphasise issues particular to longitudinal survey data throughout.
Methods: Our proposed approaches cover several areas: i) Linkage rates, ii) Selection into response, linkage consent and successful linkage, iii) Linkage quality, and iv) Linked data population representativeness. We illustrate these methods using a recent linkage between the 1958 National Child Development Study (NCDS; a cohort following an initial 17,415 people born in Great Britain in a single week of 1958) and Hospital Episode Statistics (HES) databases (containing important information regarding admissions, accident and emergency attendances and outpatient appointments at NHS hospitals in England).
Results: Our illustrative analyses suggest that the linkage quality of the NCDS-HES data is high and that the linked sample maintains an excellent level of population representativeness with respect to the single dimension we assessed.
Conclusions: Through this work we hope to encourage providers and users of linked data resources to undertake and publish thorough evaluations. We further hope that providing illustrative analyses using linked NCDS-HES data will improve the quality and transparency of research using this particular linked data resource.
Title: Examining the quality and population representativeness of linked survey and administrative data: guidance and illustration using linked 1958 National Child Development Study and Hospital Episode Statistics data
Journal: International Journal of Population Data Science
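Two of the areas listed in the Methods, linkage rates and linked-data representativeness, can be checked by an analyst who only sees flags on the delivered data. The sketch below is one possible way to do that and is not the authors' code: the field names (`responded`, `consented`, `linked`, `female`), the ten invented records, and the use of a standardised difference as the representativeness measure are all assumptions made for illustration.

```python
from math import sqrt


def stage_rates(records):
    """Rates at each step from the full cohort to a successfully linked record.

    Assumes at least one record reaches each stage (no zero denominators).
    """
    n = len(records)
    responded = [r for r in records if r["responded"]]
    consented = [r for r in responded if r["consented"]]
    linked = [r for r in consented if r["linked"]]
    return {
        "response": len(responded) / n,
        "consent_given_response": len(consented) / len(responded),
        "linked_given_consent": len(linked) / len(consented),
        "overall_linkage": len(linked) / n,
    }


def standardised_difference(p_linked, p_all):
    """Standardised difference between two proportions (0 means identical)."""
    pooled = sqrt((p_linked * (1 - p_linked) + p_all * (1 - p_all)) / 2)
    return (p_linked - p_all) / pooled if pooled else 0.0


def flags(responded, consented, linked):
    return {"responded": responded, "consented": consented, "linked": linked}


# Invented cohort of ten members (NOT NCDS data).
cohort = (
    [{"female": f, **flags(True, True, True)} for f in (1, 1, 1, 0, 0)]
    + [{"female": 0, **flags(True, True, False)}]
    + [{"female": f, **flags(True, False, False)} for f in (1, 0)]
    + [{"female": f, **flags(False, False, False)} for f in (1, 0)]
)

rates = stage_rates(cohort)
linked_sample = [r for r in cohort if r["linked"]]
p_female_all = sum(r["female"] for r in cohort) / len(cohort)
p_female_linked = sum(r["female"] for r in linked_sample) / len(linked_sample)
smd = standardised_difference(p_female_linked, p_female_all)

for stage, value in rates.items():
    print(f"{stage:>24}: {value:.2f}")
print(f"standardised difference (female): {smd:.3f}")
```

Splitting the overall linkage rate into conditional steps shows where cohort members are lost, which matters for longitudinal surveys where non-response, refusal of consent and failed matching can each select on different characteristics.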
Pub Date : 2023-12-14, DOI: 10.23889/ijpds.v8i6.2173
Louise Marryat, Jacqueline Stephen, Jacqueline Mok, Sharon Vincent, Charlotte Kirk, Lindsay Logie, John Devaney, Rachael Wood
Introduction: Child maltreatment affects a substantial number of children. However, current evidence relies on either longitudinal studies, which are complex and resource-intensive, or linked data studies based on social services data, which arguably capture only the tip of the iceberg in terms of children who are maltreated. Reliable, linked, population-level data on children referred to services due to suspected abuse or neglect will increase our ability to examine risk factors for, and outcomes following, abuse and neglect.
Objective: The objective of this project was to create a linkable population-level dataset, the Edinburgh Child Protection Dataset (ECPD), comprising all children referred to the Edinburgh Child Protection Paediatric healthcare team due to a concern about their welfare between 1995 and 2015.
Methods: The paper presents the process for creating the dataset. The analyses provide examples of available data from the main referrals dataset between 1995 and 2011 (where data quality was highest).
Results: 19,969 referrals were captured, relating to 11,653 children. Of the 19,969 referrals, a higher proportion concerned girls (54%), although boys were referred for physical abuse more often than girls (41% versus 30%). Younger children were more likely to be referred for physical abuse (35% of 0-4 year olds vs. 27% of those aged 15+); older children were more likely to be referred for sexual abuse (48% of 15+ years vs. 18% of 0-4 years). Most referrals came from social workers (46%) or police (31%).
Conclusions: The ECPD offers a unique insight into the characteristics of referrals to child protection paediatric services over a key period in the history of child protection in Scotland. It is hoped that by making these data available to researchers, and easily linkable with both mother and child current and future health records, evidence will be created to better support maltreated children and monitor changes over time.
Title: Data resource profile: the Edinburgh Child Protection Dataset - a new linked administrative data source of children referred to Child Protection paediatric services in Edinburgh, Scotland
Journal: International Journal of Population Data Science
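The Results mix two units of analysis: referrals (19,969) and children (11,653), with the percentages by sex stated per referral. A small sketch of that distinction follows. Only the two totals come from the abstract; the six referral rows, the field names and the helper `share_by_sex` are invented for illustration.

```python
from collections import Counter

# Totals reported in the abstract.
total_referrals = 19_969
total_children = 11_653
mean_referrals_per_child = total_referrals / total_children
print(f"mean referrals per child: {mean_referrals_per_child:.2f}")

# Invented referral-level rows (NOT ECPD data): one row per referral,
# so a child appears once for every time they were referred.
referrals = [
    {"child_id": "c1", "sex": "F", "reason": "sexual"},
    {"child_id": "c1", "sex": "F", "reason": "physical"},
    {"child_id": "c2", "sex": "M", "reason": "physical"},
    {"child_id": "c3", "sex": "F", "reason": "neglect"},
    {"child_id": "c3", "sex": "F", "reason": "sexual"},
    {"child_id": "c4", "sex": "M", "reason": "neglect"},
]


def share_by_sex(rows, reason):
    """Share of each sex's referrals (not children) made for `reason`."""
    totals = Counter(r["sex"] for r in rows)
    hits = Counter(r["sex"] for r in rows if r["reason"] == reason)
    return {sex: hits[sex] / totals[sex] for sex in totals}


n_children = len({r["child_id"] for r in referrals})   # child-level denominator
physical_share = share_by_sex(referrals, "physical")   # referral-level shares
print(n_children, physical_share)
```

Counting distinct `child_id` values instead of rows is what changes the denominator from referrals to children, so any comparison between groups should state which of the two it uses.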
Pub Date : 2023-12-14, DOI: 10.23889/ijpds.v8i1.2165
F. Ritchie, Amy Tilbrook, Christian Cole, Emily Jefferson, Susan Krueger, Esma Mansouri-Bensassi, Simon Rogers, Jim Q. Smith
Introduction: Trusted research environments (TREs) provide secure access to very sensitive data for research. All TREs operate manual checks on outputs to ensure there is no residual disclosure risk. Machine learning (ML) models require very large amounts of data; if these data are personal, the TRE is a well-established data management solution. However, ML models present novel disclosure risks, in both type and scale.
Objectives: As part of a series on ML disclosure risk in TREs, this article is intended to introduce TRE managers to the conceptual problems and the work being done to address them.
Methods: We demonstrate how ML models present a qualitatively different type of disclosure risk compared to traditional statistical outputs. These risks arise from both the nature and the scale of ML modelling.
Results: We show that there are a large number of unresolved issues, although there is progress in many areas. We show where areas of uncertainty remain, as well as remedial responses available to TREs.
Conclusions: At this stage, disclosure checking of ML models is very much a specialist activity. However, TRE managers need a basic awareness of the potential risk in ML models to enable them to make sensible decisions on using TREs for ML model development.
Title: Machine learning models in trusted research environments -- understanding operational risks
Journal: International Journal of Population Data Science
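One way to see why a trained model can be a different kind of output from a summary table is an instance-based learner, which has to keep its training rows in order to make predictions. The sketch below is an illustration only and is not drawn from the article: the class `OneNearestNeighbour`, the three invented `(age, smoker)` rows and their labels are all hypothetical.

```python
class OneNearestNeighbour:
    """Toy 1-NN classifier: a prediction copies the label of the closest stored row."""

    def fit(self, rows, labels):
        # The "trained model" is nothing more than a copy of the training data.
        self.rows = [tuple(r) for r in rows]
        self.labels = list(labels)
        return self

    def predict(self, x):
        # Squared Euclidean distance from x to every stored training row.
        distances = [
            sum((a - b) ** 2 for a, b in zip(row, x)) for row in self.rows
        ]
        return self.labels[distances.index(min(distances))]


# Invented (age, smoker) records standing in for personal data held in a TRE.
training_rows = [(34, 0), (52, 1), (71, 1)]
training_labels = ["low", "high", "high"]

model = OneNearestNeighbour().fit(training_rows, training_labels)
prediction = model.predict((50, 1))

# Whatever leaves the environment as "the model" carries the rows with it.
exported_state = vars(model)
print(prediction, exported_state["rows"])
```

A frequency table built from the same three records could be checked cell by cell, whereas here every individual record is recoverable from the exported object, which is the sense in which the risk differs in type as well as in scale.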
Pub Date : 2023-12-12, DOI: 10.23889/ijpds.v8i1.2153
Bekelu Negash, Alan Katz, Christine J. Neilson, Moniruzzaman Moni, Marc Nesca, Alexander Singer, J. Enns
Introduction: Using data in research often requires that the data first be de-identified, particularly in the case of health data, which often include Personally Identifiable Information (PII) and/or Personal Health Identifying Information (PHII). There are established procedures for de-identifying structured data, but de-identifying clinical notes, electronic health records, and other records that include free text data is more complex. Several different ways to achieve this are documented in the literature. This scoping review identifies categories of de-identification methods that can be used for free text data.
Methods: We adopted an established scoping review methodology to examine review articles published up to May 9, 2022, in Ovid MEDLINE; Ovid Embase; Scopus; the ACM Digital Library; IEEE Xplore; and Compendex. Our research question was: What methods are used to de-identify free text data? Two independent reviewers conducted title and abstract screening and full-text article screening using the online review management tool Covidence.
Results: The initial literature search retrieved 3,312 articles, most of which focused primarily on structured data. Eighteen publications describing methods of de-identification of free text data met the inclusion criteria for our review. The majority of the included articles focused on removing categories of personal health information identified by the Health Insurance Portability and Accountability Act (HIPAA). The de-identification methods they described combined rule-based methods or machine learning with other strategies such as deep learning.
Conclusion: Our review identifies and categorises de-identification methods for free text data as rule-based methods, machine learning, deep learning, and combinations of these and other approaches. Most of the articles we found in our search refer to de-identification methods that target some or all categories of PHII. Our review also highlights how de-identification systems for free text data have evolved over time and points to hybrid approaches as the most promising direction for the future.
Title: De-identification of Free Text Data containing Personal Health Information: A Scoping Review of Reviews
Journal: International Journal of Population Data Science
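Of the categories named in the Conclusion, rule-based methods are the simplest to show in a few lines. The following is a minimal sketch of that category only, not a system described in any of the reviewed articles. The three patterns, the labels and the sample note are invented, and real free text would need far broader rules (names, addresses, identifiers) plus the machine learning components that the review reports are usually combined with them.

```python
import re

# Hypothetical rule set: (replacement label, compiled pattern).
RULES = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def deidentify(text):
    """Replace every match of each rule with a bracketed category label."""
    for label, pattern in RULES:
        text = pattern.sub(f"[{label}]", text)
    return text


# Invented clinical-style note (no real person or contact details).
note = "Seen on 03/14/2021; call 204-555-0199 or email j.doe@example.org."
redacted = deidentify(note)
print(redacted)
```

Rules like these are transparent and easy to audit, but they only catch what their patterns anticipate, which is one reason a hybrid design can be attractive.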
Pub Date : 2023-10-04, eCollection Date: 2023-01-01, DOI: 10.23889/ijpds.v8i2.2159
Amy Hawn Nelson, Sharon Zanti
Introduction: This paper presents a Four Question Framework to guide data integration partners in building a strong governance and legal foundation to support ethical data use.
Objectives: While this framework was developed based on work in the United States that routinely integrates public data, it is meant to be a simple, digestible tool that can be adapted to any context.
Methods: The framework was developed through a series of public deliberation workgroups and 15 years of field experience working with a diversity of data integration efforts across the United States.
Results: The Four Questions - Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? - should be considered within an established data governance framework and alongside core partners to determine whether and how to move forward when building an Integrated Data System (IDS) and also at each stage of a specific data project. We discuss these questions in depth, with a particular focus on the role of governance in establishing legal and ethical data use. In addition, we provide example data governance structures from two IDS sites and hypothetical scenarios that illustrate key considerations for the Four Question Framework.
Conclusions: A robust governance process is essential for determining whether data sharing and integration is legal, ethical, and a good idea within the local context. This process is iterative and as relational as it is technical, which means authentic collaboration across partners should be prioritized at each stage of a data use project. The Four Questions serve as a guide for determining whether to undertake data sharing and integration and should be regularly revisited throughout the life of a project.
Highlights: Strong data governance has five qualities: it is purpose-, value-, and principle-driven; strategically located; collaborative; iterative; and transparent. Through a series of public deliberation workgroups and 15 years of field experience, we developed a Four Question Framework to determine whether and how to move forward with building an IDS and at each stage of a data sharing and integration project. The Four Questions - Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? - should be carefully considered within established data governance processes and among core partners.
Title: Four questions to guide decision-making for data sharing and integration.
Journal: International Journal of Population Data Science. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10900076/pdf/
Pub Date : 2023-06-04, DOI: 10.1101/2023.05.30.23290729
H. Brewer, Q. Jiang, S. Sundar, Y. Hirst, J. Flanagan
Objective: Antihistamine use has previously been associated with a reduction in incidence of ovarian cancer, particularly in premenopausal women. Herein, we investigate antihistamine exposure in relation to ovarian cancer risk using a novel data resource by examining purchase histories from retailer loyalty card data.
Study Design: A subset of participants from the Cancer Loyalty Card Study (CLOCS) for whom purchase histories were available were analysed in this study. Cases (n=153) were women in the UK with a first diagnosis of ovarian cancer between Jan 2018 and Jan 2022. Controls (n=120) were women in the UK without a diagnosis of ovarian cancer. Up to 6 years of purchase history was retrieved from two participating high street retailers from 2014 to 2022.
Main outcome measures: Logistic regression was used to estimate the odds ratio (OR) and 95% confidence intervals (CIs) for ovarian cancer associated with antihistamine purchases, ever versus never, adjusting for age and oral contraceptive use. The association was stratified by season of purchase, age over and under 50 years, ovarian cancer histology, and family history.
Results: Ever purchasing antihistamines was not significantly associated with ovarian cancer overall in this small study (OR: 0.68, 95% CI: 0.39, 1.19). However, antihistamine purchases were significantly associated with reduced ovarian cancer risk when purchased only in spring and/or summer (OR: 0.37, 95% CI: 0.17, 0.82) compared with purchasing all year (OR: 0.99, 95% CI: 0.51, 1.92). In the stratified analysis, the association was strongest in non-serous ovarian cancer (OR: 0.41, 95% CI: 0.18, 0.93).
Conclusions: Antihistamine purchase is associated with reduced ovarian cancer risk when purchased seasonally in spring and summer. However, larger studies and more research are required to understand the mechanisms of reduced ovarian cancer risk related to seasonal purchases of antihistamines and allergies.
{"title":"Seasonal purchase of antihistamines and ovarian cancer risk in the Cancer Loyalty Card Study (CLOCS): results from an observational case-control study","authors":"H. Brewer, Q. Jiang, S. Sundar, Y. Hirst, J. Flanagan","doi":"10.1101/2023.05.30.23290729","DOIUrl":"https://doi.org/10.1101/2023.05.30.23290729","url":null,"PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43794449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
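As a reading aid for the odds ratios quoted in the abstract above, the following minimal sketch checks that each reported point estimate sits near the geometric midpoint of its 95% confidence interval and backs out an approximate standard error on the log-odds scale. It assumes the intervals are Wald-type (symmetric on the log scale), which the abstract does not state; the names `REPORTED`, `log_scale_summary` and `excludes_one` are illustrative only and are not taken from the study.

```python
import math

# Odds ratios and 95% CIs copied from the abstract above: (OR, lower, upper).
REPORTED = {
    "ever vs never": (0.68, 0.39, 1.19),
    "spring/summer only": (0.37, 0.17, 0.82),
    "all year": (0.99, 0.51, 1.92),
    "non-serous histology": (0.41, 0.18, 0.93),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(lower, upper):
    """Return (geometric midpoint of the CI, approximate SE of log OR).

    Assumption: the interval is symmetric on the log scale (Wald-type).
    """
    midpoint = math.sqrt(lower * upper)
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return midpoint, se_log_or


def excludes_one(lower, upper):
    """True when the interval does not contain OR = 1 (the null value)."""
    return upper < 1.0 or lower > 1.0


for label, (odds_ratio, lower, upper) in REPORTED.items():
    midpoint, se = log_scale_summary(lower, upper)
    print(
        f"{label}: OR={odds_ratio:.2f}, CI midpoint={midpoint:.2f}, "
        f"SE(log OR)={se:.2f}, excludes 1: {excludes_one(lower, upper)}"
    )
```

Under that assumption, an interval that excludes 1 corresponds to the "significantly associated" wording in the abstract, and the wide standard errors reflect the small sample (153 cases, 120 controls).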
Pub Date : 2023-05-11eCollection Date: 2023-01-01DOI: 10.23889/ijpds.v8i1.2113
Francesca L Cavallaro, Rebecca Cannings-John, Fiona Lugg-Widger, Ruth Gilbert, Eilis Kennedy, Sally Kendall, Michael Robling, Katie L Harron
Introduction: "Big data" - including linked administrative data - can be exploited to evaluate interventions for maternal and child health, providing time- and cost-effective alternatives to randomised controlled trials. However, using these data to evaluate population-level interventions can be challenging.
Objectives: We aimed to inform future evaluations of complex interventions by describing sources of bias, lessons learned, and suggestions for improvements, based on two observational studies using linked administrative data from health, education and social care sectors to evaluate the Family Nurse Partnership (FNP) in England and Scotland.
Methods: We first considered how different sources of potential bias within the administrative data could affect results of the evaluations. We explored how each study design addressed these sources of bias using maternal confounders captured in the data. We then determined what additional information could be captured at each step of the complex intervention to enable analysts to minimise bias and maximise comparability between intervention and usual care groups, so that any observed differences can be attributed to the intervention.
Results: Lessons learned include the need for i) detailed data on intervention activity (dates/geography) and usual care; ii) improved information on data linkage quality to accurately characterise control groups; iii) more efficient provision of linked data to ensure timeliness of results; iv) better measurement of confounding characteristics affecting who is eligible, approached and enrolled.
Conclusions: Linked administrative data are a valuable resource for evaluations of the FNP national programme and other complex population-level interventions. However, information on local programme delivery and usual care is required to account for biases that characterise those who receive the intervention, and to inform understanding of mechanisms of effect. National, ongoing, robust evaluations of complex public health interventions would be more achievable if programme implementation were integrated with improved national and local data collection, and robust quasi-experimental designs.
{"title":"Lessons learned from using linked administrative data to evaluate the Family Nurse Partnership in England and Scotland.","authors":"Francesca L Cavallaro, Rebecca Cannings-John, Fiona Lugg-Widger, Ruth Gilbert, Eilis Kennedy, Sally Kendall, Michael Robling, Katie L Harron","doi":"10.23889/ijpds.v8i1.2113","DOIUrl":"10.23889/ijpds.v8i1.2113","url":null,"PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"8 1","pages":"2113"},"PeriodicalIF":1.6,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/fc/b1/ijpds-08-2113.PMC10476150.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10225318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-04-06DOI: 10.1101/2023.04.05.23288190
I. Ward, Katie Finning, D. Ayoubkhani, Katie Hendry, E. Sharland, Louis Appleby, V. Nafilyan
Background: Risk of suicide is complex and often a result of multiple interacting factors. It is vital that research identifies predictors of suicide to provide a strong evidence base for targeted interventions.
Methods: Using linked Census and population-level mortality data, we estimated rates of suicide across different groups in England and Wales and examined which factors are independently associated with the risk of suicide.
Findings: The highest rates of suicide were amongst those who reported an impairment affecting their day-to-day activities, those who were long-term unemployed or had never worked, and those who were single or separated. Rates of suicide were highest in the White and Mixed/multiple ethnic groups compared to other ethnicities, and in people who reported a religious affiliation compared with those who had no religion. Comparison of minimally adjusted models (predictor, sex and age) with fully adjusted models (sex, age, ethnicity, region, partnership status, religious affiliation, day-to-day impairments, armed forces membership and socioeconomic status) identified key predictors which remain important risk factors after accounting for other characteristics; people with day-to-day impairments still had a higher incidence of suicide than those whose activities were not impaired, after adjusting for employment status. Overall, rates of suicide were higher in males than in females across all ages, with the highest rates in 40-to-50-year-olds.
Interpretation: The findings of this work provide novel population-level insights into the risk of suicide by sociodemographic characteristics. Understanding the interaction between key risk factors for suicide has important implications for national suicide prevention strategies.
{"title":"Sociodemographic inequalities of suicide: a population-based cohort study of adults in England and Wales 2011-2021","authors":"I. Ward, Katie Finning, D. Ayoubkhani, Katie Hendry, E. Sharland, Louis Appleby, V. Nafilyan","doi":"10.1101/2023.04.05.23288190","DOIUrl":"https://doi.org/10.1101/2023.04.05.23288190","url":null,"PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47342216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction: Research to date has established that the COVID-19 pandemic has not affected everyone equitably. Whether this inequitable impact extended to education, with regard to educator-reported barriers to distance learning, concerns and mental health, is less clear.
Objective: The objective of this study was to explore the association between the neighbourhood composition of the school and kindergarten educator-reported barriers and concerns regarding children's learning during the first wave of COVID-19 related school closures in Ontario, Canada.
Methods: In the spring of 2020, we collected data from Ontario kindergarten educators (n = 2569; 74.2% kindergarten teachers, 25.8% early childhood educators; 97.6% female) using an online survey asking them about their experiences and challenges with online learning during the first round of school closures. We linked the educator responses to 2016 Canadian Census variables based on schools' postal codes. Bivariate correlations and Poisson regression analyses were used to determine if there was an association between neighbourhood composition and educator mental health, and the number of barriers and concerns reported by kindergarten educators.
Results: There were no significant associations between educator mental health and school neighbourhood characteristics. Educators who taught at schools in neighbourhoods with lower median income reported a greater number of barriers to online learning (e.g., parents/guardians not submitting assignments/providing updates on their child's learning) and concerns regarding the return to school in the fall of 2020 (e.g., students' readjustment to routines). There were no significant associations between educator-reported barriers or concerns and any of the other Census neighbourhood variables (proportion of lone-parent families, average household size, proportion of the population that does not speak an official language, proportion of the population that are recent immigrants, or proportion of the population aged 0-4).
Conclusions: Overall, our study suggests that the neighbourhood composition of the children's school location did not exacerbate the potential negative learning experiences of kindergarten students and educators during the COVID-19 pandemic, although we did find that educators teaching in schools in lower-SES neighbourhoods reported more barriers to online learning during this time. Taken together, our study suggests that remediation efforts should be focused on individual kindergarten children and their families as opposed to school location.
{"title":"Association between neighbourhood composition, kindergarten educator-reported distance learning barriers, and return to school concerns during the first wave of the COVID-19 pandemic in Ontario, Canada.","authors":"Natalie Spadafora, Jade Wang, Caroline Reid-Westoby, Magdalena Janus","doi":"10.23889/ijpds.v7i4.1761","DOIUrl":"10.23889/ijpds.v7i4.1761","url":null,"PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"7 4","pages":"1761"},"PeriodicalIF":0.0,"publicationDate":"2023-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/88/13/ijpds-07-1761.PMC10170344.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9845521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}