Using novel data linkage of biobank data with administrative health data to inform genomic analysis for future precision medicine treatment of congenital heart disease
Pub Date: 2023-11-14 | DOI: 10.23889/ijpds.v8i1.2150
Samantha J. Lain, Gillian Blue, Bridget O’Malley, David Winlaw, Gary Sholler, Sally Dunwoodie, Natasha Nassar, the Congenital Heart Disease Synergy Study group
Introduction: Contemporary care of congenital heart disease (CHD) is largely standardised; however, there is heterogeneity in post-surgical outcomes that may be explained by genetic variation. Data linkage between a CHD biobank and routinely collected administrative datasets is a novel method to identify outcomes to explore the impact of genetic variation. Objective: To use data linkage to identify and validate patient outcomes following surgical treatment for CHD. Methods: Clinical and biobank data of children born from 2001 to 2014 who had a procedure for CHD in New South Wales, Australia, were linked with hospital discharge, education and death data. The children were grouped according to CHD lesion type and age at first cardiac surgery. Children in each lesion/age-at-surgery group were classified into 'favourable' and 'unfavourable' cardiovascular outcome groups based on variables identified in the linked administrative data, including total time in intensive care, total length of stay in hospital, and mechanical ventilation time up to 5 years following the date of the first cardiac surgery. A blind medical record audit of 200 randomly chosen children from the 'favourable' and 'unfavourable' outcome groups was performed to validate the outcome groups. Results: Of the 1,872 children in the dataset who linked to hospital or death data, 483 were identified as having a 'favourable' cardiovascular outcome and 484 as having an 'unfavourable' cardiovascular outcome. The medical record audit found concordant outcome groups for 182/192 records (95%) compared to the outcome groups categorised using the linked data. Conclusions: The linkage of a curated biobank dataset with routinely collected administrative data is a reliable method to identify outcomes to facilitate a large-scale study examining genetic variance. These genetic hallmarks could be used to identify patients at risk of unfavourable cardiovascular outcomes, to inform strategies for prevention and changes in clinical care.
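The abstract does not state the exact rule used to split each lesion/age-at-surgery group into 'favourable' and 'unfavourable' outcomes, so the sketch below only illustrates the general shape of the approach: derive per-child totals from the linked records, rank them within each lesion/age group, and compare the resulting labels against an audit. All column names, the toy data and the median-split rule are assumptions for illustration, not the study's criteria.

```python
import numpy as np
import pandas as pd

# Hypothetical linked dataset: one row per child, totals accumulated over the
# 5 years following the first cardiac surgery (column names are assumptions).
records = pd.DataFrame({
    "child_id":         [1, 2, 3, 4, 5, 6],
    "lesion_age_group": ["TGA_neonate"] * 3 + ["VSD_infant"] * 3,
    "icu_hours":        [90, 300, 120, 40, 200, 35],
    "los_days":         [12, 45, 15, 7, 30, 6],
    "vent_hours":       [30, 150, 40, 10, 90, 8],
})

# Within-group percentile ranks of the three linked-data variables, averaged
# into a simple composite; a child above the group median is labelled
# 'unfavourable' (an illustrative rule, not the published one).
composite = (
    records.groupby("lesion_age_group")[["icu_hours", "los_days", "vent_hours"]]
    .rank(pct=True)
    .mean(axis=1)
)
records["linked_outcome"] = np.where(composite > 0.5, "unfavourable", "favourable")

# Validation against a (hypothetical) blinded medical-record audit.
audit = pd.Series(["favourable", "unfavourable", "favourable",
                   "favourable", "unfavourable", "favourable"],
                  index=records.index)
concordance = (records["linked_outcome"] == audit).mean()
print(f"Concordance with audit: {concordance:.0%}")
```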
{"title":"Using novel data linkage of biobank data with administrative health data to inform genomic analysis for future precision medicine treatment of congenital heart disease","authors":"Samantha J. Lain, Gillian Blue, Bridget O’Malley, David Winlaw, Gary Sholler, Sally Dunwoodie, Natasha Nassar, None The Congenital Heart Disease Synergy Study group","doi":"10.23889/ijpds.v8i1.2150","DOIUrl":"https://doi.org/10.23889/ijpds.v8i1.2150","url":null,"abstract":"IntroductionContemporary care of congenital heart disease (CHD) is largely standardised, however there is heterogeneity in post-surgical outcomes that may be explained by genetic variation. Data linkage between a CHD biobank and routinely collected administrative datasets is a novel method to identify outcomes to explore the impact of genetic variation. ObjectiveUse data linkage to identify and validate patient outcomes following surgical treatment for CHD. MethodsData linkage between clinical and biobank data of children born from 2001-2014 that had a procedure for CHD in New South Wales, Australia, with hospital discharge data, education and death data. The children were grouped according to CHD lesion type and age at first cardiac surgery. Children in each `lesion/age at surgery group' were classified into 'favourable' and 'unfavourable' cardiovascular outcome groups based on variables identified in linked administrative data including; total time in intensive care, total length of stay in hospital, and mechanical ventilation time up to 5 years following the date of the first cardiac surgery. A blind medical record audit of 200 randomly chosen children from 'favourable' and 'unfavourable' outcome groups was performed to validate the outcome groups. ResultsOf the 1872 children in the dataset that linked to hospital or death data, 483 were identified with a `favourable' cardiovascular outcome and 484 were identified as having a 'unfavourable' cardiovascular outcome. The medical record audit found concordant outcome groups for 182/192 records (95%) compared to the outcome groups categorized using the linked data. ConclusionsThe linkage of a curated biobank dataset with routinely collected administrative data is a reliable method to identify outcomes to facilitate a large-scale study to examine genetic variance. These genetic hallmarks could be used to identify patients who are at risk of unfavourable cardiovascular outcomes, to inform strategies for prevention and changes in clinical care.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"58 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134902619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Common governance model: a way to avoid data segregation between existing trusted research environment
Pub Date: 2023-11-09 | DOI: 10.23889/ijpds.v8i4.2164
Fatemeh Torabi, Chris Orton, Emma Squires, Sharon Heys, Richard Hier, Ronan A. Lyons, Simon Thompson
Background: Trusted Research Environments (TREs) provide a legitimate basis for data access along with a set of technologies to support implementation of the "Five Safes" framework for privacy protection. The lack of standard approaches to achieving compliance with the "Five Safes" framework results in a diversity of approaches across different TREs. Data access and analysis across multiple TREs has a range of benefits, including improved precision of analysis due to larger sample sizes and broader availability of out-of-sample records, particularly in the study of rare conditions. Knowledge of governance approaches used across UK TREs is limited. Objective: To document key governance features in major UK TREs contributing to UK-wide analysis and to identify elements that would directly facilitate multi-TRE collaborations and federated analysis in future. Method: We summarised three main characteristics across 15 major UK-based TREs: 1) data access environment; 2) data access request and disclosure control procedures; and 3) governance models. We undertook case studies of collaborative analyses conducted in more than one TRE. We identified an array of TREs operating on an equivalent level of governance. We further identified commonly governed TREs with architectural considerations for achieving an equivalent level of information security management system standards to facilitate multi-TRE functionality and federated analytics. Results: All 15 UK TREs allow pooling and analysis of aggregated research outputs only when they have passed human-operated disclosure control checks. Data access request procedures are unique to each TRE. We also observed variability in disclosure control procedures across TREs, with no or minimal researcher guidance on best practices for file-out request procedures. In 2023, six TREs (40.0%) held ISO 27001 accreditation, while 9 TREs (56.2%) participated in four-nation analyses. Conclusion: Secure analysis of individual-level data from multiple TREs is possible through existing technical solutions but requires development of a well-established governance framework meeting all stakeholder requirements and addressing public and patient concerns. Formation of a standard model could act as the catalyst for evolution of current TRE governance models to a multi-TRE ecosystem within the UK and beyond.
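As a rough illustration of the kind of automated pre-check that can sit alongside the human-operated disclosure control checks described above, the sketch below flags aggregate output cells falling under a minimum count before a file-out request reaches a reviewer. The threshold, table layout and column names are assumptions; each TRE applies its own rules.

```python
import pandas as pd

MIN_CELL_COUNT = 10  # illustrative threshold; each TRE sets its own rules


def flag_small_cells(aggregate: pd.DataFrame, count_col: str = "n") -> pd.DataFrame:
    """Mark aggregate cells that would normally be suppressed or queried before
    release from a TRE (a pre-check, not a replacement for human review)."""
    out = aggregate.copy()
    out["release_ok"] = out[count_col] >= MIN_CELL_COUNT
    return out


# Hypothetical research output: counts by region and condition.
output_table = pd.DataFrame({
    "region":    ["A", "A", "B", "B"],
    "condition": ["X", "Y", "X", "Y"],
    "n":         [152, 7, 48, 11],
})
print(flag_small_cells(output_table))
```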
{"title":"Common governance model: a way to avoid data segregation between existing trusted research environment","authors":"Fatemeh Torabi, Chris Orton, Emma Squires, Sharon Heys, Richard Hier, Ronan A. Lyons, Simon Thompson","doi":"10.23889/ijpds.v8i4.2164","DOIUrl":"https://doi.org/10.23889/ijpds.v8i4.2164","url":null,"abstract":"BackgroundTrusted Research Environments provide a legitimate basis for data access along with a set of technologies to support implementation of the \"five-safes\" framework for privacy protection. Lack of standard approaches in achieving compliance with the \"five-safes\" framework results in a diversity of approaches across different TREs. Data access and analysis across multiple TREs has a range of benefits including improved precision of analysis due to larger sample sizes and broader availability of out-of-sample records, particularly in the study of rare conditions. Knowledge of governance approaches used across UK-TREs is limited. ObjectiveTo document key governance features in major UK-TRE contributing to UK wide analysis and to identify elements that would directly facilitate multi TRE collaborations and federated analysis in future. MethodWe summarised three main characteristics across 15 major UK-based TREs: 1) data access environment; 2) data access requests and disclosure control procedures; and 3) governance models. We undertook case studies of collaborative analyses conducted in more than one TRE. We identified an array of TREs operating on an equivalent level of governance. We further identify commonly governed TREs with architectural considerations for achieving an equivalent level of information security management system standards to facilitate multi TRE functionality and federated analytics. ResultsAll 15 UK-TREs allow pooling and analysis of aggregated research outputs only when they have passed human-operated disclosure control checks. Data access requests procedures are unique to each TRE. We also observed a variability in disclosure control procedures across various TREs with no or minimal researcher guidance on best practices for file out request procedures. In 2023, six TREs (40.0%) held ISO 20071 accreditation, while 9 TREs (56.2%) participated in four-nation analyses. ConclusionSecure analysis of individual-level data from multiple TREs is possible through existing technical solutions but requires development of a well-established governance framework meeting all stakeholder requirements and addressing public and patient concerns. Formation of a standard model could act as the catalyst for evolution of current TREs governance models to a multi TRE ecosystem within the UK and beyond.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":" 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135242225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated learning for generating synthetic data: a scoping review
Pub Date: 2023-10-31 | DOI: 10.23889/ijpds.v8i1.2158
Claire Little, Mark Elliot, Richard Allmendinger
Introduction: Federated Learning (FL) is a decentralised approach to training statistical models, where training is performed across multiple clients, producing one global model. Since the training data remains with each local client and is not shared or exchanged with other clients, the use of FL may reduce privacy and security risks (compared to methods where multiple data sources are pooled) and can also address data access and heterogeneity problems. Synthetic data is artificially generated data that has the same structure and statistical properties as the original but does not contain any of the original data records, therefore minimising disclosure risk. Using FL to produce synthetic data (which we refer to as "federated synthesis") has the potential to combine data from multiple clients without compromising privacy, allowing access to data that may otherwise be inaccessible in its raw format. Objectives: The objective was to review current research and practices for using FL to generate synthetic data and determine the extent to which research has been undertaken, the methods and evaluation practices used, and any research gaps. Methods: A scoping review was conducted to systematically map and describe the published literature on the use of FL to generate synthetic data. Relevant studies were identified through online databases, and the findings are described, grouped, and summarised. Information extracted included article characteristics, documenting the type of data synthesised, the model architecture, and the methods (if any) used to evaluate utility and privacy risk. Results: A total of 69 articles were included in the scoping review; all were published between 2018 and 2023, with two-thirds (46) in 2022. 30% (21) focussed on synthetic data generation as the main model output (with 6 of these generating tabular data), whereas 59% (41) focussed on data augmentation. Of the 21 performing federated synthesis, all used deep learning methods (predominantly Generative Adversarial Networks) to generate the synthetic data. Conclusions: Federated synthesis is in its early days but shows promise as a method that can construct a global synthetic dataset without sharing any of the local client data. As a field in its infancy, there are areas to explore in terms of the privacy risk associated with the various methods proposed, and more generally in how we measure those risks.
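The reviewed papers mostly use GAN-based architectures, but the federated step they share - local training on private data with only model parameters exchanged and averaged into one global model - can be illustrated briefly. The following is a minimal FedAvg-style loop on a toy least-squares model in NumPy; it is not any specific method from the reviewed literature, and the data, learning rate and number of rounds are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy private datasets: three clients, each holding its own (X, y); in FL the
# raw data never leaves the client.
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))


def local_update(w, X, y, lr=0.05, steps=20):
    # Client-side training: a few gradient steps on local data only.
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


w_global = np.zeros(3)
for _round in range(10):                        # federated rounds
    local_weights = [local_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_weights, axis=0)   # server aggregates (FedAvg)

print("global model after federation:", np.round(w_global, 2))
```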
{"title":"Federated learning for generating synthetic data: a scoping review","authors":"Claire Little, Mark Elliot, Richard Allmendinger","doi":"10.23889/ijpds.v8i1.2158","DOIUrl":"https://doi.org/10.23889/ijpds.v8i1.2158","url":null,"abstract":"IntroductionFederated Learning (FL) is a decentralised approach to training statistical models, where training is performed across multiple clients, producing one global model. Since the training data remains with each local client and is not shared or exchanged with other clients the use of FL may reduce privacy and security risks (compared to methods where multiple data sources are pooled) and can also address data access and heterogeneity problems. Synthetic data is artificially generated data that has the same structure and statistical properties as the original but that does not contain any of the original data records, therefore minimising disclosure risk. Using FL to produce synthetic data (which we refer to as \"federated synthesis\") has the potential to combine data from multiple clients without compromising privacy, allowing access to data that may otherwise be inaccessible in its raw format. ObjectivesThe objective was to review current research and practices for using FL to generate synthetic data and determine the extent to which research has been undertaken, the methods and evaluation practices used, and any research gaps. MethodsA scoping review was conducted to systematically map and describe the published literature on the use of FL to generate synthetic data. Relevant studies were identified through online databases and the findings are described, grouped, and summarised. Information extracted included article characteristics, documenting the type of data that is synthesised, the model architecture and the methods (if any) used to evaluate utility and privacy risk. ResultsA total of 69 articles were included in the scoping review; all were published between 2018 and 2023 with two thirds (46) in 2022. 30% (21) were focussed on synthetic data generation as the main model output (with 6 of these generating tabular data), whereas 59% (41) focussed on data augmentation. Of the 21 performing federated synthesis, all used deep learning methods (predominantly Generative Adversarial Networks) to generate the synthetic data. ConclusionsFederated synthesis is in its early days but shows promise as a method that can construct a global synthetic dataset without sharing any of the local client data. As a field in its infancy there are areas to explore in terms of the privacy risk associated with the various methods proposed, and more generally in how we measure those risks.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"342 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135868555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health Data Governance for Research Use in Alberta
Pub Date: 2023-10-26 | DOI: 10.23889/ijpds.v8i4.2160
Namneet Sandhu, Sarah Whittle, Danielle Southern, Bing Li, Erik Youngson, Jeffery Bakal, Christie Mcleod, Lexi Hilderman, Tyler Williamson, Cheligeer Cheligeer, Robin Walker, Padma Kaul, Hude Quan, Catherine Eastwood
Alberta has rich clinical and health services data held under the custodianship of Alberta Health and Alberta Health Services (AHS), which is used not only for clinical and administrative purposes but also for disease surveillance and epidemiological research. Alberta is the largest Canadian province with a single-payer, centralised health system, AHS, and a consolidated data and analytics team supporting researchers across the province. This paper describes Alberta's data custodians, data governance mechanisms, and the streamlined processes followed for research data access. AHS has created a centralised data repository from multiple sources, including practitioner claims data, hospital discharge data, and medications dispensed, available for research use through the provincial Data and Research Services (DRS) team. The DRS team is integrated within AHS to support researchers across the province with their data extraction and linkage requests. Furthermore, streamlined processes have been established, including: 1) ethics approval from a research ethics board; 2) any necessary operational approvals from AHS; and 3) a tripartite legal agreement dictating terms and conditions for data use, disclosure, and retention. This allows researchers to gain timely access to data. To meet evolving and ever-expanding big-data needs, the University of Calgary, in partnership with AHS, has built high-performance computing (HPC) infrastructure to facilitate storage and processing of large datasets. When releasing data to researchers, the analytics team ensures that the guiding principles of Alberta's Health Information Act are followed. The principal investigator also ensures that data retention and disposition follow the plan specified in the ethics approval and the terms set out by funding agencies. Even though there are disparities and variations in data protection laws across the different provinces of Canada, the streamlined processes for research data access in Alberta are highly efficient.
{"title":"Health Data Governance for Research Use in Alberta","authors":"Namneet Sandhu, Sarah Whittle, Danielle Southern, Bing Li, Erik Youngson, Jeffery Bakal, Christie Mcleod, Lexi Hilderman, Tyler Williamson, Cheligeer Cheligeer, Robin Walker, Padma Kaul, Hude Quan, Catherine Eastwood","doi":"10.23889/ijpds.v8i4.2160","DOIUrl":"https://doi.org/10.23889/ijpds.v8i4.2160","url":null,"abstract":"Alberta has rich clinical and health services data held under the custodianship of Alberta Health and Alberta Health Services (AHS), which is not only used for clinical and administrative purposes but also disease surveillance and epidemiological research. Alberta is the largest province in Canada with a single payer centralised health system, AHS, and a consolidated data and analytics team supporting researchers across the province. This paper describes Alberta's data custodians, data governance mechanisms, and streamlined processes followed for research data access. AHS has created a centralised data repository from multiple sources, including practitioner claims data, hospital discharge data, and medications dispensed, available for research use through the provincial Data and Research Services (DRS) team. The DRS team is integrated within AHS to support researchers across the province with their data extraction and linkage requests. Furthermore, streamlined processes have been established, including: 1) ethics approval from a research ethics board, 2) any necessary operational approvals from AHS, and 3) a tripartite legal agreement dictating terms and conditions for data use, disclosure, and retention. This allows researchers to gain timely access to data. To meet the evolving and ever-expanding big-data needs, the University of Calgary, in partnership with AHS, has built high-performance computing (HPC) infrastructure to facilitate storage and processing of large datasets. When releasing data to researchers, the analytics team ensures that Alberta's Health Information Act's guiding principles are followed. The principal investigator also ensures data retention and disposition are according to the plan specified in ethics and per the terms set out by funding agencies. Even though there are disparities and variations in the data protection laws across the different provinces in Canada, the streamlined processes for research data access in Alberta are highly efficient.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"47 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136381580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Establishment of a birth-to-education cohort of 1 million Palestinian refugees using electronic medical records and electronic education records
Pub Date: 2023-10-24 | DOI: 10.23889/ijpds.v8i1.2156
Zeina Jamaluddine, Akihiro Seita, Ghada Ballout, Husam Al-Fudoli, Gloria Paolucci, Shatha Albaik, Rami Ibrahim, Miho Sato, Hala Ghattas, Oona Campbell
Introduction: By linking datasets, electronic records can be used to build large birth cohorts, enabling researchers to cost-effectively answer questions relevant to populations over the life course. Currently, around 5.8 million Palestinian refugees live in five settings: Jordan, Lebanon, Syria, the West Bank, and the Gaza Strip. The United Nations Relief and Works Agency for Palestine Refugees in the Near East (UNRWA) provides them with free primary health and elementary-school services and maintains electronic records to do so. We aimed to establish a birth cohort of Palestinian refugees born between 1st January 2010 and 31st December 2020 living in the five settings by linking mothers' obstetric records with child health and education records, and to describe some of the cohort characteristics. In future, we plan to assess the effects of size at birth on growth, health and educational attainment, among other questions. Methods: We extracted all available data from 140 health centres and 702 schools across the five settings, i.e. all UNRWA service users. Creating the cohort involved examining IDs and other data, preparing data, de-duplicating records, identifying live births, linking the mothers' and children's data using different deterministic linking algorithms, and understanding reasons for non-linkage. Results: We established a birth cohort of Palestinian refugees using electronic records of 972,743 live births. We found high levels of linkage to health records overall (83%), which improved over time (from 73% to 86%), and variations in linkage rates by setting: these averaged 93% in Gaza, 89% in Lebanon, 75% in Jordan, 73% in the West Bank and 68% in Syria. Of the 423,580 children age-eligible to go to school, 47% went to UNRWA schools, comprising 197,479 children with both health and education records and 2,447 children with only education records. In addition to year and setting, other factors associated with non-linkage included mortality and having a non-refugee mother. Misclassification errors were minimal. Conclusion: This linked open birth cohort is unique for refugees and the Arab region and forms the basis for many future studies, including elucidating pathways to improved health and education in this vulnerable, understudied population. Our characterization of the cohort leads us to recommend using different subsets of the cohort depending on the research question and analytic purposes.
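The abstract refers to "different deterministic linking algorithms" without listing the rules, so the sketch below only illustrates the general idea: exact agreement on a set of identifiers, applied in passes of decreasing strictness, with ambiguous matches discarded. The field names, the two passes shown and the toy records are assumptions for illustration, not UNRWA's actual linkage specification.

```python
import pandas as pd

# Hypothetical extracts (identifiers are illustrative stand-ins).
mothers = pd.DataFrame({
    "mother_id":       ["M1", "M2", "M3"],
    "registration_no": ["R100", "R200", "R300"],
    "delivery_date":   ["2015-03-01", "2016-07-12", "2015-03-01"],
})
children = pd.DataFrame({
    "child_id":               ["C1", "C2", "C3"],
    "mother_registration_no": ["R100", "R200", None],
    "birth_date":             ["2015-03-01", "2016-07-12", "2015-03-01"],
})


def deterministic_link(children: pd.DataFrame, mothers: pd.DataFrame) -> pd.DataFrame:
    """Pass 1: exact match on registration number and date.
    Pass 2 (looser): date only, for records still unlinked; ambiguous matches dropped."""
    pass1 = children.merge(
        mothers,
        left_on=["mother_registration_no", "birth_date"],
        right_on=["registration_no", "delivery_date"],
        how="inner",
    )
    unlinked = children[~children["child_id"].isin(pass1["child_id"])]
    pass2 = unlinked.merge(
        mothers, left_on="birth_date", right_on="delivery_date", how="inner"
    )
    pass2 = pass2.drop_duplicates("child_id", keep=False)  # discard ambiguous links
    return pd.concat([pass1, pass2], ignore_index=True)


linked = deterministic_link(children, mothers)
print(linked[["child_id", "mother_id"]])
```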
{"title":"Establishment of a birth-to-education cohort of 1 million Palestinian refugees using electronic medical records and electronic education records","authors":"Zeina Jamaluddine, Akihiro Seita, Ghada Ballout, Husam Al-Fudoli, Gloria Paolucci, Shatha Albaik, Rami Ibrahim, Miho Sato, Hala Ghattas, Oona Campbell","doi":"10.23889/ijpds.v8i1.2156","DOIUrl":"https://doi.org/10.23889/ijpds.v8i1.2156","url":null,"abstract":"IntroductionBy linking datasets, electronic records can be used to build large birth-cohorts, enabling researchers to cost-effectively answer questions relevant to populations over the life-course. Currently, around 5.8 million Palestinian refugees live in five settings: Jordan, Lebanon, Syria, West Bank, and Gaza Strip. The United Nations Relief and Works Agency for Palestine Refugees in the Near East (UNRWA) provides them with free primary health and elementary-school services. It maintains electronic records to do so. We aimed to establish a birth cohort of Palestinian refugees born between 1st January 2010 and 31st December 2020 living in five settings by linking mother obstetric records with child health and education records and to describe some of the cohort characteristics. In future, we plan to assess effects of size-at-birth on growth, health and educational attainment, among other questions. MethodsWe extracted all available data from 140 health centres and 702 schools across five settings, i.e. all UNRWA service users. Creating the cohort involved examining IDs and other data, preparing data, de-duplicating records, and identifying live-births, linking the mothers' and children's data using different deterministic linking algorithms, and understanding reasons for non-linkage. ResultsWe established a birth cohort of Palestinian refugees using electronic records of 972,743 live births. We found high levels of linkage to health records overall (83%), which improved over time (from 73% to 86%), and variations in linkage rates by setting: these averaged 93% in Gaza, 89% in Lebanon, 75% in Jordan, 73% in West Bank and 68% in Syria. Of the 423,580 children age-eligible to go to school, 47% went to UNRWA schools and comprised of 197,479 children with both health and education records, and 2,447 children with only education records. In addition to year and setting, other factors associated with non-linkage included mortality and having a non-refugee mother. Misclassification errors were minimal. ConclusionThis linked open birth-cohort is unique for refugees and the Arab region and forms the basis for many future studies, including to elucidate pathways for improved health and education in this vulnerable, understudied population. Our characterization of the cohort leads us to recommend using different sub-sets of the cohort depending on the research question and analytic purposes.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135274259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data Resource Profile: COVid VAXines Effects on the Aged (COVVAXAGE)
Pub Date: 2023-10-18 | DOI: 10.23889/ijpds.v8i6.2170
Kaleen Hayes, Daniel Harris, Andrew Zullo, Djeneba Audrey Djibo, Renae L. Smith-Ray, Michael S. Taitel, Tanya G. Singh, Cheryl McMahill-Walraven, Preeti Chachlani, Katherine Wen, Ellen P. McCarthy, Stefan Gravenstein, Sean McCurdy, Kristina E. Baird, Daniel Moran, Derek Fenson, Yalin Deng, Vincent Mor
Background: To improve the assessment of COVID-19 vaccine use, safety, and effectiveness in older adults and persons with complex multimorbidity, the COVid VAXines Effects on the Aged (COVVAXAGE) database was established by linking CVS Health and Walgreens pharmacy customers to Medicare claims. Methods: We deterministically linked CVS Health and Walgreens customers who had a pharmacy dispensation/encounter paid for by Medicare to Medicare enrollment and claims records. Linked data include U.S. Medicare claims, Medicare enrollment files, and community pharmacy records. The data currently span 01/01/2016 to 08/31/2022. "Research-ready" files were created, with weekly indicators for vaccinations, censoring, death, enrollment, demographics, and comorbidities. Data are updated quarterly. Results: As of November 2022, records for 27,086,723 CVS Health and 23,510,025 Walgreens unique customer IDs were identified for potential linkage. Approximately 91% of customers were matched to a Medicare beneficiary ID (95% for those aged 65 years or older). In the final linked cohort, there were 38,250,873 unique beneficiaries, representing ~60% of the Medicare population. Among those alive and enrolled in Medicare as of January 1, 2020 (n = 33,721,568; average age = 73 years; 74% White; 51% Medicare Fee-for-Service; 11% dual-eligible for Medicaid), the average follow-up time was 130 weeks. The cohort contains 16,021,055 beneficiaries with evidence of a first COVID-19 vaccine dose. Data are stored on the secure Medicare & Medicaid Resource Information Center Health & Aging Data Enclave. Data access: Investigators with funded or in-progress funding applications to the National Institute on Aging who are interested in learning more about the database should contact Dr Vincent Mor [Vincent_mor@brown.edu] and Dr Kaleen Hayes [kaley_hayes@brown.edu]. A data dictionary can be provided upon reasonable request. Conclusions: The COVVAXAGE cohort is a large and diverse cohort that can be used for the ongoing evaluation of COVID-19 vaccine use and other research questions relevant to the Medicare population.
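The "research-ready" files are described as carrying weekly indicators for vaccination, death, enrollment and other events; the reshaping from event-level records to a person-week panel can be sketched roughly as follows. The identifiers, event names, dates and week grid are illustrative assumptions, not the actual COVVAXAGE file layout.

```python
import pandas as pd

# Hypothetical event-level records after deterministic linkage to Medicare IDs.
events = pd.DataFrame({
    "bene_id":    ["A", "A", "B"],
    "event":      ["covid_vax_dose1", "death", "covid_vax_dose1"],
    "event_date": pd.to_datetime(["2021-01-05", "2021-03-20", "2021-02-10"]),
})

# Person-week grid for an illustrative study window (weeks starting on Mondays).
weeks = pd.date_range("2021-01-01", "2021-03-31", freq="W-MON")
panel = pd.MultiIndex.from_product(
    [events["bene_id"].unique(), weeks], names=["bene_id", "week_start"]
).to_frame(index=False)

# Assign each event to the Monday starting its week, then flag it in the panel.
events["week_start"] = events["event_date"] - pd.to_timedelta(
    events["event_date"].dt.dayofweek, unit="D"
)
for name in ["covid_vax_dose1", "death"]:
    hits = events.loc[events["event"] == name, ["bene_id", "week_start"]].copy()
    hits[name] = 1
    panel = panel.merge(hits, on=["bene_id", "week_start"], how="left")
    panel[name] = panel[name].fillna(0).astype(int)

print(panel.head())
```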
{"title":"Data Resource Profile: COVid VAXines Effects on the Aged (COVVAXAGE)","authors":"Kaleen Hayes, Daniel Harris, Andrew Zullo, Djeneba Audrey Djibo, Renae L. Smith-Ray, Michael S. Taitel, Tanya G. Singh, Cheryl McMahill-Walraven, Preeti Chachlani, Katherine Wen, Ellen P. McCarthy, Stefan Gravenstein, Sean McCurdy, Kristina E. Baird, Daniel Moran, Derek Fenson, Yalin Deng, Vincent Mor","doi":"10.23889/ijpds.v8i6.2170","DOIUrl":"https://doi.org/10.23889/ijpds.v8i6.2170","url":null,"abstract":"BackgroundTo improve the assessment of COVID-19 vaccine use, safety, and effectiveness in older adults and persons with complex multimorbidity, the COVid VAXines Effects on the Aged (COVVAXAGE) database was established by linking CVS Health and Walgreens pharmacy customers to Medicare claims. MethodsWe deterministically linked CVS Health and Walgreens customers who had a pharmacy dispensation/encounter paid for by Medicare to Medicare enrollment and claims records. Linked data include U.S. Medicare claims, Medicare enrollment files, and community pharmacy records. The data currently span 01/01/2016 to 08/31/2022. \"Research-ready\" files were created, with weekly indicators for vaccinations, censoring, death, enrollment, demographics, and comorbidities. Data are updated quarterly. ResultsAs of November 2022, records for 27,086,723 CVS Health and 23,510,025 Walgreens unique customer IDs were identified for potential linkage. Approximately 91% of customers were matched to a Medicare beneficiary ID (95% for those aged 65 years or older). In the final linked cohort, there were 38,250,873 unique beneficiaries representing ~60% of the Medicare population. Among those alive and enrolled in Medicare as of January 1, 2020 (n = 33,721,568; average age = 73 years, 74% White, 51% Medicare Fee-for-Service, and 11% dual-eligible for Medicaid), the average follow-up time was 130 weeks. The cohort contains 16,021,055 beneficiaries with evidence a first COVID-19 vaccine dose. Data are stored on the secure Medicare & Medicaid Resource Information Center Health & Aging Data Enclave. Data accessInvestigators with funded or in-progress funding applications to the National Institute on Aging who are interested in learning more about the database should contact Dr Vincent Mor [Vincent_mor@brown.edu] and Dr Kaleen Hayes [kaley_hayes@brown.edu]. A data dictionary can be provided under reasonable request. ConclusionsThe COVVAXAGE cohort is a large and diverse cohort that can be used for the ongoing evaluation of COVID-19 vaccine use and other research questions relevant to the Medicare population.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135884320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Orthopedic and ophthalmology surgical service projection modelling in Manitoba: Research approach for a data linkage study
Pub Date: 2023-10-16 | DOI: 10.23889/ijpds.v8i1.2123
Alan Katz, Hannah Owczar, Carole Taylor, John-Micheal Bowes, Ruth-Ann Soodeen
Background: The healthcare system in Manitoba, Canada has faced long wait times for many surgical procedures and investigations, including orthopedic and ophthalmology surgeries. Wait times for surgical procedures are considered a significant barrier to accessing healthcare in Canada and can lead to negative health outcomes for patients. We developed models to forecast anticipated surgical procedure demands up to 2027. This paper explores the opportunities and challenges of using administrative data to describe forecasts of surgical service delivery. Methods: This study used whole-population linked administrative health data to predict future orthopedic and ophthalmology surgical procedure demands up to 2027. Procedure codes (CCI) from hospital discharge abstracts and medical claims data were used in the modelling. A Seasonal Autoregressive Integrated Moving Average (SARIMA) model provided the best fit to the data from April 1, 2004 to March 31, 2020. Results: Initial analyses of only hospital-based procedures excluded a significant portion of provider workload, namely those services provided in clinics. We identified 500,732 orthopedic procedures completed between April 1, 2004 and March 31, 2020 (349,171 procedures identified from hospital discharge abstracts and 151,561 from medical claims). Procedure volumes for these services are expected to rise 17.7% from 2020 (36,542) to 2027 (43,011), including a forecasted 43.9% increase in clinic-based procedures. Of the 660,127 ophthalmology procedures completed between April 1, 2004 and March 31, 2020, 230,717 were identified from hospital discharge abstracts and 429,410 from medical claims. Models forecasted a 27.7% increase from 2020 (69,598) to 2027 (88,893), with most procedures being performed in clinics. Conclusion: Researchers should consider including multiple datasets to add information that may be missing from the presumed data source. Confirming the completeness of the data is critical to modelling accurate predictions. Forecast modelling techniques have evolved but still require validation.
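A minimal version of the forecasting workflow named in the abstract - fitting a seasonal ARIMA to monthly procedure counts and projecting forward to 2027 - is sketched below with statsmodels. The simulated series, the (1,1,1)x(1,1,1,12) order and the forecast horizon are placeholders; the study's actual model specification and data are not given in the abstract.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly procedure counts standing in for the linked hospital +
# claims series (April 2004 to March 2020); the real study uses CCI-coded events.
rng = np.random.default_rng(1)
idx = pd.date_range("2004-04-01", "2020-03-01", freq="MS")
trend = np.linspace(1500, 2300, len(idx))
season = 150 * np.sin(2 * np.pi * idx.month / 12)
counts = pd.Series(trend + season + rng.normal(0, 60, len(idx)), index=idx)

# Fit a seasonal ARIMA; the order shown is an illustrative choice, not the
# model selected in the Manitoba study.
model = SARIMAX(counts, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)

# Forecast monthly volumes for seven more years and sum to annual totals.
forecast = fit.get_forecast(steps=7 * 12).predicted_mean
print(forecast.groupby(forecast.index.year).sum().round())
```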
{"title":"Orthopedic and ophthalmology surgical service projection modelling in Manitoba: Research approach for a data linkage study","authors":"Alan Katz, Hannah Owczar, Carole Taylor, John-Micheal Bowes, Ruth-Ann Soodeen","doi":"10.23889/ijpds.v8i1.2123","DOIUrl":"https://doi.org/10.23889/ijpds.v8i1.2123","url":null,"abstract":"BackgroundThe healthcare system in Manitoba, Canada has faced long wait times for many surgical procedures and investigations, including orthopedic and ophthalmology surgeries. Wait times for surgical procedures is considered a significant barrier to accessing healthcare in Canada and can have negative health outcomes for patients. We developed models to forecast anticipated surgical procedure demands up to 2027. This paper explores the opportunities and challenges of using administrative data to describe forecasts of surgical service delivery. MethodsThis study used whole population linked administrative health data to predict future orthopedic and ophthalmology surgical procedure demands up to 2027. Procedure codes (CCI) from hospital discharge abstracts and medical claims data were used in the modelling. A Seasonal Autoregressive Integrated Moving Average model provided the best fit to the data from April 1, 2004 to March 31, 2020. ResultsInitial analyses of only hospital-based procedures excluded a significant portion of provider workload, namely those services provided in clinics. We identified 500,732 orthopedic procedures completed between April 1, 2004 and March 31, 2020 (349,171 procedures identified from hospital discharge abstracts and 151,561 procedures from medical claims). Procedure volumes for these services are expected to rise 17.7% from 2020 (36,542) to 2027 (43,011), including the forecasted 43.9% increase in clinic-based procedures. Of the 660,127 ophthalmology procedures completed between April 1, 2004 and March 31, 2020, 230,717 procedures were identified from hospital discharge abstracts and 429,410 from medical claims. Models forecasted a 27.7% increase from 2020 (69,598) to 2027 (88,893) with most procedures being performed in clinics. ConclusionResearchers should consider including multiple datasets to add information that may have been missing from the presumed data source in their research approach. Confirming the completeness of the data is critical in modelling accurate predictions. Forecast modelling techniques have evolved but still require validation.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136113011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data Resource Profile: The Hospital Electronic Prescribing and Medicines Administration (HEPMA) National Data Collection in Scotland
Pub Date: 2023-10-11 | DOI: 10.23889/ijpds.v8i6.2182
Tanja Mueller, Euan Proud, Amanj Kurdi, Lynne Jarvis, Kat Reid, Stuart McTaggart, Marion Bennie
Introduction: To support both electronic prescribing and documentation of medicines administration in secondary care, hospitals in Scotland are currently implementing the Hospital Electronic Prescribing and Medicines Administration (HEPMA) software. Driven by the COVID-19 pandemic, agreements have been put in place to centrally collate data stemming from the operational HEPMA system. The aim was to develop a national data resource based on records created in secondary care, in line with pre-existing collections of data from primary care. Methods: HEPMA is a live clinical system and is updated on a continuous basis. Data are automatically extracted from local systems at least weekly and, in most cases, on a nightly basis, and integrated into the national HEPMA dataset. Subsequently, the data are subject to quality checks, including checks of data consistency and completeness. Records contain a unique patient identifier (Community Health Index number), enabling linkage to other routinely collected data including primary care prescriptions, hospital admission episodes, and death records. Results: The HEPMA data resource captures and compiles information on all medicines prescribed within the wards/hospitals covered by the system; this includes medicine name, formulation, strength, dose, route, and frequency of administration, as well as dates and times of prescribing. In addition, the HEPMA dataset captures information on medicines administration, including dates and times of administration. Data are available from January 2019 onwards and held by Public Health Scotland. Conclusion: The national HEPMA data resource supports cross-sectional/point-prevalence studies, including drug utilisation studies, and also offers scope to conduct longitudinal studies, e.g., cohort and case-control studies. With the possibility of linking to other relevant datasets, additional areas of interest may include health policy evaluations and health economics studies. Access to data is subject to approval; researchers should contact the electronic Data Research and Innovation Service (eDRIS) in the first instance.
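The abstract mentions consistency and completeness checks on the extracted records without describing them; the sketch below shows what such checks can look like on a prescribing extract. Column names, the example rows and the rules are assumptions, not Public Health Scotland's actual validation suite.

```python
import pandas as pd

# Hypothetical weekly HEPMA extract (columns are illustrative stand-ins).
extract = pd.DataFrame({
    "chi_number":      ["0101011234", None, "0202025678"],
    "medicine_name":   ["amoxicillin", "paracetamol", None],
    "dose":            ["500 mg", "1 g", "75 mg"],
    "prescribed_at":   pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-03"]),
    "administered_at": pd.to_datetime(["2023-05-01", "2023-04-30", "2023-05-03"]),
})


def quality_report(df: pd.DataFrame) -> dict:
    """Simple completeness and consistency summaries for an extract."""
    # Completeness: share of non-missing values in key identifier/medicine fields.
    completeness = (1 - df[["chi_number", "medicine_name", "dose"]].isna().mean()).to_dict()
    # Consistency: administration should not precede prescribing.
    inconsistent = int((df["administered_at"] < df["prescribed_at"]).sum())
    return {"field_completeness": completeness,
            "admin_before_prescription": inconsistent}


print(quality_report(extract))
```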
{"title":"Data Resource Profile: The Hospital Electronic Prescribing and Medicines Administration (HEPMA) National Data Collection in Scotland","authors":"Tanja Mueller, Euan Proud, Amanj Kurdi, Lynne Jarvis, Kat Reid, Stuart McTaggart, Marion Bennie","doi":"10.23889/ijpds.v8i6.2182","DOIUrl":"https://doi.org/10.23889/ijpds.v8i6.2182","url":null,"abstract":"IntroductionTo support both electronic prescribing and documentation of medicines administration in secondary care, hospitals in Scotland are currently implementing the Hospital Electronic Prescribing and Medicines Administration (HEPMA) software. Driven by the COVID-19 pandemic, agreements have been put in place to centrally collate data stemming from the operational HEPMA system. The aim was to develop a national data resource based on records created in secondary care, in line with pre-existing collections of data from primary care. MethodsHEPMA is a live clinical system and updated on a continuous basis. Data is automatically extracted from local systems at least weekly and, in most cases, on a nightly basis, and integrated into the national HEPMA dataset. Subsequently, the data are subject to quality checks including data consistency and completeness. Records contain a unique patient identified (Community Health Index number), enabling linkage to other routinely collected data including primary care prescriptions, hospital admission episodes, and death records. ResultsThe HEPMA data resource captures and compiles information on all medicines prescribed within the ward/hospital covered by the system; this includes medicine name, formulation, strength, dose, route, and frequency of administration, and dates and times of prescribing. In addition, the HEPMA dataset also captures information on medicines administration, including dates and time of administration. Data is available from January 2019 onwards and held by Public Health Scotland. ConclusionThe national HEPMA data resource supports cross-sectional/point-prevalence studies including drug utilisation studies, and also offers scope to conduct longitudinal studies, e.g., cohort and case-control studies. With the possibility to link to other relevant datasets, additional areas of interest may include health policy evaluations and health economics studies. Access to data is subject to approval; researchers need to contact the electronic Data Research and Innovation Service (eDRIS) in the first instance.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136211198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a standardised cross-sectoral data access agreement template for research: a core set of principles for data access within trusted research environments
Pub Date: 2023-10-09 | DOI: 10.23889/ijpds.v8i4.2169
Rachel Brophy, Ester Bellavia, Maeve Groot Bluemink, Katharine Evans, Munisa Hashimi, Yemi Macaulay, Edel McNamara, Allison Noble, Paola Quattroni, Amanda Rudczenko, Andrew D Morris, Cassie Smith, Andy Boyd
Introduction: Trusted Research Environments (TREs) are secure computing environments that provide access to data for approved researchers to use in studies that can save and improve lives. TREs rely on Data Access Agreements (DAAs) to bind researchers and their organisations to the terms and conditions of accessing the infrastructure and data use. However, DAAs can be overly lengthy and complex, and can contain outdated terms from historical data sharing agreements for the physical exchange of data. This is often cited as a cause of significant delays to legal review and to research projects starting. Objectives: The aim was to develop a standardised DAA optimised for data science in TREs across the UK and framed around the 'Five Safes' framework for trustworthy data use. The DAA is underpinned by principles of data access in TREs, the development of which is described in this paper. Methods: The Pan-UK Data Governance Steering Group of the UK Health Data Research Alliance led the development of a core set of data access principles. This was informed by a benchmarking exercise of DAAs used by established TREs and consultation with public members and stakeholders. Results: We have defined a core set of principles for TRE data access that can be mapped to a common set of DAA terms for UK-based TREs. Flexibility will be ensured by including terms specific to TREs or to specific data/data owners in customisable annexes. Public views obtained through public involvement and engagement (PIE) activities are also reported. Conclusions: These principles provide the foundation for a standardised UK TRE DAA template, designed to support the growing ecosystem of TREs. By providing a familiar structure and terms, this template aims to build trust among data owners and the UK public and to provide clarity to researchers on their obligations to protect the data. Widespread adoption is intended to accelerate health data research by enabling faster approval of projects, ultimately enabling more timely and effective research.
{"title":"Towards a standardised cross-sectoral data access agreement template for research: a core set of principles for data access within trusted research environments","authors":"Rachel Brophy, Ester Bellavia, Maeve Groot Bluemink, Katharine Evans, Munisa Hashimi, Yemi Macaulay, Edel McNamara, Allison Noble, Paola Quattroni, Amanda Rudczenko, Andrew D Morris, Cassie Smith, Andy Boyd","doi":"10.23889/ijpds.v8i4.2169","DOIUrl":"https://doi.org/10.23889/ijpds.v8i4.2169","url":null,"abstract":"IntroductionTrusted Research Environments (TREs) are secure computing environments that provide access to data for approved researchers to use in studies that can save and improve lives. TREs rely on Data Access Agreements (DAAs) to bind researchers and their organisations to the terms and conditions of accessing the infrastructure and data use. However, DAAs can be overly lengthy, complex, and can contain outdated terms from historical data sharing agreements for physical exchange of data. This is often cited as a cause of significant delays to legal review and research projects starting. ObjectivesThe aim was to develop a standardised DAA optimised for data science in TREs across the UK and framed around the `Five Safes framework' for trustworthy data use. The DAA is underpinned by principles of data access in TREs, the development of which is described in this paper. MethodsThe Pan-UK Data Governance Steering Group of the UK Health Data Research Alliance led the development of a core set of data access principles. This was informed by a benchmarking exercise of DAAs used by established TREs and consultation with public members and stakeholders. ResultsWe have defined a core set of principles for TRE data access that can be mapped to a common set of DAA terms for UK-based TREs. Flexibility will be ensured by including terms specific to TREs or specific data/data owners in customisable annexes. Public views obtained through public involvement and engagement (PIE) activities are also reported. ConclusionsThese principles provide the foundation for a standardised UK TRE DAA template, designed to support the growing ecosystem of TREs. By providing a familiar structure and terms, this template aims to build trust among data owners and the UK public and to provide clarity to researchers on their obligations to protect the data. Widespread adoption is intended to accelerate health data research by enabling faster approval of projects, ultimately enabling more timely and effective research.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Four Questions to Guide Decision-Making for Data Sharing and Integration
Pub Date: 2023-10-04 | DOI: 10.23889/ijpds.v8i4.2159
Amy Hawn Nelson, Sharon Zanti
Introduction: This paper presents a Four Question Framework to guide data integration partners in building a strong governance and legal foundation to support ethical data use. Objectives: While this framework was developed based on work in the United States that routinely integrates public data, it is meant to be a simple, digestible tool that can be adapted to any context. Methods: The framework was developed through a series of public deliberation workgroups and 15 years of field experience working with a diversity of data integration efforts across the United States. Results: The Four Questions - Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? - should be considered within an established data governance framework and alongside core partners to determine whether and how to move forward when building an Integrated Data System (IDS), and also at each stage of a specific data project. We discuss these questions in depth, with a particular focus on the role of governance in establishing legal and ethical data use. In addition, we provide example data governance structures from two IDS sites and hypothetical scenarios that illustrate key considerations for the Four Question Framework. Conclusions: A robust governance process is essential for determining whether data sharing and integration is legal, ethical, and a good idea within the local context. This process is iterative and as relational as it is technical, which means authentic collaboration across partners should be prioritized at each stage of a data use project. The Four Questions serve as a guide for determining whether to undertake data sharing and integration and should be regularly revisited throughout the life of a project. Highlights: Strong data governance has five qualities: it is purpose-, value-, and principle-driven; strategically located; collaborative; iterative; and transparent. Through a series of public deliberation workgroups and 15 years of field experience, we developed a Four Question Framework to determine whether and how to move forward with building an IDS and at each stage of a data sharing and integration project. The Four Questions - Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? - should be carefully considered within established data governance processes and among core partners.
{"title":"Four Questions to Guide Decision-Making for Data Sharing and Integration","authors":"Amy Hawn Nelson, Sharon Zanti","doi":"10.23889/ijpds.v8i4.2159","DOIUrl":"https://doi.org/10.23889/ijpds.v8i4.2159","url":null,"abstract":"IntroductionThis paper presents a Four Question Framework to guide data integration partners in building a strong governance and legal foundation to support ethical data use. ObjectivesWhile this framework was developed based on work in the United States that routinely integrates public data, it is meant to be a simple, digestible tool that can be adapted to any context. MethodsThe framework was developed through a series of public deliberation workgroups and 15 years of field experience working with a diversity of data integration efforts across the United States. ResultsThe Four Questions - Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? - should be considered within an established data governance framework and alongside core partners to determine whether and how to move forward when building an Integrated Data System (IDS) and also at each stage of a specific data project. We discuss these questions in depth, with a particular focus on the role of governance in establishing legal and ethical data use. In addition, we provide example data governance structures from two IDS sites and hypothetical scenarios that illustrate key considerations for the Four Question Framework. ConclusionsA robust governance process is essential for determining whether data sharing and integration is legal, ethical, and a good idea within the local context. This process is iterative and as relational as it is technical, which means authentic collaboration across partners should be prioritized at each stage of a data use project. The Four Questions serve as a guide for determining whether to undertake data sharing and integration and should be regularly revisited throughout the life of a project. Highlights Strong data governance has five qualities: it is purpose-, value-, and principle-driven; strategically located; collaborative; iterative; and transparent. Through a series of public deliberation workgroups and 15 years of field experience, we developed a Four Question Framework to determine whether and how to move forward with building an IDS and at each stage of a data sharing and integration project. The Four Questions—Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? —should be carefully considered within established data governance processes and among core partners.","PeriodicalId":132937,"journal":{"name":"International Journal for Population Data Science","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135590831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}