Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.1905
J. Hampton, Nick Webster, Sophie Jordan, S. Morrison-Rees, O. Bateson, James Watson, S. McFarlane, Alastair McAlpine, L. Cavin
Objectives: The AD|ARC (Administrative Data: Agriculture Research Collection) is an ambitious and original linkage project, bringing together information about farmers and farming households from several sources. When complete, this research-ready dataset will assist in addressing three broad themes: health and well-being, prosperity and resilience, and engagement with agri-environment.
Approach: The dataset is being constructed from information drawn from survey, census, and administrative sources. This necessarily involves working across government departments to ensure comprehensive coverage of farm, business, education, and health data. Similarly, data owners, processors, and researchers are working closely together to ensure the resulting dataset meets expectations. Alongside this cross-sectoral aspect, the work is also cross-jurisdictional: the data are intended to capture information about farms, farmers, and farming households from across the UK.
Results: Rather than focus on the detail of the substantive research that AD|ARC will enable, this paper discusses some of the challenges and successes of this linkage project to date. Drawing on the experience of the teams from across the UK (England, Northern Ireland, Scotland, and Wales), the first part will discuss the challenges faced in linkage for this multi-faceted project, and how the population census is being used to better understand farming communities through the identification of both farming households and workers. The second part will present a broader discussion of the challenges and sensitivities of working across government departments and administrations, alongside the ways of working developed to recognise and overcome these.
Conclusion: The AD|ARC project will result in an invaluable resource for understanding the farming community, which in turn will help to better inform policy debate and decision making. Alongside this, the process of creating the dataset has offered opportunities for learning and insight across a range of issues.
Title: AD|ARC: Construction of a research ready dataset to better understand farmers and farming households.
Journal: International Journal of Population Data Science
Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.1935
E. Ross, D. O’Reilly, M. Aideen
Objectives: There is clear evidence from the USA that access to firearms increases suicide risk, but little equivalent evidence exists in the UK. The aim of the current study is to examine the risk of suicide and all-cause mortality for people in Northern Ireland (NI) who hold a licensed firearm.
Approach: We link information on all registrations from the Firearms Certificate (FAC) Register between 2010-2020 to the health service population spine for NI residents born before 1st January 2005. Further linkage includes prescription medication data and death records, with follow-up until 31st December 2020.
Results: 68,831 individuals held a FAC during the study period. FAC holders were more likely to be older, to reside in rural areas (OR 4.99, 4.89-7.83), and to come from more affluent areas (OR for the most deprived areas 0.46, 0.43-0.50). During follow-up, 3,704 FAC holders died. 36 deaths were due to suicide, of which 16 were suicides by firearm. Only 23% of those who died by firearm suicide in NI were FAC holders. Preliminary findings indicate that after adjustment for age, area-level deprivation, and urbanicity, FAC holders had a lower risk of all-cause mortality (HR 0.64, 0.61-0.66) and death by suicide (HR 0.54, 0.39-0.76).
Conclusion: In contrast to findings from previous studies, individuals with a licensed firearm were less likely to die by suicide.
Acknowledgement: The authors would like to acknowledge the help provided by the staff of the Honest Broker Service (HBS) within the Business Services Organisation Northern Ireland (BSO). The HBS is funded by the BSO and the Department of Health (DoH). The authors alone are responsible for the interpretation of the data, and any views or opinions presented are solely those of the authors and do not necessarily represent those of the BSO.
Title: Mental health, firearm ownership, and risk of death by suicide: a population-wide data linkage study.
Journal: International Journal of Population Data Science
Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.1949
Nadine E. Andrew, R. Beare, Tanya Ravipati, E. Parker, T. Collyer, David Ung, V. Srikanth
Objectives: Electronic Health Record (EHR) data have created unique opportunities for research. However, these data are uncurated, siloed, and poorly integrated. We describe linkage of EHR data from an entire health service with government datasets to establish a linked geographic cohort within the Australian National Centre for Healthy Ageing (NCHA).
Approach: Research-suitable EHR items were identified from Peninsula Health (NCHA partner) data systems based on published research, availability, and quality. Items underwent end-user Delphi processes to identify core research items (consensus=70%). Approvals were obtained from the Australian Institute of Health and Welfare (AIHW) for linkage with Medicare, medication dispensings, Aged Care, and death registry data through the AIHW spine, created using identifiers from the Medicare Consumer Directory (MCD); and from the Centre for Victorian Data Linkage for linkage to state-wide hospital data. Identifiers for local residents aged ≥60 years who attended Peninsula Health were submitted for probabilistic data linkage.
Results: Delphi participants included 10 researchers from 8 fields/departments and 13 clinicians from 11 clinical areas. To date, 7 of the 11 datasets have been reviewed. N=107 potentially suitable data items were identified, and 96 gained consensus for inclusion in the core dataset. Of the 49,767 Health Service users (episodes: Jan 2010-Dec May 2021) submitted for linkage, 98.4% were successfully linked to the MCD (median age 72.2 years, 52.2% female, 1.8% regional residence). An additional 172,290 individuals living within the geographic region but not contained within the EHR dataset were identified in the MCD for linkage to the government datasets. Linkage accuracy was affected by inaccurate or incomplete address fields (~30%) and by lack of adherence to naming conventions within the EHR data.
Conclusion: Linking with EHR data is complex. Having an established EHR research dataset will improve the feasibility of data linkage and the potential for future expansion of linkages within the NCHA. Once merged, the data will be used to underpin a range of research activities related to ageing and dementia.
Title: The National Centre for Healthy Ageing data platform: establishing an Electronic Health Record derived linked geographic cohort.
Journal: International Journal of Population Data Science
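The abstract above names probabilistic data linkage without describing the scoring. As a rough illustration only, the sketch below uses Fellegi-Sunter style match weights. The field names, the m/u probabilities, and the thresholds are all hypothetical and are not taken from the NCHA linkage. The lower m value for the address field is one way a linker might reflect the unreliable address data the authors report.

```python
import math

# Hypothetical m/u probabilities per field (illustrative, not from the study).
# m = P(field agrees | both records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "dob": (0.98, 0.003),
    "address": (0.70, 0.02),  # low m: addresses are often inaccurate or incomplete
}


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # a missing value adds no evidence either way
        if a.strip().lower() == b.strip().lower():
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Assumed thresholds: link, reject, or send to clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"
```

With these assumed parameters, a disagreeing address lowers the total weight only slightly, so a pair that agrees on name and date of birth can still be classified as a match.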
Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.1975
M. Palfy, Christopher Radbone
Objectives: The purpose of this analytical activity was to establish confidence in the technical capability for extracting, linking, and integrating public hospital inpatient data, public pathology blood transfusion records, and blood tests, optimising record linkage so that patterns and trends can then be analysed with confidence.
Approach: The SURE secure data platform was essential to ensure data governance and security requirements were met while integrating health data spanning 18 months (January 2018 - June 2019). Data sources came in multiple formats of varying quality. R was chosen for its data wrangling abilities and reproducibility.
The phases were:
1) Source data loading and cleaning
2) Linking hospital inpatient and blood transfusion records
3) Summarising linked transfusion data
4) Linking inpatient and blood test data
5) Summarising linked test data
6) Integrating hospital data with summarised transfusion and summarised test data
7) Deriving additional variables based on summarised data
Results: Of 143,192 transfusion records, 55,053 (38.4%) were excluded as they did not meet the inclusion criteria (e.g., hospital or blood product out of scope).
Of 7,897,451 blood test records, 238,013 (3.0%) were excluded, mostly because of poor quality (missing/invalid hospital code).
Initially, 91.4% of transfusion records were matched with hospital inpatient records. The linkage rate for blood test records was 62.3%; this low match rate was attributed to tests not performed on public hospital patients, as the blood test data were state-wide.
The linkage process was improved by adding additional patient codes from public pathology’s internal patient identifiers. The linkage rate improved to 95.5% for transfusion records and 64.4% for test records.
Conclusion: Twelve different data sources, with differing file types and formats, needed coding to achieve standardised results, enabling future reproducibility. Over one hundred business rules were implemented to produce a robust solution for future data updates. End results were analysed, and it was determined that linkage and integration quality exceeded previous similar attempts in terms of match rate and accuracy.
Title: Understanding South Australia’s blood products usage patterns and outcomes, using data linkage.
Journal: International Journal of Population Data Science
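The improvement reported above came from adding further patient codes to the linkage. The sketch below shows one way such a two-pass link could work: an exact match on the recorded patient identifier, followed by a second pass over the unlinked residuals using an alias table. It is written in Python rather than the R used by the authors, and the field names (`record_id`, `patient_id`) and the `alias_map` structure are hypothetical. The real linkage almost certainly also used admission dates and business rules that are omitted here.

```python
def link_two_pass(transfusions, inpatients, alias_map):
    """Link transfusion records to inpatient patient identifiers.

    Pass 1 matches on the recorded patient id. Pass 2 retries the unlinked
    residuals through alias_map, a hypothetical lookup from a pathology
    internal identifier to the hospital patient id.
    Returns (links, rate_after_pass_1, rate_after_pass_2).
    """
    known_ids = {p["patient_id"] for p in inpatients}
    total = len(transfusions) or 1  # avoid division by zero on empty input
    links = {}

    # Pass 1: exact match on the identifier as recorded.
    for t in transfusions:
        if t["patient_id"] in known_ids:
            links[t["record_id"]] = t["patient_id"]
    rate_pass_1 = len(links) / total

    # Pass 2: residuals only, via the additional patient codes.
    for t in transfusions:
        if t["record_id"] in links:
            continue
        alternative = alias_map.get(t["patient_id"])
        if alternative in known_ids:
            links[t["record_id"]] = alternative
    rate_pass_2 = len(links) / total

    return links, rate_pass_1, rate_pass_2
```

Reporting the rate after each pass mirrors how the abstract presents its figures (91.4% initially, 95.5% after the additional codes for transfusion records).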
Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.1854
C. Nanayakkara, P. Christen
Objectives: Population databases containing birth, death, and marriage certificates or census records are increasingly used for studies in a variety of research domains. Their large scale and complexity make linking such databases highly challenging. We present a scalable blocking and linking technique that exploits temporal and spatial constraints in personal data.
Approach: Based on a state-of-the-art blocking method using locality sensitive hashing (LSH), we incorporate (a) attribute similarities, (b) temporal constraints (for example, a mother cannot give birth to two babies less than nine months apart, except in a multiple birth), and (c) spatial constraints (two births by the same mother are more likely to happen in the same location than far apart). In an iterative fashion, we identify highly confident matches first, and use these matches to further refine our constraints. We adopt a block size and frequency-based filtering approach to further enhance the efficiency of the record linkage comparison step.
Results: We conducted experiments on a Scottish data set containing 17,613 birth certificates from 1861 to 1901, where the application of standard LSH blocking generated approximately 15 million candidate record pairs, with a recall of 0.999 and a precision of 0.003. With the application of our block size and frequency-based filtering approach, we obtained a ten-fold and a hundred-fold reduction of this candidate record pair set, with a small reduction of recall to 0.984 and 0.962, respectively. The comparison of record pairs in the hundred-fold reduction using our iterative linking technique achieved up to 0.961 precision and 0.811 recall. This means that our method can achieve a reduction in computational effort and an improvement in precision of over 99%, at the cost of a decline in recall of below 19%.
Conclusion: We presented a method to reduce the computational complexity of linking large and complex population databases while ensuring high linkage quality. Our method can be generalised to population databases where temporal and spatial constraints can be defined. We plan to apply our method to a Scottish database with 24 million records.
Title: Efficient population record linkage with temporal and spatial constraints.
Journal: International Journal of Population Data Science
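To make the blocking idea above concrete, here is a minimal sketch of MinHash-based LSH blocking on a name field, followed by the nine-month temporal constraint applied to each candidate pair. It is not the authors' implementation. The field names (`mother_name`, `birth_date`), the band and row settings, the 270-day gap, and the two-day window for multiple births are all assumptions chosen for illustration. The spatial constraints, the iterative refinement, and the block size and frequency-based filtering are not shown.

```python
import hashlib
from collections import defaultdict
from datetime import date
from itertools import combinations


def qgrams(text, q=2):
    """Split a string into its set of character q-grams."""
    text = text.lower().replace(" ", "")
    return {text[i:i + q] for i in range(len(text) - q + 1)} or {text}


def stable_hash(seed, gram):
    """Deterministic 64-bit hash (the built-in hash() varies between runs)."""
    digest = hashlib.md5(f"{seed}:{gram}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(grams, num_hashes):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return [min(stable_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def lsh_candidate_pairs(records, bands=10, rows=2):
    """Records sharing at least one signature band become a candidate pair."""
    buckets = defaultdict(set)
    for rec_id, rec in records.items():
        sig = minhash(qgrams(rec["mother_name"]), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def temporally_plausible(birth_a, birth_b, min_gap_days=270, multiple_birth_days=2):
    """Two births by one mother: either a multiple birth or about nine months apart."""
    gap = abs((birth_a - birth_b).days)
    return gap <= multiple_birth_days or gap >= min_gap_days


def filter_pairs(records, pairs):
    """Drop candidate same-mother pairs that violate the temporal constraint."""
    return {
        (a, b) for a, b in pairs
        if temporally_plausible(records[a]["birth_date"], records[b]["birth_date"])
    }
```

Banding trades recall against the number of candidate pairs: more rows per band makes a bucket collision require closer agreement, which shrinks the candidate set but risks missing true matches with spelling variation.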
Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.2050
Shelley Gammon, R. Shipsey, Charlie Tomlin, Josie Plachta
In early 2020 there was intense media speculation that ethnicity and Covid-19 deaths were correlated. However, the existing method of adding ethnicity to death records resulted in low linkage rates for very recent deaths. We designed and implemented a bespoke linkage in three days, enabling accurate reporting to the nation.
We linked the 2011 England and Wales Census to death records using a range of personal identifiers. Due to time pressure, we focused on executing a single linkage method well. Deterministic linkage was chosen, using a variety of matchkeys which were tested via clerical review. To overcome the issue of addresses changing since 2011, we also linked 2020 death record residuals to the 2019 Patient Register (PR) and then made use of the 2011 PR address where it existed. This additionally provided an indication of whether unmatched death records might be attributable to migration into England and Wales post-2011.
The prior linking method used NHS Number only. Although the overall linkage rate was approximately 90%, the rate for recent deaths (2nd March 2020 to 10th April 2020 in the first iteration of the linkage) was closer to 30% due to an administrative lag in adding NHS Numbers to death records. Our novel bespoke linkage method linked over 39,000 extra death records. Whilst this had minimal impact on the overall linkage rate, it improved the linkage rate for recent deaths to approximately 90%. This was without an impact on accuracy: clerical review demonstrated that the false positive rate was approximately 0.2%. A report was published using these data, showing that the risk of death involving Covid-19 was significantly higher among some ethnic groups than among others.
Determining whether Covid-19 disproportionately affected certain ethnicities was of crucial importance in the early phase of the pandemic to enable appropriate government strategies to be developed. We delivered a bespoke linkage under an exceptional time limit without compromising on accuracy, enabling an analysis of nationwide interest and impact.
Title: Bespoke automated linkage to enable analysis of covid deaths by ethnicity.
Journal: International Journal of Population Data Science
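Deterministic linkage with matchkeys, as described above, can be sketched as a cascade of exact-match passes that run from the strictest key to looser ones. The matchkeys below are hypothetical examples and are not the keys used in the actual linkage. The field names, the normalisation, and the rule of accepting only unique candidates are likewise assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical matchkeys, ordered from strictest to loosest.
MATCHKEYS = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
    ("forename_initial", "surname", "dob", "postcode"),
]


def field_value(record, field):
    """Return a normalised field value; forename_initial is derived."""
    if field == "forename_initial":
        raw = (record.get("forename") or "")[:1]
    else:
        raw = record.get(field) or ""
    return "".join(str(raw).split()).upper()  # drop whitespace, ignore case


def make_key(record, fields):
    """Build the matchkey tuple, or None if any component is missing."""
    values = [field_value(record, f) for f in fields]
    return tuple(values) if all(values) else None


def deterministic_link(deaths, census):
    """Map death id -> (census id, index of the matchkey that linked it)."""
    links, linked_census = {}, set()
    for key_index, fields in enumerate(MATCHKEYS):
        # Index the census records that are still unlinked under this matchkey.
        index = defaultdict(list)
        for c in census:
            key = make_key(c, fields)
            if key and c["id"] not in linked_census:
                index[key].append(c["id"])
        for d in deaths:
            if d["id"] in links:
                continue  # already linked by a stricter matchkey
            key = make_key(d, fields)
            candidates = index.get(key, []) if key else []
            if len(candidates) == 1:  # accept unique matches only
                links[d["id"]] = (candidates[0], key_index)
                linked_census.add(candidates[0])
                index[key].clear()  # keep the linkage one-to-one within a pass
    return links
```

Recording which matchkey produced each link makes it straightforward to sample links per key for clerical review, which is how the abstract says the matchkeys were tested.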
Pub Date: 2022-08-25
DOI: 10.23889/ijpds.v7i3.1792
E. Jefferson, Aziz Sheik, S. Hopkins, P. Quinlan
Objectives: CO-CONNECT is making UK COVID-19 data Findable, Accessible, Interoperable and Reusable (FAIR) through a federated platform that supports secure, anonymised research at scale and pace. This interdisciplinary project, spanning 22 organisations, is connecting data from more than 50 large research cohorts and data collected through routine healthcare provision across the UK.
Approach: Across the UK, data have been collected that can help answer key questions about COVID-19. Because the data are held in many places and governed by many different processes, it is difficult for public health groups, researchers, policymakers, and government to find and access high-quality data quickly and efficiently enough to inform decisions. In collaboration with Health Data Research UK, CO-CONNECT is streamlining the processes for accessing data for research.
Results: 1) Discovering data and meta-analysis: CO-CONNECT enables researchers to determine how many people meet their research criteria within the various datasets across the UK through the Health Data Research Innovation Gateway Cohort Discovery tool, e.g. “How many people in each dataset have had a positive PCR test and were under the age of 40?” Only summary-level, anonymous data are provided, so researchers can answer such questions rapidly without obtaining multiple data governance permissions or contacting each data source directly. The tool also supports aggregate-level meta-analysis of the data. 2) Detailed analysis: With data governance approvals, researchers can analyse detailed, standardised, linked, pseudonymised data in a Trusted Research Environment. The common format reduces the effort required for each research project, supporting rapid research.
Conclusion: Providing data in this de-identified, safe way enables rapid, robust research. For example, COVID-19 results from a test centre can be linked to hospital records and to pharmacy prescriptions, enabling researchers to understand whether people with different existing health conditions are more or less susceptible to COVID-19. For more information, visit https://co-connect.ac.uk.
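The Cohort Discovery pattern described in the Results can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CO-CONNECT's actual implementation or API: the `Person` record, the site names, and the synthetic records are all hypothetical. The point it shows is that each data source evaluates the criteria locally and releases only a count, so no row-level data leave the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Person:
    """Hypothetical row-level record held inside one data source."""
    age: int
    pcr_positive: bool


Criteria = Callable[[Person], bool]


def local_count(records: List[Person], criteria: Criteria) -> int:
    """Runs inside a data source: only the aggregate count is returned."""
    return sum(1 for person in records if criteria(person))


def cohort_discovery(sites: Dict[str, List[Person]], criteria: Criteria) -> Dict[str, int]:
    """Ask every source the same question and collect one count per source."""
    return {name: local_count(records, criteria) for name, records in sites.items()}


def under_40_pcr_positive(person: Person) -> bool:
    """The example question from the abstract: positive PCR test and under age 40."""
    return person.pcr_positive and person.age < 40


# Synthetic, made-up records for two hypothetical sources.
SITES = {
    "cohort_a": [Person(35, True), Person(52, True), Person(28, True), Person(31, False)],
    "cohort_b": [Person(39, True), Person(40, True), Person(22, False)],
}

COUNTS = cohort_discovery(SITES, under_40_pcr_positive)
print(COUNTS)
```

Summing the per-source counts gives the kind of aggregate figure a researcher could see before applying for access to detailed data.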
{"title":"The COVID - Curated and Open aNalysis aNd rEsearCh plaTform (CO-CONNECT).","authors":"E. Jefferson, Aziz Sheik, S. Hopkins, P. Quinlan","doi":"10.23889/ijpds.v7i3.1792","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1792","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.2005
E. Jefferson, Christian Cole, Alba Crespi i Boixader, Simon Rogers, Maeve Malone, F. Ritchie, Jim Q. Smith, Francesco Tava, A. Daly, J. Beggs, Antony Chuter
Objectives: To assess a range of tools and methods that help Trusted Research Environments (TREs) check the output of AI methods for potentially identifiable information; to investigate the legal and ethical implications and controls; and to produce a set of guidelines and recommendations to support all TREs with export controls of AI algorithms.
Approach: TREs provide secure facilities to analyse confidential personal data, with staff checking outputs for disclosure risk before publication. Artificial intelligence (AI) has high potential to improve the linking and analysis of population data, and TREs are well suited to supporting AI modelling. However, TRE governance focuses on classical statistical data analysis. The size and complexity of AI models present significant challenges for the disclosure-checking process. Models may be susceptible to external attack, in which complicated methods reverse-engineer the learning process to reveal information about the training data, with more potential to lead to re-identification than conventional statistical methods.
Results: GRAIMatter is: (1) quantitatively assessing the risk of disclosure from different AI models, exploring different models, hyper-parameter settings and training algorithms over common data types; (2) evaluating a range of tools to determine their effectiveness for disclosure control; (3) assessing the legal and ethical implications of TREs supporting AI development and identifying aspects of existing legal and regulatory frameworks that require reform; (4) running 4 PPIE workshops to understand participants' priorities and beliefs around safeguarding and securing data; and (5) developing a set of recommendations, including (a) suggested open-source toolsets for TREs to use to measure and reduce disclosure risk, (b) descriptions of the technical and legal controls and policies TREs should implement across the 5 Safes to support AI algorithm disclosure control, and (c) training implications both for TRE staff and for how they validate researchers.
Conclusion: GRAIMatter is developing a set of usable recommendations for TREs to guard against the additional risks of disclosing trained AI models from TREs.
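One common way to put a number on the disclosure risk described above is a membership inference test: an attacker queries a released model and guesses whether a given record was in its training data. The sketch below is an illustrative toy, not GRAIMatter's tooling. The memorising lookup-table model, the records, and the 0.9 threshold are all assumptions chosen to make the effect easy to see.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]           # hypothetical (age, region) pair
Labelled = Tuple[Record, int]      # record plus a binary outcome label


class MemorisingModel:
    """Toy stand-in for an overfitted model: full confidence on records it has seen."""

    def __init__(self, training: List[Labelled]) -> None:
        self._seen: Dict[Record, int] = dict(training)

    def confidence(self, record: Record, label: int) -> float:
        """Confidence assigned to `label`; 0.5 (a coin flip) for unseen records."""
        if record in self._seen:
            return 1.0 if self._seen[record] == label else 0.0
        return 0.5


def guesses_member(model: MemorisingModel, item: Labelled, threshold: float) -> bool:
    """Threshold attack: high confidence on the true label suggests a training record."""
    record, label = item
    return model.confidence(record, label) >= threshold


def membership_advantage(model: MemorisingModel, members: List[Labelled],
                         non_members: List[Labelled], threshold: float = 0.9) -> float:
    """True-positive rate minus false-positive rate of the threshold attack."""
    tpr = sum(guesses_member(model, m, threshold) for m in members) / len(members)
    fpr = sum(guesses_member(model, n, threshold) for n in non_members) / len(non_members)
    return tpr - fpr


# Made-up data. One held-out person happens to share attributes with a training
# record, which produces a false positive for the attacker.
TRAINING = [((34, "A"), 1), ((51, "B"), 0), ((29, "A"), 1)]
HELD_OUT = [((34, "A"), 1), ((62, "B"), 0), ((45, "A"), 1)]

ADVANTAGE = membership_advantage(MemorisingModel(TRAINING), TRAINING, HELD_OUT)
print(f"membership advantage: {ADVANTAGE:.3f}")
```

An advantage near 0 means the attacker does no better than guessing, while a value near 1 means membership in the training data leaks almost completely. A disclosure check could compare this figure across models, hyper-parameter settings, and training algorithms.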
{"title":"GRAIMatter: Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMatter).","authors":"E. Jefferson, Christian Cole, Alba Crespi i Boixader, Simon Rogers, Maeve Malone, F. Ritchie, Jim Q. Smith, Francesco Tava, A. Daly, J. Beggs, Antony Chuter","doi":"10.23889/ijpds.v7i3.2005","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2005","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1859
Tanya Ravipati, N. Andrew, V. Srikanth, R. Beare
Objectives: Public health service organisations use multiple patient administration and electronic health record systems. We describe the implementation of a data warehouse automation tool within the National Centre for Healthy Ageing (NCHA) data platform to operationalise a research data warehouse and optimise data quality and data provision for health services research.
Approach: The traditional data warehouse life cycle involves repetitive manual tasks and depends on specialist developers. Automation tools overcome most of these inefficiencies. We conducted an internal risk-benefit analysis, which was validated against published literature on data warehouse optimisation and automation. Industry-based data warehouse automation tools were reviewed to align the NCHA requirements with each tool’s functionality. Tools were then shortlisted and evaluated over a six-week period on: (1) automation of standard tasks; (2) data pipeline alignment with the World Health Organization’s (WHO) Data Quality Review Framework; and (3) resource dependency risk mitigation through a Proof of Concept (PoC).
Results: The priority areas identified by the risk-benefit analysis included: end-to-end data warehouse automation; auto scripting; connectivity/linkage with multiple sources; reverse/forward engineering; audit trail conformance; scalability; support for multiple data warehouse architectures; automated documentation; data management, including data quality; and post-subscription independence. Twenty scientific publications were included in the final literature review (10% within healthcare) and supported the majority of the identified priority areas. The industry-based review identified 11 suitable data warehouse/Extract-Transform-Load (ETL) automation tools. Five tools demonstrated adequate performance for task automation, data quality management, reduced dependency on specialist developers and on-premises linkage compatibility. Two automation tools were each tested for six weeks through PoC development. One automation tool met 8 of the 10 automation requirements and was selected for implementation.
Conclusion: Data warehouse development processes are complex and time-consuming. Tools that automate repetitive tasks and scripting increase consistency while reducing dependency on specialist staff. Integrated data quality management minimises the time researchers spend pre-processing patient-level data sourced through a semi-automated data warehouse.
{"title":"Challenges in public healthcare research data warehouse integration and operationalisation.","authors":"Tanya Ravipati, N. Andrew, V. Srikanth, R. Beare","doi":"10.23889/ijpds.v7i3.1859","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1859","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1957
M. Chartier, G. Munro, D. Jiang, Scott C McCulloch, Wendy Au, M. Brownell, Rob Santos, F. Turner, Leanne Boyd, Nora Murdock, J. Bolton, J. Sareen
Objectives: PAX, a mental health promotion approach, has been shown to decrease negative mental health outcomes and improve academic achievement. These effects have yet to be shown among Indigenous children. We evaluated PAX for improving First Nations children’s outcomes, following a research process in which community members and researchers work more collaboratively.
Approach: Building on a long-term relationship with Swampy Cree Tribal Council, community members, First Nations leaders and researchers worked together through all phases of the project. This cluster randomized controlled trial used population-based health, social services, and education administrative data that allowed de-identified individual-level linkages across all databases through a scrambled health number. Our cohort of 725 children from 20 First Nations schools was randomized to PAX (n=469, 11 schools) or wait-list control (n=256, 9 schools). We used propensity score weighting and multi-level modeling to estimate the differences over time (2011 up to 2020) between children exposed to PAX and those who were not.
Results: Differences in baseline characteristics were found between the two groups of children, despite the cluster randomization. After applying propensity score weights, children in the PAX group had significantly greater decreases in conduct problems (β: -1.08, standard error (se): 0.2505, p<.0001), hyperactivity (β: -1.13, se: 0.3617, p=.0018), and peer problems (β: -1.10, se: 0.3043, p=.0003) and a greater increase in prosocial scores (β: 2.68, se: 0.4139, p<.0001) than control group children. The percentage of children who met academic expectations was higher in the PAX group than in the control group; however, only grade 3 numeracy (odds ratio (OR): 4.30, confidence interval (CI): 1.34–13.77) and grade 8 reading and writing (OR: 2.78, CI: 1.01–7.67) reached statistical significance. We found no evidence that PAX was associated with fewer emotional problems, fewer diagnosed mental disorders or better student engagement.
Conclusion: These findings suggest that PAX was effective in improving the mental health and academic outcomes of children in First Nations communities. Examining what works in Indigenous communities is crucial because approaches that are effective in some populations may not necessarily be culturally appropriate for remote Indigenous communities.
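The reported coefficients can be sanity-checked with a short calculation. The sketch below assumes the p-values come from a two-sided Wald test under a normal approximation (z = β / se). The abstract does not state the reference distribution used by the multi-level model, so small discrepancies would not be surprising. It also assumes the odds-ratio intervals were built on the log scale, in which case the geometric mean of the bounds should recover the point estimate. The function names are illustrative only.

```python
import math
from typing import Dict, Tuple


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for a Wald z statistic under a normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


def ci_geometric_mean(lower: float, upper: float) -> float:
    """For a ratio CI symmetric on the log scale, sqrt(lower * upper) is the point estimate."""
    return math.sqrt(lower * upper)


# (beta, standard error) pairs as reported in the abstract.
REPORTED: Dict[str, Tuple[float, float]] = {
    "conduct problems": (-1.08, 0.2505),
    "hyperactivity": (-1.13, 0.3617),
    "peer problems": (-1.10, 0.3043),
    "prosocial": (2.68, 0.4139),
}

P_VALUES = {name: two_sided_p(beta, se) for name, (beta, se) in REPORTED.items()}

for name, p in P_VALUES.items():
    print(f"{name}: z = {REPORTED[name][0] / REPORTED[name][1]:.2f}, p = {p:.4g}")

# Compare with the reported odds ratios of 4.30 and 2.78.
print(f"grade 3 numeracy: {ci_geometric_mean(1.34, 13.77):.2f}")
print(f"grade 8 reading and writing: {ci_geometric_mean(1.01, 7.67):.2f}")
```

Under these assumptions the recomputed values agree with the abstract: the hyperactivity and peer-problem p-values round to .0018 and .0003, the other two fall below .0001, and both interval midpoints on the log scale match the reported odds ratios to within rounding.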
{"title":"Is PAX-Good Behaviour Game (PAX) Associated with Better Mental Health and Educational Outcomes for First Nations Children?","authors":"M. Chartier, G. Munro, D. Jiang, Scott C McCulloch, Wendy Au, M. Brownell, Rob Santos, F. Turner, Leanne Boyd, Nora Murdock, J. Bolton, J. Sareen","doi":"10.23889/ijpds.v7i3.1957","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1957","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}