Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1991
Elisa Jones, L. Frith, A. Chiumento, S. Rodgers, Alan Clarke, S. Markham
Objectives: Public involvement and engagement (PIE) is playing an increasingly important role in big data initiatives and projects. It is therefore important to gain a deeper understanding of the different approaches used.
Approach: This study explores PIE using ethnographically-informed qualitative case studies. The case studies include three citizen juries, each carried out over eight days, in which jurors were asked to consider different real-world health data initiatives; and a public panel set up by a regional databank that carries out data linking. Data collection is ongoing: I continue to carry out close observations of activities and to conduct semi-structured 1:1 interviews with those who organise and have taken part in the activities.
Results: Data collection so far comprises completed observations at the citizen juries (~96 hours), ongoing observations of the public panel meetings (~15 hours), and thirty semi-structured 1:1 interviews with public contributors and other stakeholders about their experiences of the activities they were involved in. Early data analysis indicates the following key themes: jurors feeling heard, but unsure whether anybody was listening; stakeholders being impressed by informed jurors, but raising concerns over contributors becoming too ‘expert’; how who is at the table, and what information is presented, shape what is discussed; differences between online and in-person participation; and public involvement not being a substitute for informing the public about how their data is used.
Conclusion: ‘Who’ is involved, and ‘how’ PIE activities are designed and run, can facilitate or constrain discussion, enhancing or limiting public contributions. If public involvement is to achieve its aims, including increasing trustworthiness, those who seek the public’s views in their data projects should consider these factors more deeply.
Title: Public involvement in big data projects: an ethnographically-informed study.
Journal: International Journal of Population Data Science
URL: https://doi.org/10.23889/ijpds.v7i3.1991
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1852
M. Chartier, W. Phillips-Beck, M. Brownell, L. Star, Nora Murdock, Wendy Au, J. Bowes, Brooke Cochrane, R. Campbell
Objectives: Given the impact of colonization, and in response to Canada’s Truth and Reconciliation Commission, we aimed to provide baseline measures of First Nations children’s health and social outcomes in Manitoba, Canada. We also aimed to create a research process in which Indigenous and non-Indigenous researchers work collaboratively and in culturally safe ways.
Approach: We formed a team consisting of members of First Nation organizations and academic researchers. Knowledge Keepers from Anishinaabe, Cree, Anishininew, Dakota and Dene Nations guided the study, interpreted results and ensured meaningful knowledge translation. This retrospective cohort study used population-based health, social services, education and justice administrative data that allowed de-identified individual-level linkages across all databases through a scrambled health number. Adjusted rates and rate ratios were calculated using a generalized linear modelling approach to compare First Nations children (n=61,726) with all other Manitoba children (n=279,087), and First Nations children living on-reserve with those living off-reserve.
Results: Large disparities between First Nations and other Manitoba children were found in birth outcomes, physical and mental health, health services, education, social services, justice system involvement and mortality. First Nations infants had higher rates of preterm births, large-for-gestational-age births and newborn readmissions to hospital, and lower rates of breastfeeding initiation, than other Manitoba infants. Suicide rates among First Nations adolescents were ten times higher than among other adolescents in Manitoba, yet we found few differences in diagnosis of mood and anxiety disorders between the groups. First Nations children were also seven times more likely to be apprehended by child protection services, and youth were ten times more likely to be criminally accused. Knowledge Keepers offered their perspectives on these findings.
Conclusion: These findings demonstrate that an enormous amount of work is required in virtually every area (health, social, education and justice) to improve First Nations children’s lives. There is an urgent need for equitable access to services, and these services should be self-determined, planned and implemented by First Nations people.
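The rate ratios reported above come from adjusted generalized linear models, which are not reproduced here. As a much simpler illustration of what a rate ratio expresses, the sketch below computes a crude (unadjusted) rate ratio with a Wald-type 95% confidence interval on the log scale. The group sizes are taken from the abstract; the event counts and the function name `crude_rate_ratio` are hypothetical placeholders, not study results.

```python
import math


def crude_rate_ratio(events_a, pop_a, events_b, pop_b, z=1.96):
    """Crude rate ratio of group A versus group B.

    The confidence interval uses the usual large-sample approximation
    for the log rate ratio of two Poisson counts:
    SE = sqrt(1/events_a + 1/events_b).
    """
    rate_a = events_a / pop_a
    rate_b = events_b / pop_b
    ratio = rate_a / rate_b
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Group sizes from the abstract; event counts are HYPOTHETICAL.
ratio, lower, upper = crude_rate_ratio(
    events_a=300, pop_a=61_726,
    events_b=600, pop_b=279_087,
)
print(f"crude rate ratio = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An adjusted analysis would additionally control for covariates such as age and sex, so its estimates can differ noticeably from a crude ratio like this one.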
Title: Our Children, Our Future: The Health and Well-being of First Nations Children in Manitoba, Canada.
Journal: International Journal of Population Data Science
URL: https://doi.org/10.23889/ijpds.v7i3.1852
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.2037
Jason W. Flindall, Saiganesh Dhannewar, Mikhail Skrigitil, Siddharth Chadda, Samantha Magnus, Heather Richards, L. Corscadden
Objective: Although overall health service use declined following the start of the pandemic, this analysis aims to generate insights to inform public health priorities by identifying higher-than-expected patterns of health care service use for some health conditions and population groups.
Approach: Hospital, emergency department, and primary care encounters between 2011 and 2021 were categorized into condition groups according to the CIHI Population Grouping Methodology (British Columbia version). Actual encounters for each health condition were compared with ARIMA-based encounter forecasts to identify conditions with different-from-expected encounter rates in 2020 and 2021. Among the 225 CIHI-defined health conditions, we identified those for which service use was higher than expected. Area-based socioeconomic status and virtual care visit data are examined to further explore conditions that continue to differ from their pre-pandemic encounter patterns.
Results: This analysis demonstrates that some health condition groups have seen dramatic increases in service use. The three most affected groups with higher-than-expected encounters are hypercholesterolaemia/high cholesterol [+47.8% in average monthly encounters since 2019], emotional and behavioural disorder (with onset generally in childhood) [+37.3%] and neurotic/anxiety/obsessive compulsive disorder [+28.0%]. Since the start of the pandemic in British Columbia, the health condition groups with both the highest volumes of services and higher-than-expected service use included hypercholesterolaemia and hypothyroidism, mental health conditions (eating disorder, depression, and others), hypertension and heart failure, and diabetes. Additional descriptive analysis explores potential inequities in encounters by socioeconomic status and how virtual care has changed service patterns.
Conclusion: Increased service use may reflect greater need, better access to virtual care or potential changes in diagnoses. Identifying patterns of higher-than-expected use can support program planning to address growing need in certain regions or populations. Additional exploration will be undertaken to examine lower-than-expected service use as potential unmet need.
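The comparison of actual encounters against forecasts can be sketched as follows. This is only an illustration of the flagging step: it assumes the forecasts and their upper interval bounds have already been produced by some time-series model (the ARIMA fitting itself is not shown), and every number, the `MonthlyPoint` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MonthlyPoint:
    month: str       # e.g. "2020-10"
    actual: float    # observed encounters in that month
    forecast: float  # point forecast from a model fitted on pre-pandemic data
    upper: float     # upper bound of the forecast interval


def months_above_expected(points):
    """Return the months whose observed count exceeds the forecast interval."""
    return [p.month for p in points if p.actual > p.upper]


def percent_change(baseline, current):
    """Percent change in average monthly encounters between two periods."""
    base = mean(baseline)
    return 100.0 * (mean(current) - base) / base


# HYPOTHETICAL series for one condition group.
series = [
    MonthlyPoint("2020-04", actual=820, forecast=1000, upper=1100),
    MonthlyPoint("2020-10", actual=1250, forecast=1010, upper=1115),
    MonthlyPoint("2021-03", actual=1490, forecast=1020, upper=1130),
]
flagged = months_above_expected(series)
change = percent_change(baseline=[1000, 1000], current=[1450, 1506])
print(flagged, round(change, 1))
```

Whether a condition counts as "higher than expected" depends on the interval width chosen for the forecast, which the abstract does not specify.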
Title: Pandemic effects on health condition specific healthcare encounters in British Columbia, Canada.
Journal: International Journal of Population Data Science
URL: https://doi.org/10.23889/ijpds.v7i3.2037
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1801
R. Urquhart, C. Kendell, Julia Kaal, J. Vickery, L. Lethbridge
Objectives: To link population-based survey data to routinely collected administrative health data, enabling investigation of how cancer survivors' ongoing physical, emotional, and practical needs and experiences after completing cancer treatment impact their healthcare utilization, including discharge from oncology to primary care.
Approach: The "Cancer Transitions Survey" is a population-based survey examining survivors' experiences and needs after completing cancer treatment. The survey was administered by the Nova Scotia Cancer Registry (NSCR) as part of a national study, the largest of its kind in Canada. Respondents included Nova Scotian survivors of breast, melanoma, colorectal, prostate, hematologic, and young adult cancers who were 1-3 years after treatment. Survey responses were linked to cancer registry, physicians' claims, hospitalization, and ambulatory care data. The data linkage provided a full four years of healthcare utilization data for each cancer survivor, beginning one year after their cancer diagnosis.
Results: A total of 1,557 survivors responded to the survey and subsequently had their data linked. Collectively, breast, colorectal, and prostate cancer survivors represented 78.5% of survey respondents. Most respondents (65.3%) were 65 years of age or older, and 69.8% had an existing co-morbid condition. Regression analyses are now being conducted to investigate whether the type and magnitude of post-treatment care needs, and the interventions (services and supports) received, impact health care utilization in the survivorship period, including discharge to primary care.
Conclusion: This study represents a unique opportunity to link information that is unavailable in administrative health data, namely self-reported needs and use of non-physician services and supports (e.g., support groups, counselling). As such, this dataset permits investigation of healthcare utilization and patterns of care that cannot be accomplished using administrative health data alone.
Title: Understanding how cancer survivors’ needs and experiences after treatment impact their health care utilization: a survey-administrative health data linkage study.
Journal: International Journal of Population Data Science
URL: https://doi.org/10.23889/ijpds.v7i3.1801
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1851
V. Harish, Mathieu Ravaut, S. Yi, Jahir M. Gutierrez, H. Sadeghi, Kin Kwan Leung, T. Watson, K. Kornas, T. Poutanen, M. Volkovs, L. Rosella
There has been considerable growth in the development of machine learning models for clinical applications; however, less attention has been paid to applications at the health systems level. Here, we survey recent models developed using provincial administrative health data holdings in Ontario, Canada to synthesize key learnings across use cases.
We have developed four models in the areas of diabetes incidence and complications, hospitalization due to ambulatory care sensitive conditions, and hospitalization due to SARS-CoV-2 infection. Our team was highly multidisciplinary, with expertise across clinical medicine, administrative health data, epidemiology, and computer science. We used a “sliding window” approach to aggregate healthcare events across multiple health administrative data sets chronologically and map them dynamically onto a patient timeline. Tree-based algorithms, specifically gradient boosted decision trees, are well suited to the underlying tabular structure of administrative data and were used for each prediction task.
Our models achieved excellent discrimination, with areas under the receiver operating characteristic curve between 0.77 and 0.85 at prediction windows from 30 days to 3 years in advance. They were also well calibrated, both in-the-large and in population subgroups such as older adults, those living in rural areas, and the materially deprived. Measures of feature importance revealed that our models were leveraging predictors across administrative datasets (e.g. demographics, interactions with the healthcare system, medications) in intuitive and defensible ways. Finally, we demonstrated the utility of our models with “recall at top k” metrics: for example, the top 1% of patients predicted to be at risk of diabetes complications represented a cost of over $400 million to the healthcare system.
We identify three key learnings needed for the successful application of machine learning methods to health administrative data: synergy between the nature of the training data and the intended algorithm use, adherence to methodological best practices for rigour and transparency, and multidisciplinary teams with expertise across data provenance, methodological approach, and impact assessment.
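A “recall at top k” metric of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming binary outcomes and one risk score per patient; the function name `recall_at_top_fraction` and the toy scores are hypothetical, and the cost-weighted variant reported in the abstract is not shown.

```python
def recall_at_top_fraction(scores, outcomes, fraction):
    """Share of all positive outcomes found among the highest-scoring patients.

    scores   -- predicted risk per patient (higher means riskier)
    outcomes -- 1 if the event occurred, 0 otherwise
    fraction -- proportion of patients to keep, e.g. 0.01 for the top 1%
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, int(len(scores) * fraction))
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    captured = sum(outcome for _, outcome in ranked[:n_top])
    total = sum(outcomes)
    return captured / total if total else 0.0


# HYPOTHETICAL toy data: ten patients, three of whom had the event.
scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01]
outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
top_fifth = recall_at_top_fraction(scores, outcomes, 0.2)
print(f"recall in the top 20% of scores: {top_fifth:.3f}")
```

Replacing the 0/1 outcomes with per-patient costs would give a cost-capture version of the same ranking idea.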
Title: Developing Machine Learning Algorithms on Routinely Collected Administrative Health Data - Lessons from Ontario, Canada.
Journal: International Journal of Population Data Science
URL: https://doi.org/10.23889/ijpds.v7i3.1851
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1865
R. Trubey, I. Thomas, R. Cannings‐John, Peter Mackie
Objectives: Administrative data linkage is relatively under-utilised as a way of generating evidence to guide homelessness policy and service delivery in the UK. Our objective is to contribute insight into the ethical, legal, and practical challenges of using data linkage with data from people experiencing homelessness (PEH).
Approach: We outline the data collection and linkage methodologies for two UK-based studies related to PEH. The first design aimed to explore the acceptability and feasibility of consented linkage of trial data (‘Moving On’ trial) to NHS Digital records in a cohort of recruited PEH in two English local authorities (n=50). The second design used administrative data originating from a local authority homelessness service in Wales (n=17,000 cases) to explore educational outcomes of children in homeless households. The resultant data linkage rates are contrasted and discussed in relation to the mechanisms for obtaining and linking personal data.
Results: The Moving On trial demonstrated high rates of consent for data linkage and the ability to collect sufficient personal identifiable data to increase the chance of successful matching. Aggregate match rates will be discussed. Of the roughly 17,000 cases included in the local authority administrative data, 75% could be linked to unique individuals using probabilistic matching and were therefore ‘useable’ in linkage research. The proportion of useable cases decreased rapidly as the cut-off for matching quality was raised, falling to roughly 50% of cases when a 99% match probability cut-off was used. Matching rates were higher amongst priority need homeless cases, possibly reflecting a business need to identify and work closely with these people.
Conclusion: Where homelessness administrative data systems are not designed to enable data linkage, low matching rates can result, reducing study sample sizes and potentially leading to bias towards more extreme cases of homelessness if missed matches are not random. Consented linkage within large-scale trials offers one possibility for generating long-term evidence.
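The trade-off between match quality and useable sample size described above can be illustrated with a short sketch. It assumes each case already carries a match probability from some probabilistic linkage step (not shown); the probabilities and the function name `useable_share` are hypothetical and are not the study's data.

```python
def useable_share(match_probabilities, cutoff):
    """Proportion of cases whose match probability meets the cut-off."""
    if not match_probabilities:
        return 0.0
    kept = sum(1 for p in match_probabilities if p >= cutoff)
    return kept / len(match_probabilities)


# HYPOTHETICAL match probabilities for eight cases.
probabilities = [0.999, 0.995, 0.97, 0.92, 0.85, 0.60, 0.40, 0.10]

# Raising the cut-off keeps only the most certain links,
# so the useable share can only stay the same or shrink.
for cutoff in (0.80, 0.95, 0.99):
    share = useable_share(probabilities, cutoff)
    print(f"cut-off {cutoff:.2f}: {share:.1%} of cases useable")
```

If the cases dropped at stricter cut-offs differ systematically from those retained, the remaining sample may be biased, which is the concern raised in the conclusion.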
Title: Linkage of people experiencing homeless using two consent models.
Journal: International Journal of Population Data Science
URL: https://doi.org/10.23889/ijpds.v7i3.1865
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1872
R. Borschmann, C. Keen, J. Young, S. Kinner
Objectives: People released from incarceration are at increased risk of death from diverse causes. We aimed to calculate the incidence of all-cause and cause-specific death after release from incarceration and to identify individual-level risk factors for death.
Approach: We conducted a series of individual participant data meta-analyses using data from >1.3 million adults released from incarceration in eight countries between 1980 and 2018. We used random-effects meta-analysis to estimate the pooled all-cause and cause-specific crude mortality rates (CMRs), with 95% confidence intervals (CIs), for the entire follow-up period and for specific time periods after release from incarceration, overall and stratified by age, sex, and region.
Results: We included 1,395,318 people, 10,164,341 person-years of follow-up time, and 72,920 deaths in our analyses. The overall pooled CMR was 727 (95% CI: 623-840) per 100,000 person-years, with no difference between males and females. The risk of death was highest during the first week following release (all-cause CMR: 1,612, 95% CI: 1,048-2,287, I²=91.5%), and the three most common causes of death across the entire follow-up period were 1) alcohol and other drug poisoning (CMR: 144, 95% CI: 99-197); 2) cardiovascular disease (CMR: 102, 95% CI: 85-121); and 3) cancer and other neoplasms (CMR: 74, 95% CI: 85-121). Leading causes of death varied across time periods following release from incarceration.
Conclusion: Our findings indicate the need for routine monitoring of mortality following release from incarceration. The distribution of causes of death varies over time, so clinical decision-making needs to be informed by proximity to release from incarceration. The elevated risk of death in the first 7 days following release highlights the urgent need for coordinated transitional care – including substance use and mental health treatment – and injury prevention initiatives.
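As a rough check on the scale of these figures, the aggregate crude rate can be computed directly from the totals reported in the abstract (72,920 deaths over 10,164,341 person-years). The sketch below is illustrative only and is not the authors' method: it divides total deaths by total person-time, whereas the reported pooled CMR of 727 comes from a random-effects meta-analysis, which weights cohort-level rates rather than summing raw counts, so the two figures need not agree. The function name `crude_rate_per_100k` is hypothetical.

```python
# Totals taken from the abstract above.
deaths = 72_920
person_years = 10_164_341


def crude_rate_per_100k(events: int, person_time: float) -> float:
    """Crude rate: events divided by person-time, scaled to 100,000 person-years."""
    return events / person_time * 100_000


# Aggregate rate from raw totals (not the random-effects pooled estimate).
aggregate_cmr = crude_rate_per_100k(deaths, person_years)
print(f"Aggregate crude mortality rate: {aggregate_cmr:.1f} per 100,000 person-years")
```

The aggregate figure is about 717 per 100,000 person-years, close to but not identical with the pooled estimate of 727.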
{"title":"Increased risk of death following release from incarceration: an individual participant data meta-analysis of 1,314,568 adults in eight countries.","authors":"R. Borschmann, C. Keen, J. Young, S. Kinner","doi":"10.23889/ijpds.v7i3.1872","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1872","url":null,"abstract":"ObjectivesPeople released from incarceration are at increased risk of death from diverse causes. We aimed to calculate the incidence of all-cause and cause-specific death after release from incarceration and identify individual-level risk factors for death. \u0000ApproachWe conducted a series of individual participant data meta-analyses using data from >1.3 million adults released from incarceration in eight countries from 1980-2018. We used random effects meta-analysis to estimate the pooled all-cause and cause-specific crude mortality rates (CMRs), with 95% confidence intervals (CI) for the entire follow-up period, and for specific time periods after release from incarceration, overall and stratified by age, sex, and region. \u0000ResultsWe included 1,395,318 people, 10,164,341 person-years of follow-up time, and 72,920 deaths in our analyses. The overall pooled CMR was 727 (95%CI: 623-840) per 100,000 person-years, with no difference between males and females. The risk of death was highest during the first week following release (all-cause CMR: 1,612, 95%CI: 1,048-2,287, I2=91.5%), and the three most common causes of death across the entire follow-up period were 1) alcohol and other drug poisoning (CMR=144, 95%CI: 99-197); 2) cardiovascular disease (CMR: 102, 95%CI: 85-121); and 3) cancer and other neoplasms (CMR=74, 95%CI: 85-121). Leading causes of death varied across time periods following release from incarceration. \u0000ConclusionOur findings indicate the need for routine monitoring of mortality following release from incarceration. 
The distribution of cause of death varies over time, such that clinical decision-making needs to be informed by the proximity to release from incarceration. The elevated risk of death in the first 7 days following release highlights the urgent need for coordinated transitional care – including substance use and mental health treatment – and injury prevention initiatives.","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42381383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1848
Maya Murmann, Douglas Manuel
Population-covering educational attainment registers have proven helpful for planning and research on educational efforts. Building and updating such a register requires regular linking of different databases. Without unique national identification numbers, record linkage must be based on quasi-identifiers such as names, date of birth and sex. High-quality record linkage requires the unique identification of persons. The available identifiers should therefore be sufficient for unique identification even when some identifiers are missing for some cases. Redundant identifiers can achieve this goal. However, the data protection principle of data minimization, as recommended in the European General Data Protection Regulation, aims to avoid additional data where possible for the given purpose. A ministry therefore commissioned a simulation study to inform legislators on the minimum set of identifiers needed for a national register. A microsimulation of a population of nearly 20 million people was implemented to generate data on the changes and errors that accumulate in identifiers over ten simulated years. The simulation covered, for example, international migration, regional mobility, marriages, school careers and mortality. Each event triggered changes to identifiers according to specified error probability models. The resulting data were linked using different record-linkage procedures. Linkage quality and linkage bias were assessed as a function of the available identifiers. We report on the design of the simulation study, the linkage results and recommendations for the minimum set of identifiers. The results may be helpful for the design of other population-covering registers.
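The trade-off between redundant identifiers and data minimization can be illustrated with a toy exact-match linkage. This is a minimal sketch under stated assumptions, not the study's simulation or linkage procedure: the three records, the names, the `birth_name` field and the helper functions `key` and `link` are all hypothetical, and the two "events" (a name change on marriage and a missing date of birth) are hard-coded rather than drawn from error probability models.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Record:
    pid: int              # true identity, used only to score links, never to match
    first_name: str
    last_name: str
    birth_name: str       # assumed extra (redundant) identifier
    dob: Optional[str]    # None models a missing date of birth
    sex: str


def key(rec, fields):
    """Return the linkage key, or None if any identifier in it is missing."""
    values = tuple(getattr(rec, f) for f in fields)
    return None if any(v is None for v in values) else values


def link(file_a, file_b, fields):
    """Exact-match linkage; accept a link only if exactly one file_b record shares the key."""
    index = {}
    for b in file_b:
        k = key(b, fields)
        if k is not None:
            index.setdefault(k, []).append(b)
    links = []
    for a in file_a:
        k = key(a, fields)
        candidates = index.get(k, []) if k is not None else []
        if len(candidates) == 1:
            links.append((a.pid, candidates[0].pid))
    return links


# Hypothetical register extract at the start of the simulated period.
year_0 = [
    Record(1, "Anna", "Meyer", "Meyer", "2001-03-14", "F"),
    Record(2, "Lena", "Schmidt", "Schmidt", "1999-07-02", "F"),
    Record(3, "Jonas", "Becker", "Becker", "2000-11-30", "M"),
]

# The same people ten years later, after two hard-coded events.
year_10 = [
    year_0[0],                               # unchanged
    replace(year_0[1], last_name="Wagner"),  # marriage changes the last name
    replace(year_0[2], dob=None),            # date of birth missing in the later file
]

MINIMAL = ("first_name", "last_name", "dob", "sex")
REDUNDANT = ("first_name", "birth_name", "dob", "sex")

for label, fields in (("minimal", MINIMAL), ("with birth name", REDUNDANT)):
    links = link(year_0, year_10, fields)
    correct = sum(a == b for a, b in links)
    print(f"{label}: {correct}/{len(year_0)} records linked correctly")
```

In this toy setting the minimal identifier set links one of three records, while adding the stable birth name also recovers the person who married. The missing date of birth defeats both sets, which is the kind of case a fuller identifier set or an error-tolerant linkage procedure would need to handle.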
{"title":"Microsimulation of an educational attainment register to study record linkage quality.","authors":"Maya Murmann, Douglas Manuel","doi":"10.23889/ijpds.v7i3.1848","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1848","url":null,"abstract":"Population covering educational attainment registers have been proven helpful for planning and research concerning educational efforts. Regular linking of different databases is needed to build and update such a register. Without unique national identification numbers, record linkage must be based on quasi-identifiers such as names, date of birth and sex. High-quality record linkage requires the unique identification of persons. Therefore, available identifiers should be sufficient for unique identification despite missing identifiers for some cases. Redundant identifiers can achieve this goal. However, the data protection principle of data minimization, as recommended in the European General Data Protection Regulation, aims to avoid additional data if possible for the given purpose. Therefore, a ministry commissioned a simulation study to inform legislators on the minimum set of identifiers needed for a national register. A microsimulation of the population consisting of nearly 20 million people was implemented to generate data on accumulating changes and errors in identifiers over ten simulated years. The simulation covered, for example, international migration, regional mobility, marriages, school careers and mortality. Each event triggered changes of identifiers according to specified error probability models. The resulting data were linked by different record-linkage procedures. Linkage quality and linkage bias dependent on the available identifiers were assessed. We report on the design of the simulation study, the linkage results and recommendations for the minimum set of identifiers. 
The results may be helpful for the design of other population covering registers.","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47137022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1942
M. Janus, Jeanne Sinclair, J. Hove, Scott Davies
Objectives: The objective of this study was to establish a partnership between a university and a jurisdictional education body (Education Quality and Accountability Office, EQAO) that would allow the creation of a linked dataset from kindergarten to later grades in order to examine educational trajectories in mathematics in Ontario.
Approach: Building on the mutual goal of improving the understanding of children’s learning trajectories, we developed a project with an investigator team that included university researchers and representatives of the provincial educational assessment body, to link a database of child development status in kindergarten (Early Development Instrument/EDI data, including a neighbourhood socioeconomic/SES index) with academic assessment EQAO data, and received research funding. A deterministic matching process was employed to match the datasets. We examined differences between the unmatched and fully matched cases and constructed a growth mixture model of math scores in grades 3, 6 and 9, with key EDI/SES variables as covariates.
Results: Despite lacking a common identifier, we successfully matched approximately 50% of the EDI cases from 2002-2014 (n=183,771). Effect sizes indicated negligible differences between matched and unmatched cases, except for SES and child development status, which were poorer for the unmatched group. A 3-class solution was the best fit for a 20,000-person subsample of math trajectories, based on AIC, BIC, ICL, and entropy values as well as sufficiently high proportions of posterior probabilities, which indicate confidence in class membership. Of the sample, 61% showed steady moderate-high achievement, 9% started high but declined, and 30% deteriorated then improved. Males, children in low SES, and those with adequate kindergarten EDI outcomes had better math achievement trajectories than females, children in high SES, and those with poor kindergarten outcomes.
Conclusion: Because the two datasets were collected without an explicit linkage plan, the match rate was only 50%. The linkage nevertheless produced a large database that allows the study of early developmental antecedents of students’ educational trajectories. The partnership between the university and EQAO ensures wide dissemination of results in both the academic and policy worlds.
{"title":"Building partnerships, capacity, and knowledge through a use of newly linked child development and education datasets in Ontario, Canada.","authors":"M. Janus, Jeanne Sinclair, J. Hove, Scott Davies","doi":"10.23889/ijpds.v7i3.1942","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1942","url":null,"abstract":"ObjectivesThe objective of this study was to establish a partnership between a university and a jurisdictional education body (Education Quality and Accountability Office, EQAO) which would allow creation of a linked dataset from kindergarten to later grades in order to examine educational trajectory in mathematics in Ontario. \u0000ApproachBuilding on mutual goals of improving the understanding of children’s learning trajectories, we developed a project with an investigator team that included university researchers and representatives of the provincial educational assessment body, to link a database of child development status in kindergarten (Early Development Instrument/EDI data, including neighbourhood socioeconomic/SES index) with academic assessment EQAO data, and received research funding. A deterministic matching process was employed to match the datasets. We examined differences between the unmatched and fully matched cases and constructed a growth mixture model of math scores in grades 3, 6 and 9, with key EDI/SES variables as covariates. \u0000ResultsDespite lacking a common identifier, we successfully matched approximately 50% of the EDI cases from 2002-2014 (n=183,771). Effect sizes indicated negligible differences between matched and unmatched, except for SES and child development status, which were poorer for unmatched group. A 3-class solution was the best fit for a 20,000-person subsample of math trajectories based on AIC, BIC, ICL, and entropy values as well as sufficiently high proportions of posterior probabilities, which indicate confidence in class membership. 
61% of sample showed steady moderate-high achievement; 9% started high, but declined, and 30% deteriorated then improved. Males, children in low SES, and those with adequate kindergarten EDI outcomes had better math achievement trajectories than females, children in high SES, and those with poor kindergarten outcomes. \u0000ConclusionGiven the two datasets were collected without explicit linkage plan, the matching was only 50%, nevertheless resulting in a large database that allows study of early development antecedents of students’ educational trajectories. The partnership between university and EQAO ensures a wide dissemination of results in both academia and policy worlds.","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43306843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.2000
Robin Flaig, Jacqui Oakley, Kirsteen Campbell, Katharine Evans, S. McLachlan, Richard Thomas, E. Turner, A. Boyd
Objectives: The UK Longitudinal Linkage Collaboration (UK LLC) is a new, unprecedented infrastructure enabling research into the COVID-19 pandemic. The UK LLC integrates data from >20 UK longitudinal studies with systematically linked health, administrative and environmental records to facilitate cross-disciplinary COVID-19 research for accredited UK-based researchers.
Approach: Bringing together all of the key components that form the UK LLC was a huge challenge that may only have been possible in the midst of the pandemic. First, we collaborated with the Longitudinal Population Studies (LPS) to create and agree how data linkage, data provision and applications to access the UK LLC would work. In parallel, public contributors helped to create fair processing materials. Finally, we worked closely with NHS Digital and other key national data providers to organise approvals for all studies to be linked, and for the UK LLC to have delegated decision-making for research applications.
Results: We faced a myriad of challenges in creating the UK LLC, including: 1) a short timeframe and short-term funding structure (initial funding for six months with an 18-month extension); 2) working across >20 different LPS and four nations with different structures for access, consent and data provision; 3) a lack of capacity at various points in the data pipeline, due to the volume of COVID-19 research required and underway across the involved organisations; 4) data processing complexities: the split data method means that no one can see the entire process, so catching linkage errors requires working across four different organisations; and 5) communicating with the public about such complex data flows, where it is challenging to balance being accurate about what we are doing with expressing the complexity in lay terms.
Conclusion: Creating the UK LLC required collaboration with LPS, data providers and researchers. An iterative approach to creating the data application and data provision pipelines was crucial in developing these processes. The UK LLC was built quickly, from initial funding in October 2020 to provisioning data to researchers in December 2021.
{"title":"Longitudinal study of diabetes prevalence and hospitalisations among care experienced and general population children in Scotland: evidence of an end of care “cliff edge”?","authors":"Robin Flaig, Jacqui Oakley, Kirsteen Campbell, Katharine Evans, S. McLachlan, Richard Thomas, E. Turner, A. Boyd","doi":"10.23889/ijpds.v7i3.2000","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2000","url":null,"abstract":"ObjectivesThe UK Longitudinal Linkage Collaboration (UK LLC) is a new, unprecedented infrastructure enabling research into the COVID-19 pandemic. The UK LLC integrates data from >20 UK longitudinal studies with systematically linked health, administrative and environmental records to facilitate cross-disciplinary COVID-19 research for accredited UK based researchers. \u0000ApproachBringing together all of the key components that form the UK LLC was a huge challenge that may have only been possible in the midst of the pandemic. First, we collaborated with the Longitudinal Population Studies (LPS) to create and agree how data linkage, data provision and applications to access the UK LLC would work. In parallel, public contributors helped to create fair processing materials. Finally, we worked closely with NHS Digital and other key national data providers to organise approvals for all studies to be linked, and for the UK LLC to have delegated decision-making for research applications. \u0000ResultsWe faced a myriad of challenges creating the UK LLC including: \u0000 \u0000Short timeframe and short-term funding structure – initial funding for six months with an 18-month extension. \u0000Working across >20 different LPS and four nations with different structures for access, consent and data provision. \u0000Lack of capacity at various points in the data pipeline due to the volume of COVID-19 research required and underway across the involved organisations. 
\u0000Data processing complexities – split data method means no one can see the entire process therefore catching linkage errors requires working across four different organisations. \u0000With such complex data flows it is challenging to find the balance with communications about data to the public – being accurate about what we are doing, but expressing the complexity in lay terms. \u0000 \u0000ConclusionCreating the UK LLC required collaboration with LPS, data providers and researchers. An iterative approach to creating the data application and data provision pipelines was crucial in developing these processes. The UK LLC was built quickly, from initial funding in October 2020 to provisioning data to researchers in December 2021.","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68930083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}