Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.2024
Lucy Carty, M. Cortina-Borja, Rachel Plachcinsk, C. Grollman, A. Macfarlane
Objectives: Potential ‘weekend effects’ in healthcare prompt concerns that care could be of lower quality during non-working hours, but they may instead reflect differences in case mix or other factors. This research aimed to compare neonatal mortality in English hospitals from 2005 to 2014 by time of day and day of the week.
Approach: We analysed data from a retrospective cohort of 6,054,536 singleton births in England from 2005 to 2014, created by linking ONS birth and death registration and birth notification data with Hospital Episode Statistics. Working hours were defined as 07:00 to 19:00 on weekdays; non-working hours were all other times on weekdays plus all weekends and public holidays. The primary outcome was all-cause neonatal mortality unattributed to congenital anomaly. We also modelled cause-specific neonatal mortality attributed to asphyxia, anoxia or trauma (AAT). On advice through our public involvement and strategy, analysis was stratified by mode of onset of labour and method of delivery.
Results: After adjustment for confounders, the odds of all-cause neonatal mortality outside working hours were similar to those during working hours for spontaneous births, instrumental births and emergency caesareans. Planned caesareans occurring in non-working hours had a high crude risk compared with planned caesareans in working hours, but were considered to be unreliably recorded and likely to reflect emergency caesarean delivery of babies originally scheduled for planned caesarean birth. Further stratification of emergency caesareans by onset of labour showed higher odds of cause-specific neonatal mortality (AAT) during non-working hours than during working hours for emergency caesareans without labour recorded, but not for emergency caesareans after spontaneous or induced onset of labour.
Conclusion: It may be that the apparent ‘weekend effect’ is caused by deaths among the relatively small number of babies who were born by caesarean section, apparently without labour, outside normal working hours. Obstetric staffing should be planned to allow for these relatively unusual emergencies.
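The working-hours definition in the Approach can be written down as a small classification rule. The sketch below is illustrative only and is not the authors' code: the function name, the half-open treatment of the 19:00 boundary, and the caller-supplied set of public holidays are all assumptions, since the abstract does not specify them.

```python
from datetime import date, datetime, time

# Working hours as stated in the abstract: 07:00 to 19:00 on weekdays.
WORK_START = time(7, 0)
WORK_END = time(19, 0)


def is_working_hours(birth: datetime, public_holidays: frozenset = frozenset()) -> bool:
    """Return True if a birth timestamp falls in working hours.

    Assumptions (not from the source): the interval is half-open,
    [07:00, 19:00), and public holidays are supplied by the caller
    as a set of datetime.date values.
    """
    if birth.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return False
    if birth.date() in public_holidays:
        return False
    return WORK_START <= birth.time() < WORK_END


# Hypothetical holiday list, for illustration only.
holidays = frozenset({date(2014, 12, 25)})
print(is_working_hours(datetime(2014, 3, 5, 10, 30), holidays))
print(is_working_hours(datetime(2014, 12, 25, 10, 30), holidays))
```

Everything that is not classified as working hours (weekday nights, weekends, public holidays) falls into the non-working category, which mirrors the two-way split described in the abstract.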
{"title":"Neonatal mortality in NHS maternity units by timing of birth and method of delivery: a retrospective linked cohort study.","authors":"Lucy Carty, M. Cortina-Borja, Rachel Plachcinsk, C. Grollman, A. Macfarlane","doi":"10.23889/ijpds.v7i3.2024","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2024","journal":{"name":"International Journal of Population Data Science","volume":"7 1"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"68930235"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1924
J. Enns, A. Katz, M. Yogendran, Marcelo L. Urquia, S. Muthukumarana, Surani Matharaarachchi, A. Singer, Nathan C. Nickel, L. Star, Teresa Cavett, Y. Keynan, L. Lix, D. Sanchez-Ramirez
Objective: Post-acute COVID-19 (or ‘long COVID’) manifests as a wide range of long-lasting symptoms affecting multiple organ systems. We are developing criteria for identifying long COVID cases using administrative, clinical, survey and other data from Manitoba, Canada, with the ultimate goal of examining long COVID prevalence, risk factors, prognosis and recovery.
Approach: Given the lack of an accepted clinical definition and the resulting lack of diagnostic codes, we are adopting several different creative and complementary strategies to identify long COVID cases. We are examining administrative and clinical data sources (laboratory data, physician claims, drug prescriptions, and electronic medical records) for information on positive COVID tests, common symptoms and complaints, and treatment provided. To identify people with long COVID who may not have sought healthcare, we are collecting survey data from a convenience community sample (members of a medical health fitness facility) and mining data on long COVID symptoms from Twitter.
Results: The combination of approaches we have adopted and the expanding scientific literature on long COVID are contributing to a more comprehensive understanding of the impacts of long COVID in Manitoba. Through preliminary work on the laboratory data (positive COVID tests from March 2020 to June 2021), we have developed and characterized a COVID-positive cohort (n=47,515). Work is now underway to develop an algorithm for long COVID using symptoms from free text in electronic medical records, ICD-9 codes, and changes in health-seeking behaviour (compared to the pre-positive COVID test period and a matched sample). This population data-driven approach will then allow us to examine how multiple underlying health conditions, COVID illness severity, COVID vaccination status, and various socio-demographic factors are related to risk of long COVID.
Conclusion: This research is generating actionable information by identifying risk factors to support clinical diagnosis of long COVID, making it easier for clinicians to recognize this new illness and develop plans to manage it. It will also inform healthcare system planning by quantifying the burden of long COVID at the population level.
{"title":"A population data-driven approach to identifying ‘Long COVID’ cases in support of diagnosis and treatment.","authors":"J. Enns, A. Katz, M. Yogendran, Marcelo L. Urquia, S. Muthukumarana, Surani Matharaarachchi, A. Singer, Nathan C. Nickel, L. Star, Teresa Cavett, Y. Keynan, L. Lix, D. Sanchez-Ramirez","doi":"10.23889/ijpds.v7i3.1924","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1924","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"48630806"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.2096
M. Ho, Stefana Jovanovska, Jennafer Novess, Dina Skvirsky, R. Saskin, J. C. Victor
Objective: In March 2014, ICES launched Data & Analytic Services (DAS), expanding access to ICES data and analytics beyond ICES scientists and analytic staff. In eight years, DAS has grown and evolved to increase the high-quality services offered to an expanding client base of external researchers.
Approach: At the inception of DAS, two services were offered to public sector researchers: data access and analytics. Data access enabled researchers to analyze coded record-level data through a secure virtual environment. Analytics, conducted by DAS staff in the ICES analytic environment, provided researchers with risk-cleared summary-level reports. In response to growing demand from an increasingly diverse range of researchers, ICES engaged in extensive consultations with internal and external stakeholders to re-evaluate and operationalize new services. Compliance with contractual obligations and Ontario law, organizational capacity to scale up, and alignment with ICES’ mission, vision and values were cornerstones in establishing new offerings.
Results: Analytic services became available to private sector researchers in June 2016. In March 2017, support for cohort and longitudinal follow-up studies became the newest service offering (researchers are provided with a list of applicable individuals defined for the purposes of conducting publicly funded research). As more data assets become available to researchers, requests continue to increase in volume and complexity, particularly from projects seeking to import external data for linkage to ICES data. A second high-performance computing virtual environment began onboarding researchers in September 2021, while the original analytic environment has undergone multiple upgrades and will soon be fully refreshed. Regular solicitation of feedback has enabled DAS to increase staffing and diversify the resources available, which improves the client experience at all stages.
Conclusions: Since its inception, DAS has expanded from five to thirty personnel, grown and diversified its new and returning client base, and responded to demand for new services. DAS continues to provide high-quality services that enable impactful research and is responsive to new opportunities for collaboration and service provision.
{"title":"ICES Data and Analytic Services: Eight Years Young.","authors":"M. Ho, Stefana Jovanovska, Jennafer Novess, Dina Skvirsky, R. Saskin, J. C. Victor","doi":"10.23889/ijpds.v7i3.2096","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2096","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"48761540"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1928
Hannah Dickson, G. Vamvakas, N. Blackwood
Objectives: The total annual cost of crime in England and Wales is estimated at £50bn. The age-crime curve indicates that criminal behaviour peaks in adolescence and decreases in adulthood. Life-course persistent offenders begin to behave antisocially early in childhood and continue this behaviour into adulthood. By contrast, adolescent-limited offenders exhibit most of their antisocial behaviour during adolescence, with a minority continuing to offend into adulthood. However, evidence suggests that this curve conceals distinct developmental trajectories. Prospective cohort study data has highlighted distinct risk factors for these offending trajectories, but this research is limited because of small sample sizes for disadvantaged groups, selection bias and infrequency of data collection.
Approach: The current study began in February 2022 and is one of the first to use UK linked national crime and education records. The aims are to: (1) establish the offending trajectories of individuals between the ages of 10 and 32 years following their first recorded conviction or caution using national crime records; and (2) develop prediction models of these offending trajectories using administrative education and social care data.
Results: In my talk, I will share findings on the offending trajectories identified and present some early results on the key education and social care drivers of the offending trajectories.
Conclusions: Findings from the project have the potential to identify previously unknown, or confirm lesser known, offending trajectories using real-world data based on the UK population. The project may also lead to the detection of previously unknown risk or protective factors for offending, which has implications for early intervention and could help inform criminal justice system responses to early antisocial behaviour.
{"title":"Education and social care predictors of offending trajectories: A UK administrative data linkage study.","authors":"Hannah Dickson, G. Vamvakas, N. Blackwood","doi":"10.23889/ijpds.v7i3.1928","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1928","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"44324426"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.1793
Lara M. Greaves, Cinnamon-Jo Lindsay, Eileen Li, Emerald Muriwai, A. Sporle
Linked data presents different social and ethical issues for different contexts and communities. The Statistics New Zealand Integrated Data Infrastructure (IDI) is a collection of de-identified whole-population administrative datasets that researchers are increasingly using to answer pressing social and policy research questions. Our work seeks to provide an overview of the IDI, associated issues for Māori (the Indigenous peoples of New Zealand), and steps to realise Māori data aspirations. In this paper, we first introduce the IDI including what it is and how it developed. We then move to an overview of Māori Data Sovereignty. Our paper then turns to examples of organisations, agreements, and frameworks which seek to make the IDI and data better for Māori communities. We then discuss the main issues with the IDI for Māori including technical issues, deficit-framed work, involvement from communities, consent, social license, further data linkage, and barriers to access for Māori. We finish with a set of recommendations around how to improve the IDI for Māori, making sure that Māori can get the most out of administrative data for our communities. These include the need to build data researcher capacity and capability for Māori, Māori data co-governance and accountability, reducing practical and skill barriers for access by Māori and Māori organisations, providing robust, consistent and transparent practice exemplars for best practice, and potentially even abolishing the IDI and starting again. These issues are being worked through via Indigenous engagement and co-governance processes that could provide useful exemplars for Indigenous and community engagement with linked data resources.
{"title":"Māori and Linked Administrative Data: A Critical Review of the Literature and Suggestions to Realise Māori Data Aspirations.","authors":"Lara M. Greaves, Cinnamon-Jo Lindsay, Eileen Li, Emerald Muriwai, A. Sporle","doi":"10.23889/ijpds.v7i3.1793","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1793","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"43088793"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.2038
Sarah Collyer, Josie Plachta
Objectives: The Demographic Index (DI) comprises five linked administrative datasets used for population estimation. Current linkage methods are not well suited to utilising the power of this asset. Using the 2021 England and Wales Census, we are developing an innovative composite linkage method to fully utilise the power of the DI.
Approach: Using non-greedy deterministic and probabilistic linkage methods, we will link the DI to the Census at a composite level where we believe links exist, i.e., linking a Census cluster (consisting of linked Census and Census Coverage Survey (CCS) records) with a DI cluster (consisting of linked records from the data sources used to make the DI). We will then conduct a pairwise linkage of records from these linked clusters to link individual source records to the Census. We will utilise clerical review to resolve uncertain and conflicting links and to inform the quality of our linkage.
Results: We anticipate producing a high-quality linkage that will inform how the coverage of the DI compares to the Census (through the composite-level linkage) and the quality of the DI itself (through the pairwise-level linkage). We have developed a clerical matching system that can display composite-level linkage, i.e., candidate cluster-pairs. We will tailor our clerical review and quality assessment to records that fall within carefully chosen postcode areas, to ensure all hard-to-count groups and geographical areas are sampled. Working with large datasets is a challenge we are overcoming by using distributed computing and search space reduction.
The 2021 Census has previously been linked to the CCS with high accuracy; these records are considered intrinsically linked.
Conclusion: To assess the quality of national population estimates and the policy decisions based upon them, we are linking a key composite population-level dataset to the 2021 England and Wales Census. The presentation will showcase the methods we are developing and how we are ensuring the highest quality possible.
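The abstract does not spell out how the non-greedy deterministic step works, so the following is only one possible reading, sketched at record level rather than cluster level. It accepts a link on an exact match key only when the key is unique on both sides, and routes every conflicting candidate pair to clerical review instead of taking the first match. The function name, the field names and the sample records are hypothetical and invented for illustration.

```python
from collections import defaultdict


def link_on_key(census, di, key_fields):
    """Deterministic linkage on an exact match key.

    census, di: dicts mapping a record id to a dict of fields.
    Returns (accepted, for_review): one-to-one links, and candidate
    pairs whose key is shared by several records on either side.
    This is an assumed interpretation of "non-greedy", not the
    method described by the authors.
    """
    def index(records):
        by_key = defaultdict(list)
        for rec_id, rec in records.items():
            by_key[tuple(rec[f] for f in key_fields)].append(rec_id)
        return by_key

    census_by_key, di_by_key = index(census), index(di)
    accepted, for_review = [], []
    for key, census_ids in census_by_key.items():
        di_ids = di_by_key.get(key, [])
        if not di_ids:
            continue  # no candidate on the DI side for this key
        if len(census_ids) == 1 and len(di_ids) == 1:
            accepted.append((census_ids[0], di_ids[0]))
        else:
            # Conflict: keep all candidate pairs for clerical review.
            for_review.extend((c, d) for c in census_ids for d in di_ids)
    return accepted, for_review


# Made-up records, for illustration only.
census = {
    "c1": {"forename": "ann", "surname": "lee", "dob": "1990-01-01"},
    "c2": {"forename": "bob", "surname": "ray", "dob": "1985-05-05"},
    "c3": {"forename": "bob", "surname": "ray", "dob": "1985-05-05"},
}
di = {
    "d1": {"forename": "ann", "surname": "lee", "dob": "1990-01-01"},
    "d2": {"forename": "bob", "surname": "ray", "dob": "1985-05-05"},
}
accepted, for_review = link_on_key(census, di, ["forename", "surname", "dob"])
print(accepted)
print(for_review)
```

A probabilistic pass, which the abstract also mentions, would score the remaining candidate pairs rather than require exact agreement; that step is not shown here.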
{"title":"Using linkage to assess coverage of population estimates.","authors":"Sarah Collyer, Josie Plachta","doi":"10.23889/ijpds.v7i3.2038","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2038","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"43453716"}
Pub Date: 2022-08-25 DOI: 10.23889/ijpds.v7i3.2057
J. Cooper, D. O’Reilly, Richard Kirk, Trish Kelly, Rachel Gibbs, M. Donnelly
A project designed to examine, for the first time, the health records of adult prisoners in Northern Ireland and their linkage to other available health data: the test case of prisoner post-release mortality risk
Objectives: The linkage of routinely collected administrative data for research purposes has the potential to improve knowledge and public benefit. We describe a novel data linkage study between Northern Ireland (NI) Healthcare in Prisons and the Business Services Organisation (BSO). This work is undertaken within the Administrative Data Research Centre-NI (ADRC-NI).
Approach: This joint project between ADRC-NI Queen’s University Belfast and NI Healthcare in Prisons (South Eastern Health and Social Care Trust) will test the linkage of prisoner health records to health data held in the BSO, and the potential to generate a population-based cohort for a retrospective analysis of prisoner health (2012-2021). The analysis will attempt to (i) characterise prisoners according to socio-demographic, health and committal factors, (ii) compare post-release mortality rates with a reference group from the NI population using indirect standardisation, and (iii) estimate post-release mortality risk using Cox proportional hazards models.
Results: Using novel data linkages, a dataset will be created to examine the health of prisoners (and former prisoners) in NI. Ethics and governance approvals are in place for this data linkage. The linkage will be undertaken via the Honest Broker Service (HBS) in NI and the dataset will be accessed in the safe setting at the BSO. The processes involved, experiences including significant delays or difficulties, and recommendations for future data-linkage studies will be discussed. A key deliverable of this project will be an assessment of the access and linkage capabilities of the prisoner health data, with metadata created and made available to future researchers. We also plan to present preliminary results relating to the test research question.
Conclusion: We will describe the processes involved and our first-hand research experience in developing a novel data-linkage project. We will also detail the access and linkage capabilities of this new dataset for examining the health of prisoners (and former prisoners) in NI.
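The comparison of post-release mortality with an NI reference group by indirect standardisation can be illustrated with a minimal Python sketch. The strata, person-years, reference rates and death count are made-up placeholders rather than project data, the function name `indirect_smr` is hypothetical, and the confidence interval uses a common log-scale approximation that is assumed here, not taken from the abstract.

```python
from math import exp, sqrt


def indirect_smr(person_years, reference_rates, observed_deaths):
    """Return (expected deaths, SMR, approximate 95% CI) by indirect standardisation."""
    # Expected deaths: apply reference-population rates to the cohort's person-years.
    expected = sum(person_years[s] * reference_rates[s] for s in person_years)
    smr = observed_deaths / expected
    # Approximate interval on the log scale, taking se(log SMR) as 1/sqrt(observed).
    se_log = 1 / sqrt(observed_deaths)
    ci = (smr * exp(-1.96 * se_log), smr * exp(1.96 * se_log))
    return expected, smr, ci


# Hypothetical age strata: cohort person-years and reference deaths per person-year.
# None of these numbers come from the study.
person_years = {"18-29": 1000, "30-49": 2000, "50+": 500}
reference_rates = {"18-29": 0.001, "30-49": 0.002, "50+": 0.010}

expected, smr, ci = indirect_smr(person_years, reference_rates, observed_deaths=25)
print(f"expected={expected:.1f} SMR={smr:.2f} CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

A standardised mortality ratio above 1 would indicate more deaths in the cohort than the reference rates predict. The Cox proportional hazards step needs individual-level follow-up data and is not sketched here.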
{"title":"A project designed to examine, for the first time, the health records of adult prisoners in Northern Ireland and their linkage to other available health data: the test case of prisoner post-release mortality risk.","authors":"J. Cooper, D. O’Reilly, Richard Kirk, Trish Kelly, Rachel Gibbs, M. Donnelly","doi":"10.23889/ijpds.v7i3.2057","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2057","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"49534857"}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1956
Yang Lu
Introduction: Sharing aggregated electronic health records (EHRs) for integrated health care and public health studies is increasingly in demand. Patient privacy requires that anonymisation procedures are in place for data sharing.
Objective: Traditional methods such as k-anonymity and its derivatives often overgeneralise, resulting in lower data accuracy. To tackle this issue, we proposed the Semantic Linkage K-Anonymity (SLKA) approach, which balances privacy and utility preservation by detecting risky combinations hidden in record linkage releases.
Approach: Applying k-anonymity to the quasi-identifiers of data may lead to ‘over generalisation’ when dealing with linkage data sets. Because most linkage cases do not include all local patients, not all data need to be modified for privacy-preserving purposes. We therefore proposed linkage k-anonymity (LKA), under which only the obfuscated individuals in a released linkage set are required to be indistinguishable from at least k-1 other individuals in the local dataset. To address inference disclosure, we further designed the semantic-based linkage k-anonymity (SLKA) method, which extends LKA with a semantic rule base for automatically detecting (and ruling out) risky associations from previous linked data releases. Specifically, associations identified from the “previous releases” of the linkage dataset can become the input to semantic reasoning for the “next release”.
Results: The approach was evaluated on a linkage scenario in which researchers apply to link data from an Australia-wide national type-1 diabetes platform with survey results from 25,000+ Victorians about their health and wellbeing. Comparing the information loss of the three methods, we find that SLKA can incur extra cost for dealing with risky individuals, e.g., 13.7% (SLKA) vs 5.9% (LKA) at k=4; however, it performs much better than k-anonymity, which can cause 24% information loss (k=4). In addition, the value of k affects the level of distortion in SLKA, e.g., 11.5% (k=2) vs 12.9% (k=3).
Conclusion: The SLKA framework provides dynamic protection for repeated linkage releases while preserving data utility by avoiding the unnecessary generalisation typified by k-anonymity.
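The difference between plain k-anonymity and the linkage variant can be illustrated with a small Python sketch. It is a simplified reading of the LKA condition described in the abstract, not the author's implementation: the function name `lka_violations`, the quasi-identifier names and the toy records are all hypothetical, and the semantic rule base of SLKA is not modelled.

```python
from collections import Counter


def lka_violations(local_records, released_ids, quasi_identifiers, k):
    """Return released IDs that share their quasi-identifier values with fewer
    than k individuals (themselves included) in the local dataset."""

    def qi_key(record):
        return tuple(record[q] for q in quasi_identifiers)

    # Equivalence-class sizes are counted over the whole local dataset.
    class_sizes = Counter(qi_key(r) for r in local_records.values())
    # Only individuals in the released linkage set are checked.
    return [rid for rid in released_ids
            if class_sizes[qi_key(local_records[rid])] < k]


# Toy local dataset with two hypothetical quasi-identifiers.
local_records = {
    1: {"age_band": "30-39", "postcode": "3000"},
    2: {"age_band": "30-39", "postcode": "3000"},
    3: {"age_band": "30-39", "postcode": "3000"},
    4: {"age_band": "40-49", "postcode": "3000"},
    5: {"age_band": "40-49", "postcode": "3001"},
}
quasi_identifiers = ["age_band", "postcode"]

# Linkage release containing individuals 1 and 4 only.
linked_only = lka_violations(local_records, [1, 4], quasi_identifiers, k=2)
# Checking every local individual corresponds to ordinary k-anonymity.
whole_dataset = lka_violations(local_records, list(local_records), quasi_identifiers, k=2)
print(linked_only, whole_dataset)
```

In this toy case only individual 4 would need generalising under the linkage condition, whereas checking the whole dataset also flags individual 5, who is not part of the release. This is the kind of unnecessary modification the abstract attributes to plain k-anonymity.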
{"title":"Semantic-based Privacy-preserving Record Linkage.","authors":"Yang Lu","doi":"10.23889/ijpds.v7i3.1956","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1956","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"45845651"}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1963
P. Nair, Michael Smith, M. Theochari
Objective: To develop a digital solution for automated data ingestion and rapid update of the large-scale Human Services Dataset (HSDS), which brings together data from across government to provide a powerful view of service usage and improve outcomes for communities.
Approach: The Centre for Health Record Linkage (CHeReL) hosts a secure, high-performing data linkage system, including a Master Linkage Key (MLK) of administrative health datasets, and generates linked data to inform policy decisions. Since 2018, CHeReL has also annually linked over 70 frontline datasets to create a large-scale longitudinal linked dataset of over 2.5 billion records. Over the course of 2021, CHeReL led a project to incrementally improve the currency of the HSDS in compressed timeframes. This provided an opportunity to assess the value and feasibility of more frequent updates to the dataset within the evaluation and investment context.
Results: Automated data ingestion and validation led to a significant reduction in data processing timeframes for the accelerated linkage. We observed an 80% reduction for data ingestion and a 75% reduction for data validation. The digital solution also allows asset owners to register and approve new data providers, monitor their data provision in real time and report on data sourcing. This provides transparency to the asset owner and reduces the need for time-intensive manual processes to jointly monitor data provision with the Data Linkage Centre. The digital solution can also support data providers to automate their data feeds and provide data on a regular basis through a secure non-touch process. This reduces ongoing workload and ensures on-time provision.
Conclusion: The process requires a systematic change in the upstream data source, and we asked participating agencies to send us data in an agreed format. Receiving files in a standard format is pivotal for reducing the overall timeframes of HSDS creation and for leveraging the dataset for policy and investment purposes.
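The value of an agreed file format for automated validation can be shown with a short Python sketch. The abstract does not describe the actual HSDS format or checks, so the CSV layout, the column names in `REQUIRED_COLUMNS` and the function `validate_delivery` are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical agreed columns; the real HSDS specification is not given in the abstract.
REQUIRED_COLUMNS = ["record_id", "date_of_birth", "service_date"]


def validate_delivery(text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a delivered CSV file (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in required if c not in header]
    if missing:
        # Without the agreed columns, row-level checks are not meaningful.
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        empty = [c for c in required if not (row[c] or "").strip()]
        if empty:
            problems.append(f"line {line_no}: empty {', '.join(empty)}")
    return problems


good = "record_id,date_of_birth,service_date\nA1,1980-01-01,2021-03-04\n"
bad_rows = (
    "record_id,date_of_birth,service_date\n"
    "A1,1980-01-01,2021-03-04\n"
    "A2,,2021-03-05\n"
)
print(validate_delivery(good))
print(validate_delivery(bad_rows))
```

When every provider sends the same layout, a single check of this kind can run unattended on each delivery, which is one plausible route to the shorter validation timeframes reported above.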
{"title":"Accelerate the Creation of the cross agency Human Services Dataset.","authors":"P. Nair, Michael Smith, M. Theochari","doi":"10.23889/ijpds.v7i3.1963","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1963","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"46019220"}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1953
R. Beare, Adam Morris, Tanya Ravipati, Elizabeth Le, T. Collyer, Helene Roberts, V. Srikanth, Nadine E. Andrew
Objectives: To develop a flexible platform for creating, reviewing and adjudicating annotation of unstructured text. Natural Language Processing models and statistical classifiers use the results for analysis of large databases of text, such as electronic health records, that are curated by the National Centre for Healthy Ageing (NCHA) Data Platform.
Approach: Automated approaches are essential for large-scale extraction of structured data from unstructured documents. We applied the CogStack suite to annotate clinical text from hospital inpatient records based on the Unified Medical Language System (UMLS) for classifying dementia status. We trained a logistic regression classifier to determine dementia/non-dementia status within two cohorts based on the frequency of occurrence of a set of terms provided by experts: one cohort with confirmed dementia based on clinical assessment and the other with confirmed non-dementia based on telephone cognitive interview. We used our annotation platform to review the accuracy of concepts assigned by CogStack.
Results: There were 368 people with clinically confirmed dementia and 218 who screened negative for dementia. Of these, 259 with dementia and 195 without dementia had documents in the inpatient electronic health record system, with 84,045 and 16,950 inpatient documents for the dementia and non-dementia cohorts respectively. A set of key words pertaining to dementia was generated by a specialist neurologist and a health information manager, and matched to UMLS concepts. The NCHA data platform holds a copy of the inpatient text records (>13 million documents) that has been annotated using CogStack. Annotated documents corresponding to the study cohort were extracted. We tested true positive rates of annotation against 50 concepts judged by a neurologist and a health information manager to be relevant to dementia patients, by manual review of 100 documents.
Conclusion: Automated annotations must be validated. The platform we have developed allows efficient review and correction of annotations, so that models can be trained further or so that users can be confident that accuracy is sufficient for subsequent analysis. Implementation within our linked NCHA data platform will allow incorporation of text-based data at scale.
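The manual review step can be summarised per concept with a few lines of Python. This is a sketch under stated assumptions: each reviewed annotation is taken to be a (concept, judged-correct) pair, the rate is read as the share of reviewed automated annotations that the reviewer confirmed, and the concept labels, judgements and the function name `confirmed_rate_by_concept` are hypothetical rather than drawn from the study.

```python
from collections import defaultdict


def confirmed_rate_by_concept(reviews):
    """Share of reviewed automated annotations confirmed as correct, per concept."""
    totals = defaultdict(int)
    confirmed = defaultdict(int)
    for concept, is_correct in reviews:
        totals[concept] += 1
        confirmed[concept] += int(is_correct)
    return {concept: confirmed[concept] / totals[concept] for concept in totals}


# Hypothetical reviewer judgements on automated annotations.
reviews = [
    ("dementia", True),
    ("dementia", True),
    ("dementia", False),
    ("dementia", True),
    ("delirium", True),
    ("delirium", False),
]

rates = confirmed_rate_by_concept(reviews)
print(rates)
```

Concepts with a low confirmed share are natural candidates for correction in the platform before the annotations feed a downstream classifier.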
{"title":"A configurable software platform for creating, reviewing and adjudicating annotation of unstructured text.","authors":"R. Beare, Adam Morris, Tanya Ravipati, Elizabeth Le, T. Collyer, Helene Roberts, V. Srikanth, Nadine E. Andrew","doi":"10.23889/ijpds.v7i3.1953","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1953","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"45739094"}