Pub Date: 2022-08-25. DOI: 10.23889/ijpds.v7i3.1848
Maya Murmann, Douglas Manuel
Population-covering educational attainment registers have proven helpful for planning and research on educational efforts. Building and updating such a register requires regular linkage of different databases. Without unique national identification numbers, record linkage must rely on quasi-identifiers such as names, date of birth and sex. High-quality record linkage requires the unique identification of persons, so the available identifiers should be sufficient for unique identification even when some identifiers are missing for some cases. Redundant identifiers can achieve this goal. However, the data protection principle of data minimization, as recommended in the European General Data Protection Regulation, aims to avoid collecting additional data wherever the given purpose allows. A ministry therefore commissioned a simulation study to inform legislators on the minimum set of identifiers needed for a national register. A microsimulation of a population of nearly 20 million people was implemented to generate data on accumulating changes and errors in identifiers over ten simulated years. The simulation covered, for example, international migration, regional mobility, marriages, school careers and mortality. Each event triggered changes in identifiers according to specified error probability models. The resulting data were linked using different record-linkage procedures, and linkage quality and linkage bias were assessed as a function of the available identifiers. We report on the design of the simulation study, the linkage results and recommendations for the minimum set of identifiers. The results may be helpful for the design of other population-covering registers.
Title: Microsimulation of an educational attainment register to study record linkage quality.
Journal: International Journal of Population Data Science
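The study design described in the abstract above (simulate identifier changes over several years, link the files, then score the linkage for different identifier sets) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the name lists, the two error probabilities, the exact-match linkage rule and all function names are hypothetical and are not taken from the study, which used a far richer event model and several linkage procedures.

```python
import random

# Hypothetical value pools; a real simulation would use empirical distributions.
SURNAMES = ["Meyer", "Schmidt", "Fischer", "Weber", "Wagner", "Becker"]
GIVEN_NAMES = ["Anna", "Ben", "Clara", "David", "Eva", "Felix"]


def simulate_population(n, rng):
    """Create n synthetic persons with quasi-identifiers."""
    return [
        {
            "pid": pid,  # true identity, used only to score the linkage
            "surname": rng.choice(SURNAMES),
            "given": rng.choice(GIVEN_NAMES),
            "dob": (rng.randint(1990, 2010), rng.randint(1, 12), rng.randint(1, 28)),
            "sex": rng.choice("FM"),
        }
        for pid in range(n)
    ]


def apply_year(person, rng, p_surname_change, p_dob_error):
    """Return a copy of the record after one simulated year of events."""
    rec = dict(person)
    if rng.random() < p_surname_change:  # e.g. a marriage
        rec["surname"] = rng.choice(SURNAMES)
    if rng.random() < p_dob_error:  # e.g. a data-entry error in the birth day
        year, month, day = rec["dob"]
        rec["dob"] = (year, month, day % 28 + 1)
    return rec


def link_exact(file_a, file_b, keys):
    """Link records that agree exactly on all keys; accept unique matches only."""
    index = {}
    for rec in file_b:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec["pid"])
    links = []
    for rec in file_a:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:
            links.append((rec["pid"], candidates[0]))
    return links


def precision_recall(links, n_true_pairs):
    """Score links against the known truth (a link is correct if both pids agree)."""
    true_positives = sum(1 for a, b in links if a == b)
    precision = true_positives / len(links) if links else 0.0
    recall = true_positives / n_true_pairs
    return precision, recall


def compare_identifier_sets(n=2000, years=10, seed=1):
    """Link a baseline file to its aged copy using several identifier sets."""
    rng = random.Random(seed)
    baseline = simulate_population(n, rng)
    followup = baseline
    for _ in range(years):
        followup = [apply_year(p, rng, 0.02, 0.005) for p in followup]
    identifier_sets = (
        ("surname", "dob", "sex"),
        ("given", "dob", "sex"),
        ("surname", "given", "dob", "sex"),
    )
    return {
        keys: precision_recall(link_exact(baseline, followup, keys), n)
        for keys in identifier_sets
    }
```

Calling `compare_identifier_sets()` returns a precision and recall pair per identifier set, which is the kind of comparison a minimum-identifier recommendation would rest on. The sketch ignores migration, mortality and probabilistic linkage, all of which the abstract says the actual simulation covered.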
Pub Date: 2022-08-25. DOI: 10.23889/ijpds.v7i3.1942
M. Janus, Jeanne Sinclair, J. Hove, Scott Davies
Objectives: The objective of this study was to establish a partnership between a university and a jurisdictional education body (Education Quality and Assessment Organization, EQAO) that would allow the creation of a linked dataset from kindergarten to later grades in order to examine educational trajectories in mathematics in Ontario.
Approach: Building on the mutual goal of improving the understanding of children’s learning trajectories, we developed a project with an investigator team that included university researchers and representatives of the provincial educational assessment body, and received research funding. The project links a database of child development status in kindergarten (Early Development Instrument/EDI data, including a neighbourhood socioeconomic/SES index) with EQAO academic assessment data. A deterministic matching process was employed to match the datasets. We examined differences between the unmatched and fully matched cases and constructed a growth mixture model of math scores in grades 3, 6 and 9, with key EDI/SES variables as covariates.
Results: Despite lacking a common identifier, we successfully matched approximately 50% of the EDI cases from 2002-2014 (n=183,771). Effect sizes indicated negligible differences between matched and unmatched cases, except for SES and child development status, which were poorer for the unmatched group. A 3-class solution was the best fit for a 20,000-person subsample of math trajectories based on AIC, BIC, ICL and entropy values as well as sufficiently high proportions of posterior probabilities, which indicate confidence in class membership. Of the sample, 61% showed steady moderate-high achievement, 9% started high but declined, and 30% deteriorated and then improved. Males, children in low SES, and those with adequate kindergarten EDI outcomes had better math achievement trajectories than females, children in high SES, and those with poor kindergarten outcomes.
Conclusion: Because the two datasets were collected without an explicit linkage plan, only about 50% of cases were matched. The result is nevertheless a large database that allows the study of early developmental antecedents of students’ educational trajectories. The partnership between the university and EQAO ensures wide dissemination of results in both the academic and policy worlds.
Title: Building partnerships, capacity, and knowledge through a use of newly linked child development and education datasets in Ontario, Canada.
Journal: International Journal of Population Data Science
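The two analytic steps named in the abstract above, deterministic matching without a common identifier and an effect-size comparison of matched versus unmatched cases, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which variables were used for matching, so the key names in the usage note are hypothetical, and Cohen's d with a pooled standard deviation is one common effect-size choice rather than necessarily the one the authors used.

```python
from math import sqrt
from statistics import mean, variance


def deterministic_match(left, right, keys):
    """Match each left record to a right record that agrees exactly on all keys.

    A left record is matched only when exactly one right record shares its key
    values; ambiguous or missing candidates leave it unmatched.
    Returns (matched_pairs, unmatched_left_records).
    """
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    matched, unmatched = [], []
    for rec in left:
        hits = index.get(tuple(rec[k] for k in keys), [])
        if len(hits) == 1:
            matched.append((rec, hits[0]))
        else:
            unmatched.append(rec)
    return matched, unmatched


def match_rate(matched, unmatched):
    """Share of left records that found a unique partner."""
    total = len(matched) + len(unmatched)
    return len(matched) / total if total else 0.0


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample standard deviation.

    Requires at least two values per group and non-zero pooled variance.
    """
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)
```

With hypothetical keys such as `["dob", "sex", "school_id"]`, one would call `deterministic_match(edi_records, eqao_records, keys)` and then pass, for example, the SES values of the matched and unmatched groups to `cohens_d` to check whether the unmatched group differs systematically.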
Pub Date: 2022-08-25. DOI: 10.23889/ijpds.v7i3.2000
Robin Flaig, Jacqui Oakley, Kirsteen Campbell, Katharine Evans, S. McLachlan, Richard Thomas, E. Turner, A. Boyd
Objectives: The UK Longitudinal Linkage Collaboration (UK LLC) is a new, unprecedented infrastructure enabling research into the COVID-19 pandemic. The UK LLC integrates data from more than 20 UK longitudinal studies with systematically linked health, administrative and environmental records to facilitate cross-disciplinary COVID-19 research by accredited UK-based researchers.
Approach: Bringing together all of the key components that form the UK LLC was a huge challenge that may only have been possible in the midst of the pandemic. First, we collaborated with the Longitudinal Population Studies (LPS) to design and agree how data linkage, data provision and applications to access the UK LLC would work. In parallel, public contributors helped to create fair processing materials. Finally, we worked closely with NHS Digital and other key national data providers to organise approvals for all studies to be linked, and for the UK LLC to have delegated decision-making for research applications.
Results: We faced a myriad of challenges in creating the UK LLC, including:
Short timeframe and short-term funding structure: initial funding was for six months, with an 18-month extension.
Working across more than 20 different LPS and four nations with different structures for access, consent and data provision.
Lack of capacity at various points in the data pipeline, due to the volume of COVID-19 research required and underway across the organisations involved.
Data processing complexities: the split data method means that no one can see the entire process, so catching linkage errors requires working across four different organisations.
Communicating with the public: with such complex data flows, it is challenging to be accurate about what we are doing while expressing the complexity in lay terms.
Conclusion: Creating the UK LLC required collaboration with LPS, data providers and researchers. An iterative approach to creating the data application and data provision pipelines was crucial in developing these processes. The UK LLC was built quickly, from initial funding in October 2020 to provisioning data to researchers in December 2021.
Title: Longitudinal study of diabetes prevalence and hospitalisations among care experienced and general population children in Scotland: evidence of an end of care “cliff edge”?
Journal: International Journal of Population Data Science
Pub Date: 2022-08-25. DOI: 10.23889/ijpds.v7i3.2024
Lucy Carty, M. Cortina-Borja, Rachel Plachcinsk, C. Grollman, A. Macfarlane
Objectives: Potential ‘weekend effects’ in healthcare prompt concerns that care could be of lower quality during non-working hours, but they may instead reflect differences in case mix or other factors. This research aimed to compare neonatal mortality in English hospitals from 2005 to 2014 by time of day and day of the week.
Approach: We analysed data from a retrospective cohort of 6,054,536 singleton births in England from 2005 to 2014, created by linking ONS birth and death registration and birth notification data with Hospital Episode Statistics. Working hours were defined as 07:00 to 19:00 on weekdays, and non-working hours were all other times on weekdays and all weekends and public holidays. The primary outcome was all-cause neonatal mortality unattributed to congenital anomaly. We also modelled cause-specific neonatal mortality attributed to asphyxia, anoxia or trauma (AAT). On advice through our public involvement and strategy, analysis was stratified by mode of onset of labour and method of delivery.
Results: After adjustment for confounders, the odds of all-cause neonatal mortality outside working hours were similar to those during working hours for spontaneous births, instrumental births and emergency caesareans. Planned caesareans occurring in non-working hours had a high crude risk compared with planned caesareans in working hours, but were considered to be unreliably recorded and likely to reflect emergency caesarean delivery of babies originally scheduled for planned caesarean birth. Further stratification of emergency caesareans by onset of labour showed higher odds of cause-specific neonatal mortality (AAT) during non-working hours than during working hours for emergency caesareans without labour recorded, but not for emergency caesareans after spontaneous or induced onset of labour.
Conclusion: It may be that the apparent ‘weekend effect’ is caused by deaths among the relatively small number of babies who were born by caesarean section, apparently without labour, outside normal working hours. Obstetric staffing should be planned to allow for these relatively unusual emergencies.
Title: Neonatal mortality in NHS maternity units by timing of birth and method of delivery: a retrospective linked cohort study.
Journal: International Journal of Population Data Science
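The working-hours definition given in the abstract above (07:00 to 19:00 on weekdays, with weekends and public holidays counted as non-working) can be written as a small classifier. This is a sketch under two assumptions that the abstract does not settle: the interval is treated as half-open, so 19:00 exactly counts as non-working, and the set of public holidays is supplied by the caller. The function name is hypothetical.

```python
from datetime import date, datetime


def is_working_hours(timestamp, public_holidays=frozenset()):
    """Return True if a birth timestamp falls in working hours.

    Working hours: 07:00 (inclusive) to 19:00 (exclusive, an assumption) on
    Monday to Friday, excluding any date listed in public_holidays.
    """
    if timestamp.date() in public_holidays:
        return False
    if timestamp.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return 7 <= timestamp.hour < 19


# Example: a Wednesday that is also a public holiday is non-working all day.
holidays = {date(2014, 1, 1)}
print(is_working_hours(datetime(2014, 1, 1, 12, 0), holidays))
```

Applying this flag to each birth record gives the exposure variable that the stratified comparisons in the abstract are built on.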
Pub Date: 2022-08-25. DOI: 10.23889/ijpds.v7i3.1924
J. Enns, A. Katz, M. Yogendran, Marcelo L. Urquia, S. Muthukumarana, Surani Matharaarachchi, A. Singer, Nathan C. Nickel, L. Star, Teresa Cavett, Y. Keynan, L. Lix, D. Sanchez-Ramirez
Objective: Post-acute COVID-19 (or ‘long COVID’) manifests as a wide range of long-lasting symptoms affecting multiple organ systems. We are developing criteria for identifying long COVID cases using administrative, clinical, survey and other data from Manitoba, Canada, with the ultimate goal of examining long COVID prevalence, risk factors, prognosis and recovery.
Approach: Given the lack of an accepted clinical definition and the resulting lack of diagnostic codes, we are adopting several different, complementary strategies to identify long COVID cases. We are examining administrative and clinical data sources (laboratory data, physician claims, drug prescriptions and electronic medical records) for information on positive COVID tests, common symptoms and complaints, and treatment provided. To identify people with long COVID who may not have sought healthcare, we are collecting survey data from a convenience community sample (members of a medical health fitness facility) and mining data on long COVID symptoms from Twitter.
Results: The combination of approaches we have adopted and the expanding scientific literature on long COVID are contributing to a more comprehensive understanding of the impacts of long COVID in Manitoba. Through preliminary work on the laboratory data (positive COVID tests from March 2020 to June 2021), we have developed and characterized a COVID-positive cohort (n=47,515). Work is now underway to develop an algorithm for long COVID using symptoms from free text in electronic medical records, ICD-9 codes, and changes in health-seeking behaviour (compared with the period before the positive COVID test and with a matched sample). This population data-driven approach will then allow us to examine how multiple underlying health conditions, COVID illness severity, COVID vaccination status and various socio-demographic factors are related to the risk of long COVID.
Conclusion: This research is generating actionable information by identifying risk factors to support clinical diagnosis of long COVID, making it easier for clinicians to recognize this new illness and develop plans to manage it. It will also inform healthcare system planning by quantifying the burden of long COVID at the population level.
Title: A population data-driven approach to identifying ‘Long COVID’ cases in support of diagnosis and treatment.
Journal: International Journal of Population Data Science
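One of the signals named in the abstract above, a change in health-seeking behaviour relative to the period before the positive COVID test, can be illustrated with a simple visit count. This is only a sketch of that single idea, not the authors' algorithm, which the abstract describes as still under development. The window length, the lag after the test date and the function name are all hypothetical choices.

```python
from datetime import date, timedelta


def visit_count_change(visit_dates, test_date, window_days=180, lag_days=84):
    """Compare healthcare visits before and after a positive test.

    pre window:  [test_date - window_days, test_date)
    post window: [test_date + lag_days, test_date + lag_days + window_days)
    The lag (hypothetical, 12 weeks) skips the acute phase of the infection.
    Returns (pre_count, post_count, post_count - pre_count).
    """
    pre_start = test_date - timedelta(days=window_days)
    post_start = test_date + timedelta(days=lag_days)
    post_end = post_start + timedelta(days=window_days)
    pre = sum(1 for d in visit_dates if pre_start <= d < test_date)
    post = sum(1 for d in visit_dates if post_start <= d < post_end)
    return pre, post, post - pre
```

A positive difference for an individual, judged against the same quantity in a matched sample without a positive test, is one possible input to a case-finding rule alongside symptom text and diagnostic codes.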
Pub Date: 2022-08-25. DOI: 10.23889/ijpds.v7i3.2096
M. Ho, Stefana Jovanovska, Jennafer Novess, Dina Skvirsky, R. Saskin, J. C. Victor
Objective: In March 2014, ICES launched Data & Analytic Services (DAS), expanding access to ICES data and analytics beyond ICES scientists and analytic staff. In eight years, DAS has grown and evolved to increase the high-quality services offered to an expanding client base of external researchers.
Approach: At the inception of DAS, two services were offered to public sector researchers: data access and analytics. Data access enabled researchers to analyze coded record-level data through a secure virtual environment. Analytics, conducted by DAS staff in the ICES analytic environment, provided researchers with risk-cleared summary-level reports. In response to growing demand from an increasingly diverse range of researchers, ICES engaged in extensive consultations with internal and external stakeholders to re-evaluate and operationalize new services. Compliance with contractual obligations and Ontario law, organizational capacity to scale up, and alignment with ICES’ mission, vision and values were cornerstones in establishing new offerings.
Results: Analytic services became available to private sector researchers in June 2016. In March 2017, support for cohort and longitudinal follow-up studies became the newest service offering (researchers are provided with a list of applicable individuals defined for the purposes of conducting publicly funded research). As more data assets become available to researchers, requests continue to increase in volume and complexity, particularly from projects seeking to import external data for linkage to ICES data. A second high-performance computing virtual environment began onboarding researchers in September 2021, while the original analytic environment has undergone multiple upgrades and will soon be fully refreshed. Regular solicitation of feedback has enabled DAS to increase staffing and diversify the resources available, which improves the client experience at all stages.
Conclusions: Since its inception, DAS has expanded from five to thirty personnel, grown and diversified its new and returning client base, and responded to demand for new services. DAS continues to provide high-quality services that enable impactful research and is responsive to new opportunities for collaboration and service provision.
Title: ICES Data and Analytic Services: Eight Years Young.
Journal: International Journal of Population Data Science
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1928
Hannah Dickson, G. Vamvakas, N. Blackwood
Objectives: The total annual cost of crime in England and Wales is estimated at £50bn. The age-crime curve indicates that criminal behaviour peaks in adolescence and decreases in adulthood. However, evidence suggests that this curve conceals distinct developmental trajectories. Life-course persistent offenders begin to behave antisocially early in childhood and continue this behaviour into adulthood. By contrast, adolescent-limited offenders exhibit most of their antisocial behaviour during adolescence, with a minority continuing to offend into adulthood. Prospective cohort studies have highlighted distinct risk factors for these offending trajectories, but this research is limited by small sample sizes for disadvantaged groups, selection bias and infrequent data collection.
Approach: The current study began in February 2022 and is one of the first to use linked UK national crime and education records. The aims are to (1) establish the offending trajectories of individuals between the ages of 10 and 32 years following their first recorded conviction or caution, using national crime records; and (2) develop prediction models of these offending trajectories using administrative education and social care data.
Results: In my talk, I will share findings on the offending trajectories identified and present some early results on the key education and social care drivers of those trajectories.
Conclusions: Findings from the project have the potential to identify previously unknown, or confirm lesser-known, offending trajectories using real-world data based on the UK population. They may also lead to the detection of previously unknown risk or protective factors for offending, which has implications for early intervention and could help inform criminal justice system responses to early antisocial behaviour.
{"title":"Education and social care predictors of offending trajectories: A UK administrative data linkage study.","authors":"Hannah Dickson, G. Vamvakas, N. Blackwood","doi":"10.23889/ijpds.v7i3.1928","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1928","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.1793
Lara M. Greaves, Cinnamon-Jo Lindsay, Eileen Li, Emerald Muriwai, A. Sporle
Linked data presents different social and ethical issues for different contexts and communities. The Statistics New Zealand Integrated Data Infrastructure (IDI) is a collection of de-identified whole-population administrative datasets that researchers are increasingly using to answer pressing social and policy research questions. Our work seeks to provide an overview of the IDI, the associated issues for Māori (the Indigenous peoples of New Zealand), and steps to realise Māori data aspirations. In this paper, we first introduce the IDI, including what it is and how it developed. We then give an overview of Māori Data Sovereignty, followed by examples of organisations, agreements, and frameworks that seek to make the IDI and data better for Māori communities. We then discuss the main issues with the IDI for Māori, including technical issues, deficit-framed work, involvement from communities, consent, social license, further data linkage, and barriers to access for Māori. We finish with a set of recommendations on how to improve the IDI for Māori, so that Māori can get the most out of administrative data for our communities. These include building data researcher capacity and capability for Māori, establishing Māori data co-governance and accountability, reducing practical and skill barriers to access by Māori and Māori organisations, providing robust, consistent and transparent exemplars of best practice, and potentially even abolishing the IDI and starting again. These issues are being worked through via Indigenous engagement and co-governance processes that could provide useful exemplars for Indigenous and community engagement with linked data resources.
{"title":"Māori and Linked Administrative Data: A Critical Review of the Literature and Suggestions to Realise Māori Data Aspirations.","authors":"Lara M. Greaves, Cinnamon-Jo Lindsay, Eileen Li, Emerald Muriwai, A. Sporle","doi":"10.23889/ijpds.v7i3.1793","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.1793","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.2038
Sarah Collyer, Josie Plachta
Objectives: The Demographic Index (DI) comprises five linked administrative datasets used for population estimation. Current linkage methods are not well suited to exploiting this asset. Using the 2021 England and Wales Census, we are developing an innovative composite linkage method to make full use of the DI.
Approach: Using non-greedy deterministic and probabilistic linkage methods, we will link the DI to the Census at a composite level where we believe links exist. That is, we will link a Census cluster (consisting of linked Census and Census Coverage Survey (CCS) records) with a DI cluster (consisting of linked records from the data sources used to make the DI). We will then conduct a pairwise linkage of the records within these linked clusters to link individual source records to the Census. We will use clerical review to resolve uncertain and conflicting links and to inform the quality of our linkage.
Results: We anticipate producing a high-quality linkage that will show how the coverage of the DI compares with the Census (through the composite-level linkage) and indicate the quality of the DI itself (through the pairwise-level linkage). We have developed a clerical matching system that can display composite-level linkage, i.e., candidate cluster-pairs. We will focus our clerical review and quality assessment on records that fall within carefully chosen postcode areas, to ensure that all hard-to-count groups and geographical areas are sampled. Working with large datasets is a challenge that we are overcoming with distributed computing and search space reduction. The 2021 Census has previously been linked to the CCS with high accuracy; these records are considered intrinsically linked.
Conclusion: To assess the quality of national population estimates and of the policy decisions based upon them, we are linking a key composite population-level dataset to the 2021 England and Wales Census. The presentation will showcase the methods we are developing and how we are ensuring the highest quality possible.
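The abstract does not spell out what "non-greedy" means in the authors' pipeline. As a rough illustration of one common reading, the sketch below contrasts a greedy pass (accept the best-scoring pair first, then the best of what remains) with an assignment that maximises the total score over all one-to-one links. The cluster identifiers, the scores and both function names are hypothetical and are not taken from the study.

```python
from itertools import permutations

# Hypothetical match scores between Census clusters (c*) and DI clusters (d*).
scores = {
    ("c1", "d1"): 0.90, ("c1", "d2"): 0.85,
    ("c2", "d1"): 0.80, ("c2", "d2"): 0.10,
}

def greedy_links(scores):
    """Accept the highest-scoring pair first, then the best of what is left."""
    links, used_right = {}, set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (left, right), _ in ranked:
        if left not in links and right not in used_right:
            links[left] = right
            used_right.add(right)
    return links

def non_greedy_links(scores, left_ids, right_ids):
    """Try every one-to-one assignment and keep the highest total score.

    Brute force, so only usable for tiny examples; assumes
    len(left_ids) <= len(right_ids). Unscored pairs count as 0.
    """
    best, best_total = {}, float("-inf")
    for perm in permutations(right_ids, len(left_ids)):
        total = sum(scores.get(pair, 0.0) for pair in zip(left_ids, perm))
        if total > best_total:
            best, best_total = dict(zip(left_ids, perm)), total
    return best

greedy = greedy_links(scores)
optimal = non_greedy_links(scores, ["c1", "c2"], ["d1", "d2"])
print(greedy)   # c1 takes d1, leaving c2 with the weak d2 link
print(optimal)  # c1 takes d2 so that c2 can take d1
```

In this toy case the greedy pass locks c1 to d1 and forces c2 into a 0.10 link, whereas the global assignment accepts a slightly weaker link for c1 to give both clusters a plausible partner. At realistic volumes the brute-force search would be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment`, and a score threshold would normally stop very weak pairs from being linked at all.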
{"title":"Using linkage to assess coverage of population estimates.","authors":"Sarah Collyer, Josie Plachta","doi":"10.23889/ijpds.v7i3.2038","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2038","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}
Pub Date : 2022-08-25 DOI: 10.23889/ijpds.v7i3.2057
J. Cooper, D. O’Reilly, Richard Kirk, Trish Kelly, Rachel Gibbs, M. Donnelly
A project designed to examine, for the first time, the health records of adult prisoners in Northern Ireland and their linkage to other available health data: the test case of prisoner post-release mortality risk
Objectives: The linkage of routinely collected administrative data for research purposes has the potential to improve knowledge and public benefit. We describe a novel data linkage study between Northern Ireland (NI) Healthcare in Prisons and the Business Services Organisation (BSO). This work is undertaken within the Administrative Data Research Centre-NI (ADRC-NI).
Approach: This joint project between ADRC-NI, Queen’s University Belfast, and NI Healthcare in Prisons (South Eastern Health and Social Care Trust) will test the linkage of prisoner health records to health data held in the BSO, and the potential to generate a population-based cohort for a retrospective analysis of prisoner health (2012-2021). The analysis will attempt to (1) characterise prisoners according to socio-demographic, health and committal factors; (2) compare post-release mortality rates with those of a reference group from the NI population using indirect standardisation; and (3) estimate post-release mortality risk using Cox proportional hazards models.
Results: Using novel data linkages, a dataset will be created to examine the health of prisoners (and former prisoners) in NI. Ethics and governance approvals are in place for this data linkage. The linkage will be undertaken via the Honest Broker Service (HBS) in NI, and the dataset will be accessed in the safe setting at the BSO. The processes involved, our experiences (including significant delays or difficulties), and recommendations for future data-linkage studies will be discussed. A key deliverable of this project will be an assessment of the access and linkage capabilities of the prisoner health data, with metadata created and made available to future researchers. We also plan to present preliminary results relating to the test research question.
Conclusion: We will describe the processes involved and our first-hand research experience in developing a novel data-linkage project. We will also detail the access and linkage capabilities of this new dataset for examining the health of prisoners (and former prisoners) in NI.
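For readers unfamiliar with indirect standardisation, the comparison mentioned in the Approach can be sketched as follows: reference-population death rates are applied to the cohort's person-years in each stratum to obtain the expected number of deaths, and the standardised mortality ratio (SMR) is observed deaths divided by expected deaths. The age bands, rates, person-years and function name below are invented for illustration only and are not results or methods reported by the study.

```python
def indirect_smr(observed_deaths, person_years, reference_rates):
    """Return (SMR, expected deaths) by indirect standardisation.

    person_years:    cohort person-years at risk, keyed by stratum
    reference_rates: reference-population deaths per person-year, same keys
    """
    expected = sum(person_years[s] * reference_rates[s] for s in person_years)
    return observed_deaths / expected, expected

# Hypothetical inputs: three age bands with made-up exposure and rates.
person_years = {"18-29": 2000.0, "30-49": 3000.0, "50+": 1000.0}
reference_rates = {"18-29": 0.0005, "30-49": 0.002, "50+": 0.01}

smr, expected = indirect_smr(34, person_years, reference_rates)
# Roughly 17 expected deaths (1 + 6 + 10), so 34 observed deaths
# corresponds to an SMR of about 2.
print(round(expected, 2), round(smr, 2))
```

An SMR above 1 indicates more deaths in the cohort than the reference rates would predict for the same age structure. Confidence intervals and the Cox proportional hazards modelling mentioned in the abstract are outside the scope of this sketch.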
{"title":"A project designed to examine, for the first time, the health records of adult prisoners in Northern Ireland and their linkage to other available health data: the test case of prisoner post-release mortality risk.","authors":"J. Cooper, D. O’Reilly, Richard Kirk, Trish Kelly, Rachel Gibbs, M. Donnelly","doi":"10.23889/ijpds.v7i3.2057","DOIUrl":"https://doi.org/10.23889/ijpds.v7i3.2057","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2022-08-25","publicationTypes":"Journal Article"}