Introduction: Schools worldwide balance whole-class teaching with additional provision for children with special educational needs or disability (SEND). Robust evidence on equity and effectiveness of SEND provision is essential to address growing demand and rising costs globally.
Objectives: To synthesise findings from the Health Outcomes for young People throughout Education (HOPE) evaluation of variation in SEND provision and its impact on health and education outcomes in English primary schools. We integrated findings from 14 sub-studies using administrative data in the Education and Child Health Insights from Linked Data (ECHILD) database and 10 mixed methods sub-studies.
Methods: Analyses of ECHILD data followed children from birth to age 11 years. We examined how variation in SEND provision was associated with health conditions, and school, social and organisational factors. Using target trial emulation, we estimated the impact of SEND provision on hospital admissions, school absences and attainment. We surveyed and interviewed young people, parents, and professionals and reviewed information about services to understand SEND processes and contexts.
Results: Of 3.8 million children born 2004 to 2013, 30% had SEND provision recorded by age 11. Health conditions were only partially associated with SEND provision, which was also related to male gender, social disadvantage, low attainment and type of school. SEND provision modestly reduced rates of unauthorised absences in subgroups of children but showed no measurable benefit on hospital admissions or school attainment. Mixed methods studies highlighted benefits of early, responsive support, challenges posed by limited capacity, harms caused by delayed or inadequate provision, and need for parent advocacy to access SEND provision.
Discussion: Weak evidence of benefits of SEND provision in causal analyses likely reflects unmeasured confounding, lack of measures of provision received and insensitive outcomes in ECHILD data. SEND policies need robust evidence from analyses across jurisdictions using administrative data, enhanced with better measures, experimental methods and contextual evaluation.
Introduction: Clinical guidelines may reduce statistical power in epidemiological studies by discarding informative measures. Epidemiological studies of lung function may discard one-third to one-half of participants due to spirometry measures deemed "low quality" using criteria adapted from clinical practice.
Objectives: To optimise the signal-to-noise ratio in epidemiological studies of lung function, we aimed to develop a data-driven method to refine spirometry quality control (QC) criteria.
Methods: We proposed a genetic risk score (GRS) informed strategy to categorise spirometer blows by quality criteria. The GRS was built using SNPs associated with lung function traits in non-UK Biobank cohorts. In the UK Biobank, we tested the GRS association step-wise across groups of spirometry blows stratified by acceptability flags to rank blow quality. We reassessed QC criteria by comparing the genetic associations under different acceptability flags and repeatability thresholds to determine the trade-off between sample size and measurement error.
Results: We found that including blows previously excluded by strict QC criteria would maximise the statistical power for genome-wide association studies while retaining acceptable precision in the UK Biobank. This approach allowed the inclusion of 29% more participants compared to the strictest clinical guidelines and demonstrated that genetic signals could be identified earlier.
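The abstract does not give the GRS formula or the association model. As a minimal sketch, assuming a conventional weighted-sum GRS (effect-size weight times effect-allele dosage) and a simple least-squares slope as the measure of association within each blow-quality group, the idea could look like the following. The SNP identifiers, weights and function names are hypothetical.

```python
from typing import Mapping, Sequence


def genetic_risk_score(dosages: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages over SNPs present in both inputs."""
    shared = sorted(dosages.keys() & weights.keys())
    return sum(weights[snp] * dosages[snp] for snp in shared)


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (here: a lung function trait on the GRS)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical SNP identifiers and weights, for illustration only.
weights = {"snp_a": 0.10, "snp_b": -0.05}
person = {"snp_a": 2, "snp_b": 1}
score = genetic_risk_score(person, weights)  # 0.10*2 + (-0.05)*1
```

Under these assumptions, `ols_slope` would be computed separately for each group of blows defined by an acceptability flag. A slope attenuated towards zero in a group would suggest noisier measurements in that group.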
Conclusions: Our GRS-based method offers an important framework to challenge prevailing practices that exclude informative measures and limit power in epidemiological studies.
Aim: To create longitudinal postcode history datasets that allocate mothers to one postcode for each week of pregnancy and children to one postcode for each week of infancy for a study of air pollution and respiratory infections in infants.
Datasets: We used linked birth registrations and NHS birth notifications for all children born in London between 2010 and 2014, which constituted the spine for the Air Pollution, housing and respiratory tract Infections in Children: National Birth Cohort Study (PICNIC). The birth data were linked by NHS England to the Personal Demographics Service (PDS) to derive maternal and child postcode histories for each week of pregnancy and infancy.
Challenges: While the research team had extensive experience working with administrative data, including birth registrations and notifications, the postcode history data were a new resource and lacked metadata, papers or reports from previous users. A substantial number of records were missing a move-in date, or both a move-in date and postcode, adding complexity when ascertaining an address history for study participants. Further, we encountered instances of incorrectly recorded postcodes and implausible numbers of postcodes recorded in a week.
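The abstract does not describe the allocation rules used. As an illustrative sketch only, one simple rule would be to assign each week the postcode with the latest move-in date on or before the start of that week, and to set aside records missing a move-in date or postcode. The function name, record layout and placeholder postcodes below are assumptions, not the study's actual method.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Assumed record layout: (postcode, move_in_date); either field may be missing.
AddressRecord = Tuple[Optional[str], Optional[date]]


def weekly_postcodes(history: List[AddressRecord],
                     start: date,
                     n_weeks: int) -> List[Optional[str]]:
    """Return one postcode per week, or None where no dated record applies."""
    # Assumption: records lacking a move-in date or postcode are dropped.
    dated = sorted((d, pc) for pc, d in history if d is not None and pc)
    weeks: List[Optional[str]] = []
    for k in range(n_weeks):
        week_start = start + timedelta(weeks=k)
        current = None
        for move_in, postcode in dated:
            if move_in <= week_start:
                current = postcode  # latest move-in so far wins
            else:
                break
        weeks.append(current)
    return weeks


# Placeholder postcodes, for illustration only.
history = [("PC_A", date(2012, 1, 1)),
           ("PC_B", date(2012, 3, 15)),
           ("PC_C", None)]  # missing move-in date: ignored under this rule
allocation = weekly_postcodes(history, date(2012, 3, 1), 4)
```

This rule guarantees at most one postcode per week, but it discards undated records rather than attempting to place them, which a real cleaning pipeline might handle differently.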
Lessons learned: One half of children in this London-based cohort moved during infancy, and one third of their mothers moved during pregnancy. This highlights the importance of taking into account changes in residential address in studies examining the association between environmental exposures and health outcomes. Cleaned and validated longitudinal national address records are crucial for environmental health studies. However, they are also resource intensive, with implications for researchers and research funders.
Introduction: In Australia, around 85% of children survive childhood cancer. Yet, up to 80% of survivors experience subsequent adverse health conditions called late effects, largely attributed to cancer treatment. The LACE study is a population-based linked data resource that aims to facilitate the investigation of childhood cancer and its treatment and the impact on late effects for childhood cancer survivors.
Methods: The study links the Australian Childhood Cancer Registry to administrative cross-jurisdictional health and education data to enable ongoing follow-up of outcomes for childhood cancer survivors. The study population includes all Australian children aged less than 15 years, diagnosed with cancer 1983-2021, and comparison groups comprising siblings of childhood cancer patients and a random sample of children from the general population frequency matched by age, sex and residential location to cases.
Results: To date, the case cohort includes 25,226 children diagnosed with cancer, with longest follow-up to the age of 53 years. The most commonly diagnosed childhood cancers were leukaemia and related cancers (n=8182, 32.4%), followed by central nervous system and related cancers (n=5850, 23.2%), and lymphomas and reticuloendothelial neoplasms (n=2568, 10.2%). Overall, 16,314 (64.7%) children underwent chemotherapy, 5555 (22.0%) received radiotherapy and 7300 (28.9%) had surgical treatment for their cancer, with immunotherapy use reported for 641 (2.5%), hormonal therapy for 4549 (18.0%) and ancillary therapies for 2581 (10.2%). A total of 19,321 (76.6%) cases were alive at the end of the study.
Conclusion: This new comprehensive national data linkage resource represents a valuable asset that will facilitate research to identify the risk of late effects and effective follow-up care to inform counselling patients and their families, as well as guidelines, models of care and personalised follow-up care plans. Further, it will enable identification of inequities in healthcare access and outcomes across population sub-groups.
Introduction: Priority setting with patients, public and professionals is essential for research utilising routinely collected data, as this ensures data are being used in the public interest. However, it is challenging to identify research priorities that are relevant to a wide range of local stakeholders and can be addressed with routinely collected data.
Objectives: To describe and present the results of a priority setting exercise aiming to identify research priorities for Born in Bradford for All (BiB4All), a routine data linkage cohort of mothers and babies born in Bradford, a city in the north of England.
Methods: We developed a two-hour online workshop to engage a range of stakeholders across Bradford, including parents, early years practitioners, commissioners, and service providers. The workshop method combined elements of existing priority setting approaches to ensure priorities were identified in an inclusive, timely and deliberative way, and supported stakeholders to develop their understanding of using linked routine data for research.
Results: The workshop identified seventeen important and urgent research priorities around child and maternal health for research with locally linked routine data. Key topic areas included maternal and infant mental health, the long-term impact of the Covid-19 pandemic on maternal and child health outcomes, inequalities in access to services, and infant feeding experiences.
Conclusions: The identified research priorities have been shared widely amongst interested networks and have shaped the BiB4All research agenda, demonstrating the feasibility of the stakeholder engagement method. They also have important implications for policy and practice. For policy, they provide an understanding of the key issues faced by local communities, which can steer policy priorities and investment in evidence generation. For practice, involvement in the workshop has generated a greater understanding of how local service data can be used for research and to inform improvements to service delivery.
Introduction: Deprivation measures have been used in research to assess within-country health inequalities globally. Most of these indices are created using data from national census, given their availability and nationwide coverage.
Objectives: This study aims to create a census-based deprivation index for Ecuador, the Ecuadorian Deprivation Index (EDI), that reflects the country-specific context using national census data for four geographical units (census sector, parish, canton and province). The EDI is compared with two traditional small-area indices (Townsend and Carstairs) to identify the most appropriate, context-specific index for Ecuador. Finally, the performance of the three indices is assessed by examining their association with, and the extent of inequalities in, teenage pregnancy, which has been shown to be socially patterned in other countries.
Methods: This study uses the 2010 Ecuadorian census and follows the stages and recommendations for developing small-area deprivation indices. The Townsend and Carstairs indices are first replicated. For the EDI, Principal Component Analysis is used to select the most appropriate indicators. Summary measures for higher-level geographical areas are developed following the techniques used in the English Index of Multiple Deprivation. Inequalities in teenage pregnancy are measured using the Slope Index of Inequality and the Relative Index of Inequality.
Results: The three indices show good agreement in urban areas and can describe patterns of inequality in teenage pregnancy. However, the EDI captures rural deprivation more appropriately, including in the Coast and Amazon geographical regions.
Conclusions: Traditional deprivation measures may not adequately identify deprivation in Ecuador, given the country's specific contextual factors. The wider scope of the EDI will inform policy-makers in developing tailored programmes to alleviate deprivation and health inequalities in Ecuador.
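The abstract does not state which formulation of the two inequality indices was used. A minimal sketch, assuming the common regression-based approach: groups are ordered from least to most deprived, each is given the midpoint of its cumulative population share, and the outcome rate is regressed on that position with population weights. Here the slope index is taken as the fitted slope and the relative index as the ratio of fitted rates at the two ends of the scale; other definitions of the relative index exist. The function name and the example figures are hypothetical.

```python
from typing import List, Tuple


def sii_rii(groups: List[Tuple[float, float]]) -> Tuple[float, float]:
    """groups: (population, rate) pairs ordered least to most deprived.

    Returns (SII, RII) from a population-weighted linear regression of
    rate on the cumulative population midpoint (0 = least deprived).
    """
    total = sum(pop for pop, _ in groups)
    positions = []
    cumulative = 0.0
    for pop, _ in groups:
        positions.append((cumulative + pop / 2) / total)
        cumulative += pop
    x_bar = sum(pop * x for (pop, _), x in zip(groups, positions)) / total
    y_bar = sum(pop * rate for pop, rate in groups) / total
    sxy = sum(pop * (x - x_bar) * (rate - y_bar)
              for (pop, rate), x in zip(groups, positions))
    sxx = sum(pop * (x - x_bar) ** 2
              for (pop, _), x in zip(groups, positions))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sii = slope                              # absolute gap across the scale
    rii = (intercept + slope) / intercept    # fitted rate at 1 over rate at 0
    return sii, rii


# Made-up figures for illustration: four equal-sized deprivation groups.
example = [(100, 0.02), (100, 0.04), (100, 0.06), (100, 0.08)]
sii, rii = sii_rii(example)
```

In this made-up example the fitted rate rises from 0.01 at the least deprived end to 0.09 at the most deprived end, so the sketch returns a slope index of 0.08 and a relative index of 9.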
Introduction: Unique Property Reference Numbers (UPRNs) provide every addressable location in the United Kingdom (UK) with a persistent, unique identifier of up to 12 digits, and are a mandated standard across the UK public sector. This standardisation makes them well suited to pseudonymisation for data linkage for research, innovation and public benefit. While there have been many consultations exploring public trust in, and attitudes to, using patient data for research, none has explicitly considered the use of these data for address-based linkage using UPRNs.
Objectives: Our overarching aim is to build public trust in the uses of address-based data at household level. We set out to develop and test materials to facilitate conversations about the use of address-based data linkage at the household-level. In this case study, we describe the development of information materials and an initial dialogue to inform future public deliberation.
Methods: In collaboration with designers and researchers, we generated a prototype website and shared this with experienced public advisory groups. Feedback from these groups informed development of a suite of resources, including slides and a facilitator's script to guide workshop discussions. These were supplemented by interactive, tactile tools designed to promote understanding of key concepts, and to encourage participants to ask questions relevant to their interests and concerns. We hosted two workshops with residents in a multi-ethnic, disadvantaged inner city locality to test and refine these materials.
Results: Dialogue with residents emphasised the importance of accessibility, including clear descriptions of technical jargon, and the effectiveness of using less text-heavy materials and more interactive formats, particularly for participants for whom English is not their first language. Visual representations of people included in workshop materials need to reflect diversity in age, gender, ethnicity, and mobility to ensure resources are relatable. Adapting the approach to delivering information - whether through digital or physical formats - proved crucial in engaging with participants and meeting their diverse needs.
Conclusions: We have created a toolkit, tested with different public groups, to support conversations with academic and public audiences about research using address-linked patient data. The toolkit has been disseminated and made freely available for use by the research community.
Introduction: Being able to accurately link primary and secondary healthcare records is invaluable for public health research. The Clinical Practice Research Datalink (CPRD) collects and curates primary care electronic health records from UK GP practices. These data are linked to secondary health data by National Health Service (NHS) England. In 2020, NHS England introduced the Master Person Service (MPS) method to link data at the person level. The method was first applied to CPRD data in the November 2024 linked data release.
Objectives: This paper provides an overview of the MPS linkage method and its impact on linked CPRD data.
Methods: The MPS linkage method searches each set of personal identifiers against records within the Personal Demographics Service and the MPS record bucket. Successful matches are assigned a patient identifier 'Person_ID', which is used to link records between datasets. The number of successfully linked CPRD patients was compared between the MPS and the previous linkage method. The impact of the change in linkage eligibility definition was also examined.
Results: There are 7.9 million (CPRD GOLD) and 34.2 million (CPRD Aurum) patient records in the December 2024 primary care builds that are of research quality and were successfully linked to a Person_ID. Compared to the previous linkage method, the proportion of patient records that were defined as eligible to be linked to Hospital Episode Statistics Admitted Patient Care (HES APC) and had data in HES APC increased from 75.7% to 81.0% in CPRD GOLD and from 72.1% to 79.0% in CPRD Aurum.
Conclusion: The new linkage eligibility definition is superior to the previous definition, resulting in greater ability to define appropriate denominator populations and to differentiate why some patients do not have linked data. The MPS linkage method offers the potential for CPRD to investigate individuals with duplicate records and practice mergers.
Introduction: Prescribing data have been collected electronically in Scotland for many years; however, data are collated in individual, non-overlapping datasets based on the origin of the prescription (e.g., primary or secondary care). The vision was to create a unified view of all prescribing data to provide a longitudinal dataset of medicines use for patients treated by the National Health Service (NHS) Scotland, irrespective of where or how that care was provided.
Methods: The Scottish Combined Medicines Dataset (SCoMeD) is, in essence, a data virtualisation tool collating information from three previously available prescribing datasets: the Prescribing Information System (PIS); the Hospital Electronic Prescribing and Medicines Administration (HEPMA) national dataset; and the Homecare Medicines (HCM) dataset. This allows the creation of study cohorts (patient groups of interest) that meet specified criteria across all prescribing settings and facilitates the retrieval of the prescribing history for individuals pre-identified from other datasets. Records contain a unique patient identifier (Community Health Index number) which is used to identify patients for inclusion in the dataset and also enables linkage to other routinely collected data, including hospital admission episodes and death records.
Results: SCoMeD contains details on the patient (age, sex, geographical information) and on the medication prescribed. Medication-related information includes what was received and when; strength and dose information are also available. The earliest date of data availability depends on the source (PIS, 01/2010; HEPMA, 07/2022; HCM, 01/2019). Data are held by Public Health Scotland.
Conclusion: SCoMeD facilitates a range of different studies, including cross-sectional/point-prevalence studies and drug utilisation studies as well as longitudinal studies, e.g., cohort and case-control studies. With the possibility to link to other relevant datasets, additional areas of interest may include health policy evaluations and health economics studies. Access to data is subject to approval; researchers need to contact the electronic Data Research and Innovation Service in the first instance.

