Introduction: Up-to-date, high-quality estimates of population and households are essential for planning the provision of local and central infrastructure.
Objectives: We aimed to derive estimates of population size, household numbers, and household size on Census date (21/03/2021) using north-east London primary care Electronic Health Records (EHR), and to calculate their level of agreement with the publicly available official Census 2021 estimates, in order to assess whether health data can be used to create reliable statistics.
Methods: We compared EHR and Census population estimates by sex, age, local authority, and IMD quintile, and EHR and Census household estimates by number, size, and local authority. We estimated 95% Limits of Agreement between EHR and Census household and population estimates using the Bland and Altman method. In sensitivity analyses, we excluded people with no General Practice encounter within 12 months and compared the size of the adjusted population to the Census estimate. We also compared EHR and the administrative Statistical Population Dataset (SPD) to Census population estimates by sex and age, and EHR and the Admin-based Occupied Address Dataset (ABOAD) to Census household estimates by local authority and household size.
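The Bland and Altman limits of agreement mentioned above can be illustrated with a minimal sketch. It assumes paired estimates for the same areas or strata and uses absolute differences; the abstract does not say whether absolute or percentage differences were analysed. The function name and the paired counts are hypothetical and are not values from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # average difference between the two sources
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired counts for five strata (illustrative only)
ehr = [105.0, 98.0, 110.0, 120.0, 95.0]
census = [100.0, 100.0, 104.0, 112.0, 97.0]

bias, lower, upper = limits_of_agreement(ehr, census)
print(f"bias={bias:.1f}, 95% LoA=({lower:.1f}, {upper:.1f})")
```

Narrow limits centred near zero would indicate close agreement between the two sources; a non-zero bias indicates systematic over- or undercounting.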
Results: The EHR population estimate was 2,130,965, i.e. 7.1% higher than the Census estimate of 1,990,087. The EHR household estimate was 658,264, i.e. 9.1% lower than the Census estimate of 724,045. The estimate of the population with a recent GP encounter was 11.6% lower than the Census estimate. Compared to Census, both SPD and EHR overcounted the population of males (10.7% and 7.9% respectively) and females (3.6% and 2.7% respectively). Both ABOAD and EHR undercounted households compared to Census (-7.3% and -9.1% respectively).
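The headline percentages can be reproduced from the reported totals, assuming they are relative differences with the Census figure as the denominator. The helper name below is hypothetical; the four totals are the ones reported in the abstract.

```python
def percent_difference(estimate, reference):
    """Relative difference of `estimate` from `reference`, in percent."""
    return 100.0 * (estimate - reference) / reference


# Totals reported in the abstract (EHR vs Census 2021)
population = percent_difference(2_130_965, 1_990_087)
households = percent_difference(658_264, 724_045)

print(f"population: {population:+.1f}%")   # positive means EHR is higher than Census
print(f"households: {households:+.1f}%")   # negative means EHR is lower than Census
```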
Conclusions: Reliable, up-to-date population and household estimates can be derived from health records. High residential mobility increases the complexity of deriving these estimates. Excluding people without GP encounters does not improve agreement with Census. Future work will focus on comparing Census and EHR estimates using individual-level data.
Introduction: Monitoring and addressing health inequalities is important. However, socioeconomic variables are usually unavailable within health datasets. Area deprivation measures provide access to open-source reliable socioeconomic data within low/middle-income countries and can contribute to the monitoring of the Sustainable Development Goals and assessing the growing burden of health inequalities.
Objective: To create a small-area deprivation measure for the whole of Brazil - the Brazilian Deprivation Index (Índice Brasileiro de Privação - IBP).
Methods: Using Census Sector data (mean population size=615) from the most recently available Brazilian Demographic Census (2010), variables measuring literacy, household income and housing conditions were standardised using z-scores and summed into a single measure. The IBP was validated using regional small-area measures of vulnerability: Belo Horizonte's Health Vulnerability Index (IVS) and São Paulo's Social Vulnerability Index (IPVS). Mortality data from Minas Gerais were used to estimate age-standardised mortality rates (ASMR) by ill-defined causes across IBP deprivation quintiles.
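The construction described above (standardise each variable with z-scores, then sum) can be sketched as follows. This is an illustration and not the authors' code: the indicator names and values are hypothetical, and the sketch assumes every indicator is oriented so that higher values mean greater deprivation and that the population standard deviation is used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption of this sketch
    return [(v - mu) / sigma for v in values]


def composite_index(indicators):
    """Sum the z-scores of each indicator, area by area."""
    standardised = [z_scores(column) for column in indicators.values()]
    return [sum(parts) for parts in zip(*standardised)]


# Hypothetical percentages for four census sectors (illustrative only)
indicators = {
    "pct_illiterate": [5.0, 20.0, 10.0, 35.0],
    "pct_low_income": [10.0, 40.0, 25.0, 60.0],
    "pct_inadequate_housing": [2.0, 15.0, 8.0, 30.0],
}

index = composite_index(indicators)
most_deprived = index.index(max(index))
print(index, most_deprived)
```

Because each standardised column has mean zero, the summed index is also centred on zero; areas can then be ranked and cut into quintiles.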
Results: The IBP was created for 303,218 (97.8%) census sectors (99.7% of the population). Substantial regional variation in deprivation was found using the IBP measure, with higher deprivation in rural than urban areas. The IBP was correlated with the other indicators used for validation: the IVS (r = 0.96) and the IPVS (r = 0.68). We found gradients in ASMR from ill-defined causes: in Minas Gerais, mortality was 2.6 times higher in the most deprived IBP quintile than in the least deprived. The main challenges in creating a deprivation measure for LMICs, and possible solutions, are demonstrated.
Conclusion: A small-area deprivation index was created for Brazil, a large and highly diverse middle-income country. The IBP improves our understanding and monitoring of inequalities, serving as a valuable tool for informing targeted public policies. Although the index is based on Brazil's specific context, the challenges faced and the strategies implemented to tackle them are relevant for other low- and middle-income countries aiming to develop similar tools.
Introduction: Childhood exposure to and duration of poverty can affect several individual characteristics related to intellectual development.
Objectives: This paper examines the implications of movement in and out of childhood poverty using a unique linkable database from the Canadian province of Manitoba. Differences in measurement of poverty and intellectual development are explored.
Methods: Almost 90,000 children were followed using two definitions of poverty - neighborhood and household poverty. The large database permitted exploring the role of another variable - maternal mental health.
Results: The association of household poverty with poorer intellectual outcomes has been shown to be stronger than the association of neighborhood poverty with such outcomes. This was true using various outcome measures appropriate across childhood (from age 5 to age 17). Comparisons with the role of maternal mental health were made and further analyses suggested.
Conclusion: The richness of the data has facilitated the study of childhood intellectual development. Household poverty appears to play an important role; neighborhood poverty and maternal mental health also seem to influence such development, but less strongly.
Introduction: Psychotic disorders are associated with high levels of disability and poor clinical outcomes but little is known about the regional incidence of psychosis in Ontario.
Objective: This study aimed to understand regional incidence variation and demographic and regional characteristics of individuals who may be suitable for receiving early psychosis intervention (EPI) services, as well as evaluate post-diagnosis healthcare utilisation.
Methods: A population-based retrospective cohort study captured incident affective and non-affective psychosis cases among residents of Ontario, Canada, aged 12-50 from 2017 to 2021. The sociodemographic characteristics of the cohort were described, including Ontario Health region of residence. Incident cases were followed for 6 months post-diagnosis to capture health service utilisation. Logistic regression was used to model post-diagnosis hospitalisations and Poisson regression to model outpatient psychiatrist visits.
Results: The cohort contained 44,188 individuals (41,257 non-affective psychosis; 3,058 affective psychosis). We observed substantial regional variation in incidence rates, which were higher in the North Western region for non-affective psychosis (167.44/100,000) and North Eastern region for affective psychosis (14.23/100,000) compared to the provincial average (92.24; 6.84/100,000, respectively). Compared to the Toronto region, post-diagnosis hospitalisations were significantly higher in the North East (non-affective psychosis aOR 1.14, 95%CI 1.01-1.30; affective psychosis aOR 1.69, 95%CI 1.13-2.54). Among those with non-affective psychosis, outpatient psychiatrist visits were significantly lower in all regions compared to Toronto (e.g., East aRR 0.61, 95%CI 0.60-0.62; North West aRR 0.34, 95%CI 0.32-0.36).
Conclusions: There is considerable regional variation in incident psychosis and inverse relationships between hospitalisations and outpatient care. To successfully plan for future EPI programs in Ontario, it is essential to understand regional needs using a systematic, population-based approach.
Confidential administrative data is usually only available to researchers within a Trusted Research Environment (TRE). Recently, some UK groups have proposed that low-fidelity synthetic data (LFSD) be made available to researchers outside the TRE, to allow code-testing and data discovery. There is a need for transparency so that those who access LFSD know how it has been created and what to expect from it. Relationships between variables are not maintained in LFSD, but a real or apparent data breach can still occur from its release. To be useful to researchers for preliminary analyses, LFSD needs to meet some minimum quality standards. Researchers who will use the LFSD need a clear, documented explanation of how it compares with the data they will access in the TRE. We propose that the following checks be run by data controllers before releasing LFSD, to ensure it is well documented, useful and non-disclosive.
Labelling: To avoid an apparent data breach, steps must be taken to ensure that the synthetic data (SD) is clearly identified as not being real data.
Disclosure: The LFSD should undergo disclosure risk evaluation as described below, and any risks identified should be mitigated.
Structure: The structure of the SD should be as similar as possible to the TRE data.
Documentation: Differences in the structure of the SD compared to data in the TRE must be documented, and the way(s) in which analyses of the SD are expected to differ from those of data in the TRE must be clarified.
We propose details of each of these below, but a strict, rule-based approach should not be used. Instead, data holders should modify the rules to take account of the type of information that may be disclosed and the circumstances of the data release (to whom and under what conditions).
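As one illustration of the structure and documentation checks, a data controller might compare the column names and types of the synthetic data against those of the TRE data and record any differences. The sketch below is hypothetical: the function name, the schema representation (a mapping from column name to type label), and the example columns, including the `is_synthetic` labelling flag, are assumptions and are not taken from the paper.

```python
def structure_differences(tre_schema, sd_schema):
    """List the columns that differ between the TRE data and the synthetic data."""
    tre_cols, sd_cols = set(tre_schema), set(sd_schema)
    return {
        "missing_from_sd": sorted(tre_cols - sd_cols),
        "extra_in_sd": sorted(sd_cols - tre_cols),
        "type_mismatch": sorted(
            col for col in tre_cols & sd_cols if tre_schema[col] != sd_schema[col]
        ),
    }


# Hypothetical schemas: column name -> type label
tre_schema = {"person_id": "int", "birth_year": "int", "postcode_area": "str"}
sd_schema = {"person_id": "int", "birth_year": "str", "is_synthetic": "bool"}

report = structure_differences(tre_schema, sd_schema)
print(report)
```

A report of this kind could be published alongside the LFSD so that researchers know in advance where code written against the synthetic data may need adjusting inside the TRE.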
Introduction: Public service leaders face increasing challenges using data effectively due to program silos, limited resources, and the increasing complexity of data. To address these challenges, Iowa's Integrated Data System for Decision-Making (I2D2) partnered with state and local leaders in early childhood to curate key indicators and develop population-level data tools and training to promote policy and practice improvements.
Methods: We relied on a mixed-methods, participatory approach to understand early childhood data and reporting requirements and how state and local leaders leverage data to meet these requirements and inform decisions. We conducted a Data Landscape Overview consisting of interviews, surveys, document review, and meetings with state and local leaders. Public deliberation facilitated iterative feedback and collective decision-making through stakeholder discussions.
Results: Our participatory approach resulted in three actions to improve data collection and use within Iowa's early childhood system: curating a set of early childhood indicators; developing training and strategic planning tools for effective data use; and building the Iowa Data Drive (IDD), an interactive data portal for accessing key early childhood indicators and population-level insights.
Conclusions: A robust IDS can promote systems change when grounded in strong partnerships, phased implementation, and a commitment to clear communication. By centering local voices and fostering trust, we developed indicators and tools that support data-informed decisions and improved services for young children and their families.
Introduction: Mental health and substance use (MH/SU) problems are highly prevalent among the prison population. However, early and preventative post-imprisonment care appears to be insufficient to meet the MH/SU needs of people released. This is demonstrated by elevated rates of MH/SU-related emergency care and deaths attributable to alcohol, drugs and suicide. Studies examining post-imprisonment healthcare contacts across community, outpatient, inpatient and emergency services for MH/SU are required to address this issue. This protocol paper describes the outcome of data linkage and details our plans for data cleaning and analysis.
Methods: The RELEASE study will follow a retrospective observational cohort design. This is the first study using national individual-level linked administrative health and prison data from Scotland. We report the results of creating the cohort, and outline proposed methods for data preparation and analysis. Within the cohort, the exposed group comprises everyone released from prison in 2015, and the unexposed group consists of a random sample of the general population matched (1:5 ratio) on age, sex, postcode and postcode-derived index of multiple deprivation, and with no prison exposure in the preceding 5 years. Health data (community prescribing, outpatient visits, specialist substance use, psychiatric inpatient, general inpatient, out-of-hours general practice, 24-hour National Health Service [NHS] helpline, ambulance, and emergency services), deaths data, and prison data (admissions, releases, demographic data) were linked to the cohort using unique identifiers. Service contacts associated with MH/SU will be quantified and compared across the two groups using regression modelling, controlling for potential confounding variables, reimprisonment and deaths.
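The 1:5 matching of unexposed to exposed individuals can be sketched as exact matching within strata, sampling without replacement. This is an illustrative sketch and not the study's procedure: the function name, field names, and records are hypothetical, and only age and sex are used as matching keys here, whereas the protocol also matches on postcode and deprivation.

```python
import random
from collections import defaultdict


def match_controls(exposed, pool, keys, ratio=5, seed=0):
    """For each exposed person, draw up to `ratio` unexposed people with identical keys."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in pool:
        strata[tuple(person[k] for k in keys)].append(person)
    for candidates in strata.values():
        rng.shuffle(candidates)  # randomise the order within each stratum
    matches = {}
    for person in exposed:
        candidates = strata[tuple(person[k] for k in keys)]
        n = min(ratio, len(candidates))
        # pop() removes the chosen controls, so nobody is matched twice
        matches[person["id"]] = [candidates.pop() for _ in range(n)]
    return matches


# Hypothetical records (illustrative only)
exposed = [{"id": "e1", "age": 30, "sex": "M"}]
pool = [{"id": f"u{i}", "age": 30, "sex": "M"} for i in range(7)]
pool += [{"id": "u7", "age": 30, "sex": "F"}, {"id": "u8", "age": 45, "sex": "M"}]

matches = match_controls(exposed, pool, keys=("age", "sex"))
print([p["id"] for p in matches["e1"]])
```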
Conclusion: RELEASE is a comprehensive study with potential to inform post-imprisonment MH/SU service delivery, whilst the dataset holds significant potential for exploring other health conditions and outcomes. This research will allow for an unprecedented understanding of post-imprisonment service use patterns in Scotland, and RELEASE will make a significant public health contribution given the overrepresentation of people released from prison in costly emergency care contacts and in deaths.
Synthetic data is emerging as a key area of development for supporting research that involves secure forms of administrative and health data, both in the United Kingdom and globally. In practice, key challenges in the generation and adoption of synthetic data are closely tied to the need for agreed and consistent terminology for describing it. The absence of standardised language hinders the setting of quality standards, the establishment of governance and guidelines, and the effective sharing of knowledge and best practices. This has implications for research that uses synthetic healthcare and administrative data, particularly when such data are generated from protected personal data. This commentary paper reviews existing literature on synthetic data to explore how key terms are currently defined in practice, with a focus on privacy-preserving use cases. Our analysis reveals that definitions of terms describing properties of synthetic data are often lacking or inconsistent, largely due to the breadth of synthetic data types, contexts and use cases. Context-specific terminology with nuanced meanings complicates efforts to develop universally agreed definitions, particularly for privacy-preserving synthetic data that captures characteristics from protected data sources. To address this, we propose broad definitions for key terms including synthetic data, utility, utility measure and fidelity. We conclude by offering a set of recommendations emphasising the need for consensus on terminology and encouraging clearer descriptions in future literature that specify both the intended use of the data and the measures used to describe it.
Data transparency lays the groundwork for the ethical use of administrative data. This is particularly true for linked administrative data within integrated data systems (IDS). Data dictionaries, resources that maintain the metadata of the information housed in an IDS, offer a tool to ensure transparency throughout the data life cycle. The FAIR Principles, which assert that data be Findable, Accessible, Interoperable, and Reusable, provide a useful framework by which to measure the effectiveness of data dictionaries in the IDS context. This paper uses the FAIR Principles to discuss the ways in which data dictionaries serve as tools in the ethical and transparent use of integrated data, as well as the challenges that remain. Linked administrative data is a valuable source of information for programmatic and academic research. Data dictionaries facilitate the ethical handling of this sensitive information and maintain a commitment to transparency in data inquiry and research.

