Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2436
Nathan Bourne, Michael Spencer, Oliver Berry
Introduction & Background
Financial transaction data are highly valuable sources of digital footprints data for behavioural and economic research, but to create real impact we must closely consider their limitations. Financial institutions hold a wealth of consumer data with untapped potential for community intelligence. These datasets combine excellent coverage with extremely granular information on consumer finances, income and spending, yet these institutions face great challenges in leveraging this data for social good. Smart Data Foundry is a university-owned, non-profit organisation that facilitates safe access to these datasets for researchers and provides insights to enable government bodies to tackle today's major challenges, including the cost-of-living crisis and climate change.

Objectives & Approach
We will explore the opportunities afforded by these datasets for social and economic research. For example, using pseudonymised individual consumer banking data from NatWest Group, we have developed metrics for understanding income volatility and economic insecurity in collaboration with the Joseph Rowntree Foundation. We can also use these data to study consumer spending patterns and responses to economic changes such as interest rate rises and the net zero transition. We will assess the limitations of the data, including issues of representativeness, bias, and missing data, and describe methods and mitigations to account for these challenges. We also discuss the barriers to accessing this type of data, both in developing relationships with data partners and in addressing privacy and governance concerns.

Relevance to Digital Footprints
Individual-level customer transaction data provides a rich and novel form of digital footprint for behavioural and economic analyses. Every point of income or expenditure is recorded in a uniquely valuable digital footprint by financial institutions. These footprints can provide a variety of insights, such as responses to macroeconomic shocks across demographic groups and emerging areas of financial distress, and can help us better understand the drivers and risks of financial vulnerability. In both its aggregated and individual form, the data can provide an additional layer of understanding for trends we may see in other data, such as health or administrative data.

Conclusions & Implications
Having addressed the challenges of data access and data quality, we demonstrate that consumer banking data is an incredibly valuable form of digital footprints data, capturing key information on consumer behaviour. We conclude with a call for further research to develop use cases of this data for social good.
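The abstract mentions metrics for income volatility but does not define them. As a minimal sketch of what one such metric could look like, the Python below uses the coefficient of variation of monthly income (standard deviation divided by mean). This choice of metric, the function name `income_volatility`, and the sample figures are all assumptions made for illustration, not the measure developed by the authors.

```python
from statistics import mean, pstdev


def income_volatility(monthly_income):
    """Coefficient of variation of monthly income (illustrative metric only).

    Returns the population standard deviation divided by the mean,
    so 0.0 means perfectly steady income and larger values mean more volatile.
    """
    if len(monthly_income) < 2:
        raise ValueError("need at least two months of income")
    avg = mean(monthly_income)
    if avg <= 0:
        raise ValueError("mean income must be positive")
    return pstdev(monthly_income) / avg


# Hypothetical monthly incomes for two account holders with the same mean.
steady = [2000, 2000, 2000, 2000]
irregular = [500, 3500, 1000, 3000]

steady_cv = income_volatility(steady)
irregular_cv = income_volatility(irregular)
```

Both hypothetical series average 2,000 per month, so the metric separates them purely on variability rather than income level.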
Title: Challenges in access, representativeness, and bias in smart financial data relating to income volatility and economic insecurity.
Journal: International Journal of Population Data Science
DOI URL: https://doi.org/10.23889/ijpds.v9i4.2436
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2415
A.S. Gilmour, Ruth Fairchild
Introduction & Background
Welsh secondary schools generally use a cashless catering system, and pupils pay for school food and drink via contactless cards or thumb or fingerprint biometric scanning. Helpfully, the digital footprint of school canteen purchase data already exists and is continually compiled over a limitless period. Compared to other methods of recording dietary intake (e.g., questionnaires and food diaries), utilising this transactional data is an unobtrusive method of data collection and has great potential. Although individual-level transactional data is being amassed, it remains unexploited by schools, local authorities and the Welsh Government. Obtaining this anonymised individual-level transactional data would provide immense insight into what pupils purchase throughout the school day.

Objectives & Approach
The Welsh School Meals (WSM) project aimed to investigate the feasibility of using secondary school canteen transactional data to better understand what pupils purchase during the school day and its nutritional quality. Semi-structured interviews have been conducted with a representative from each of the cashless system providers (n=7) used in Welsh secondary schools. Next, the WSM project initially aimed to recruit nine secondary schools, with the methodological plan to: (i) liaise with head teachers; (ii) mine data; (iii) interview catering managers and head teachers; (iv) facilitate nutritional analysis; (v) conduct focus groups with pupils; and (vi) co-produce case studies.

Relevance to Digital Footprints
School canteen transaction data is a form of digital footprint, and utilising it to understand the current landscape of food and drink choices during the school day will inform Public Health policymakers and practice.

Results
Despite trying numerous strategies, the WSM project has encountered blockages which have prevented data acquisition. The four key stumbling blocks faced were: (i) identifying data providers; (ii) identifying data owners; (iii) data sharing; and (iv) engaging stakeholders. Only the first barrier has been overcome and, despite school recruitment starting in May 2023, the latter three barriers have stalled any progress.

Conclusions & Implications
Exploiting existing cashless catering system technology to collect individual-level big data from school canteen transactions has enormous potential. However, the WSM project has concluded that obtaining this data was not feasible.
Title: What do secondary school pupils eat at school? The barriers experienced in collecting transactional data from canteen purchases.
Journal: International Journal of Population Data Science
DOI URL: https://doi.org/10.23889/ijpds.v9i4.2415
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2434
Nina H Di Cara, Oliver Davis, Claire Haworth
Introduction & Background
To use digital footprint data for mental health and well-being research we often need to collect concurrent, high-quality measures of ground truth. Delivering frequent surveys to participants using an ecological momentary assessment (EMA) methodology is one way to collect such data. However, existing surveys tend to be long, are not focused on momentary states, or rely on rating images which are not platform agnostic. Here we present a five-item text-based survey designed with participants and validated for use in EMA studies to collect data about momentary changes in mood. We describe its methodological development and how it has been used to investigate music listening on Spotify as a digital footprint of mood.

Objectives & Approach
The survey is based on the circumplex model of affect. It was co-produced with a participant advisory group (N=5), who gave feedback on the length, content and delivery of the survey. It was then piloted in a group of N=98 participants to assess statistical validity and congruence with the 20-item Positive and Negative Affect Schedule (PANAS). Following this, it was delivered in a wider sample (N=150) four times a day over a two-week period using an EMA app on participants' phones.

Relevance to Digital Footprints
EMA is an increasingly popular method for collecting ground truth to support the interpretation of digital footprint data. This newly developed and tested mood survey offers an opportunity to reduce participant burden when collecting mood data in EMA studies, which will support the collection of high-quality, high time-resolution ground truth for digital footprints research.

Results
Together with participants we selected four emotions across the axes of arousal and valence, as well as rumination, which participants considered important in their music listening behaviors. Factor analysis of pilot data showed that the questions represented two factors of positive and negative affect. The ratings on a 0-10 scale of the emotions ‘cheerful’ and ‘relaxed’ explained 44% of the variance in positive affect, and ratings of ‘worried’, ‘sad’ and ‘frustrated’ explained 40% of the variance in negative affect. Delivery of the questionnaire in a wider student sample (N=150) four times per day for two weeks provided the opportunity to assess typical response rates in a realistic EMA setting. On average, participants completed 3 out of the 4 surveys a day.

Conclusions & Implications
The co-created, short mood survey for the collection of ground truth in digital footprint studies was validated across two independent samples, and shown to allow for good response rates in a two-week study. Future testing on wider samples will provide opportunities to validate the survey and assess its effectiveness across demographic groups and different sample types.
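The delivery schedule and response figures reported above imply a simple compliance calculation, worked through in the short Python sketch below. It assumes a two-week period means 14 days and that the reported average of 3 completed surveys a day holds on every day; the function names are hypothetical.

```python
def total_prompts(prompts_per_day, days):
    """Number of survey prompts sent to one participant over the study."""
    return prompts_per_day * days


def response_rate(completed, prompted):
    """Share of prompted surveys that were completed."""
    if prompted <= 0:
        raise ValueError("prompted must be positive")
    return completed / prompted


# Figures from the abstract: four prompts a day for two weeks (assumed 14 days),
# with an average of 3 of the 4 daily surveys completed.
prompts = total_prompts(prompts_per_day=4, days=14)
completed = 3 * 14
rate = response_rate(completed, prompts)
```

Under these assumptions each participant receives 56 prompts and completes about 42 of them, a response rate of 75%.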
Title: Development and use of a co-produced short mood survey to collect ground truth in digital footprints research
Journal: International Journal of Population Data Science
DOI URL: https://doi.org/10.23889/ijpds.v9i4.2434
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2425
Gregor Milligan, Georgiana Nica-Avram, John Harvey, James Goulding
Introduction & Background
The ability of policymakers to positively transform food environments requires robust empirical evidence that can inform decisions. At present, there is limited data on food-insecurity in the UK that can be used to inform interventions by local authorities, due to the prohibitive costs and logistical challenges of administering longitudinal surveys. This study builds on existing research and a key pilot study developed in 2020 in partnership between Olio (a food-sharing app with 7 million registered users as of 2023), the University of Nottingham and Havering Council, which resulted in the world’s first map prototype of food-insecurity.

Objectives & Approach
Our approach leverages Machine Learning methods applied to unprecedented food-acquisition behavioural data and open area-level deprivation statistics to model and predict individuals' experience of food-insecurity across London. We used Olio’s extensive network of users to distribute 2,849 surveys, asking respondents across London about their experiences of food-insecurity. The survey was distributed online, adapting the US Department of Agriculture Food Security module. Respondents were asked about their experiences, including (1) eating smaller meals or skipping meals, (2) being hungry but being unable to eat, and (3) not eating for a whole day, because they could not afford food or because they could not get access to food. Using the household-level, rather than the individual-level, version of the food insecurity module helped shed light on the experience of vulnerable groups, such as children.

Relevance to Digital Footprints
The survey responses provided a ground truth about users' experiences of destitution. Deprivation metrics and digital footprint data in the form of food-acquisition behavioural data were then used in a Random Forests Machine Learning model to predict whether households were experiencing food-insecurity, achieving high accuracy. Food-sharing data from almost 50,000 London-based users active on Olio’s platform were then used to identify relevant food-seeking behaviours and aggregate recognised instances of food-insecurity at neighbourhood (MSOA) level.

Conclusions & Implications
To identify and rank the socio-demographics and food-seeking behaviours most informative for describing food-insecurity, an extensive variable selection analysis was performed. The resulting SHAP (SHapley Additive exPlanations) values showed that a combination of food solicitation and the general deprivation of an area were important predictors of food-insecurity.
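The final step described above, aggregating recognised instances of food-insecurity to neighbourhood level, can be sketched in a few lines of Python. This is only an illustration of the aggregation idea, not the authors' pipeline: the input format (one area label and one yes/no prediction per household), the function name `prevalence_by_area`, and the placeholder area labels are all assumptions, and the Random Forests model that would produce the predictions is not reproduced here.

```python
from collections import defaultdict


def prevalence_by_area(predictions):
    """Share of households flagged as food-insecure in each area.

    `predictions` is an iterable of (area_label, flagged) pairs, where
    `flagged` is True when a model predicts food-insecurity for a household.
    """
    counts = defaultdict(lambda: [0, 0])  # area -> [flagged count, total count]
    for area, flagged in predictions:
        counts[area][0] += int(flagged)
        counts[area][1] += 1
    return {area: flagged / total for area, (flagged, total) in counts.items()}


# Made-up example data; "area_A" and "area_B" stand in for real MSOA codes.
example_predictions = [
    ("area_A", True),
    ("area_A", False),
    ("area_A", False),
    ("area_A", False),
    ("area_B", True),
    ("area_B", True),
]

prevalence = prevalence_by_area(example_predictions)
# Areas ordered from highest to lowest predicted prevalence.
ranked_areas = sorted(prevalence, key=prevalence.get, reverse=True)
```

Reporting a share per area rather than a raw count keeps areas with many app users from dominating the map simply because more households are observed there.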
Title: Foodinsecurity.london: Developing a food-insecurity prevalence map for London - a machine learning from food-sharing footprints
Journal: International Journal of Population Data Science
DOI URL: https://doi.org/10.23889/ijpds.v9i4.2425
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2419
Romana Burgess, A. Skatova, Poppy Taylor
Introduction & Background
Shopping data is a valuable resource, offering insights into consumer behaviour and health. In recent years, there has been growing interest in using shopping data for health research. However, little is known about the characteristics of those who are willing to share their shopping data with researchers.

Objectives & Approach
This study aims to investigate the factors that influence individuals' decisions to consent to sharing their shopping data for research purposes. We will leverage data from a cohort study, the Avon Longitudinal Study of Parents and Children (ALSPAC), to address this question. We will draw upon the responses to a 2018 survey, which asked ALSPAC participants about their use of supermarket loyalty cards, their perceived acceptability of sharing this data with ALSPAC, and their perspectives on potential privacy concerns. Of the 4,462 respondents, 65.4% indicated ownership of at least one major UK supermarket or store loyalty card. Among these, 88.4% expressed a potential willingness to share this data with ALSPAC for research purposes. More recently (around 2023), participants have explicitly either granted or withheld consent for the sharing of this data. Our analysis approach will consider factors such as biological gender, ethnicity, education, employment, socioeconomic status, and anxiety as potential moderators of the consent process. We plan to employ a mix of standard statistical methods to analyse sampling biases in the dataset, including regression modelling and correlation tests.

Relevance to Digital Footprints
Our study will contribute to the growing body of literature on data linkage between cohort studies and digital footprints datasets.

Results
The findings from this study will offer valuable insights into the factors influencing participants' consent decisions regarding data sharing. We will contribute to ongoing discussions about privacy and the ethical use of digital footprints data, informing the development of more inclusive consent processes in this field.

Conclusions & Implications
This study will inform consent practices in the realm of digital footprints, identifying and addressing potential barriers to consent, and contributing to the ongoing discourse on the responsible and ethical use of shopping data.
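The survey percentages quoted in this abstract can be turned into approximate head counts, as the short Python calculation below shows. The abstract reports only percentages, so the counts are rounded estimates, and the calculation assumes that "among these" refers to the loyalty card holders rather than to all respondents.

```python
# Figures reported in the abstract.
respondents = 4462
card_share = 0.654      # share owning at least one loyalty card
willing_share = 0.884   # share of card holders potentially willing to share data

# Approximate counts (rounded, since only percentages are reported).
card_holders = round(respondents * card_share)
willing = round(card_holders * willing_share)

# Potentially willing card holders as a share of all survey respondents.
willing_share_of_all = willing / respondents
```

On these assumptions roughly 2,918 respondents held a loyalty card and roughly 2,580 of them were potentially willing to share it, which is about 58% of everyone surveyed.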
Title: Understanding moderators of consent regarding the sharing of supermarket shopping data in ALSPAC.
Journal: International Journal of Population Data Science
DOI URL: https://doi.org/10.23889/ijpds.v9i4.2419
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2427
Bogna Liziniewicz, John Harvey, James Goulding, Liz Dowthwaite
Introduction & Background
Despite research evidence for the negative influence of loneliness on people’s wellbeing, most studies focus on the experiences of older adults and the student population. Moreover, research concerning loneliness and digital footprints uses demographic proxies rather than a behavioural focus, thus providing an incomplete representation of the phenomenon’s influence on the general population.
Objectives & Approach
This project aims to use people’s social media data (shared by the participants from their Facebook, Twitter, or Reddit accounts) to examine their experiences of loneliness and to guide the design of interventions that improve individual wellbeing. Screening the participants for loneliness levels using the UCLA Loneliness Scale and examining how these experiences differ cross-sectionally (25-65-year-olds; minorities) will help us understand the following: social network structures, as shaped by loneliness experience; the dynamics within one’s social network; and the linguistic content of the relationships.
Using digital footprints for language modelling and thematic analysis of the digital language data shared by the participants, in addition to social network analysis (mapped from the individuals’ digital interactions), will allow insight into digital wellbeing.
A traditional approach will be used alongside digital data analysis to address the limited representativeness of social media data: relationships formed in non-digital settings, along with the associated loneliness experiences, will be included. In addition to sharing their digital footprints, the participants will be surveyed and interviewed about their everyday offline and digital experiences of loneliness, as well as their social network structures and dynamics. The interview and survey data will be analysed using thematic analysis of text data and predictive models of quantitative survey responses, in addition to social network analysis of the relationships listed during the interview.
Predictions of loneliness outcomes in relation to people’s digital and offline behaviour, and correlations between loneliness experiences and social network dynamics, will be made from the data.
Relevance to Digital Footprints
The project focuses on people’s experiences of loneliness in both digital and offline settings, combining the analysis of digital footprints from social media with traditional survey- and interview-based methodology. This approach will provide insight into the similarities and differences between social network structures and dynamics, as well as loneliness experiences, in digital and real-world relationships, as these inevitably interplay in everyday life. The inclusion of digital footprints data will make it possible to measure and predict the impact of loneliness on mental health and digital social behaviour, and to design tailored digital wellbeing interventions in the future.
Concl
{"title":"Digital footprints as means of measuring loneliness experience and embeddedness in social networks for designing digital mental health interventions","authors":"Bogna Liziniewicz, John Harvey, James Goulding, Liz Dowthwaite","doi":"10.23889/ijpds.v9i4.2427","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2427","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2024-06-10","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141365113"}
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2429
Raphael Derecki, Brian O'Shea, James Goulding
Introduction & Background
Racial, gender, and sexual-orientation biases are pervasive throughout society. Importantly, modern digitally oriented datasets can elucidate important societal variables and potential solutions. One contemporary theory that attempts to explain these biases is parasite-stress: an evolutionary psychology hypothesis suggesting that higher infectious disease rates increase out-group biases. We present preliminary findings that suggest that disease rates are a meaningful geospatial predictor of multiple biases.
Objectives & Approach
We explored biases using geospatial analyses across multiple datasets based on US participants: Project Implicit, American National Election Studies (ANES), Google Trends, and Twitter/X. We included state-based variables to compare between states and assess the most important environmental-level predictors of biases. We built generalised linear and linear mixed-effect models and general linear models. Within the Project Implicit (n > 3,000,000) and ANES (n > 30,000) datasets, we assessed racial and sexual-orientation biases via explicit and implicit measures. For the Google Trends and Twitter/X datasets, we assessed racial and sex-based biases via search and tweet-per-state scores. To analyse the biases, we included environmental-level variables, e.g., infectious disease rates (developed by Thornhill and Fincher in 2014), and individual-level variables, e.g., political orientation.
Relevance to Digital Footprints
These preliminary findings analyse everyday people’s online behaviour, including volunteered surveys, searches and posts. We attempt to address the pressing societal issue of bias by leveraging modern datasets. Our primary goal is to aid policy makers by recommending cost-effective solutions that can improve several factors of the population’s quality of life.
Results
We find that the most consistently significant predictor of racial bias is infectious disease rates. When leveraging Google Trends data including anti-women terminology, infectious disease rates and population density are consistent predictors of bias. Finally, we find preliminary results suggesting that increased levels of infectious diseases increase homophobic bias.
Conclusions & Implications
Overall, we find that as infectious disease rates increase in a state, the level of racial and sexist bias significantly increases. Consistent with parasite-stress theory, we argue that focusing on reducing infectious disease rates in an area can have a plethora of benefits, including improving physical and mental health and reducing biases that damage society.
{"title":"Leveraging multiple digital footprint datasets to predict racial, sex-based, and sexual-orientation bias across US states","authors":"Raphael Derecki, Brian O'Shea, James Goulding","doi":"10.23889/ijpds.v9i4.2429","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2429","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2024-06-10","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141364599"}
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2438
Tarek Al Baghal, Paulo Serôdio, Shujun Liu, Luke Sloan, C. Jessop
Introduction & Background
Linking social media and survey data at the individual level has the potential to add evidence to a variety of research questions. To make these data openly available to others, social media data need to be converted into useful metrics that minimise issues of disclosure while maximising utility. This research explores linkages of Twitter data and survey data in the Understanding Society Innovation Panel, focusing on the usage of non-disclosive metrics created from Twitter data alongside the similarly anonymised survey data.
Objectives & Approach
The Innovation Panel asked for consent to link Twitter data to survey responses, and data have been collected from the Twitter API. However, Twitter’s unstructured nature necessitates creating measures that can be used jointly with linked survey data. We have developed a framework to create social media metrics that can be combined with survey data and that remove any disclosive data, so these data can be widely shared for maximum utility. The current research analyses these data to understand what the metrics look like through presentation of descriptive statistics. We also begin to show how these data may be used in combination with survey data through inclusion of a set of metrics in logistic regression models predicting attrition and measurement of mental health.
Relevance to Digital Footprints
Social media is a prevalent aspect of social life and leaves a substantial digital footprint. However, there are a number of limitations to these data, including a lack of understanding of who is producing the data and a lack of ability to relate them to a variety of specific (and possibly higher quality) measures for a representative sample of the population. Linkage to surveys addresses these problems and can lead to new research opportunities using digital footprint data.
Results
While small sample sizes limit the power of some analyses, the methods developed are illustrative of ways to use this novel data source. Results show that there is high variation in the created metrics, and initial analysis shows that the inclusion of a set of user-level Twitter data is not significantly related to attrition. However, more accounts followed on Twitter and the number of user retweets are significantly related to higher levels of mental distress on the GHQ scale.
Conclusions & Implications
Overall, there is some evidence that social media helps to understand survey outcomes, perhaps more so on measurement outcomes. This study provides a starting point for how to use these curated linked social media and survey data, and we note there are other social media networks to which we can apply this strategy, for example LinkedIn, particularly given the changes made to Twitter (X).
{"title":"Using social media metrics and linked survey data to understand survey behaviors","authors":"Tarek Al Baghal, Paulo Serôdio, Shujun Liu, Luke Sloan, C. Jessop","doi":"10.23889/ijpds.v9i4.2438","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2438","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2024-06-10","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141364067"}
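The idea of converting raw Twitter data into non-disclosive, user-level metrics that can sit alongside survey responses might be sketched as follows. This is not the authors' framework: the field names (`survey_id`, `text`, `is_retweet`, `following_count`) and the toy records are hypothetical, and real disclosure control would likely need further steps such as banding rare values.

```python
from collections import Counter


def user_metrics(tweets, profiles):
    """Aggregate raw tweets into per-respondent counts, dropping text and handles."""
    n_tweets = Counter()
    n_retweets = Counter()
    for tweet in tweets:
        n_tweets[tweet["survey_id"]] += 1
        if tweet["is_retweet"]:
            n_retweets[tweet["survey_id"]] += 1

    metrics = {}
    for survey_id, profile in profiles.items():
        # Only counts are kept; no tweet text or account name is carried over.
        metrics[survey_id] = {
            "n_tweets": n_tweets[survey_id],
            "n_retweets": n_retweets[survey_id],
            "n_following": profile["following_count"],
        }
    return metrics


# Toy records, invented purely for illustration.
tweets = [
    {"survey_id": "s1", "text": "example post", "is_retweet": False},
    {"survey_id": "s1", "text": "example repost", "is_retweet": True},
    {"survey_id": "s2", "text": "another post", "is_retweet": False},
]
profiles = {
    "s1": {"handle": "@example_one", "following_count": 150},
    "s2": {"handle": "@example_two", "following_count": 40},
    "s3": {"handle": "@example_three", "following_count": 10},
}

metrics = user_metrics(tweets, profiles)
for survey_id, row in metrics.items():
    print(survey_id, row)
```

Keying the output by a survey identifier rather than a Twitter handle is what would let such metrics be merged with survey variables, for example as predictors in the attrition and mental health models the abstract describes.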
Introduction
Sibling dynamics play a crucial role in individual development, health and wellbeing. We established a national birth cohort using administrative health, education and social care data in England featuring clusters of mothers and their children (mothers and only-children, MoC; and mothers and siblings, MSib).
Methods
From 13.6 million mother-baby pairs from births between April 1997 and January 2022 captured in Hospital Episode Statistics in England, we identified MoC and MSib clusters by identifying livebirths linked to the same mother. We compared only-children and children with siblings, by ethnicity, sociodemographic variables, and birth characteristics. We calculated birth intervals for children with siblings.
Results
We identified 4,086,648 MoC and 3,957,856 MSib clusters. Compared with only-children, children with siblings were more likely to be Asian, live in more deprived areas, and have younger mothers, but were less likely to be overdue births (>=42 weeks), or to have very low birth weight (<1500 g). Children with siblings were also less likely to have been admitted to special neonatal care after birth compared to only-children. Among the MSib clusters, sibship sizes varied between 2 and 15, with a mean of 2.4 children per mother. The median birth interval was 3.0 years.
Conclusion
This national cohort ECHILD-oCSib of 4.1 million MoC and 4.0 million MSib clusters in England is an important resource for investigating the effects of maternal exposures, sibling dynamics and their interplay on individual development, health and wellbeing. Potential sources of bias should be considered in analyses of these data.
{"title":"Data Resource Profile: ECHILD only-children and siblings (ECHILD-oCSib): a national cohort of linked health, education and social care data on mothers and children in England","authors":"Qi Feng, Georgina Ireland, Ruth Gilbert, Katie Harron","doi":"10.23889/ijpds.v8i6.2392","DOIUrl":"https://doi.org/10.23889/ijpds.v8i6.2392","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2024-06-06","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141376179"}
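The cluster construction described in the ECHILD-oCSib methods (grouping livebirths by mother, separating only-children from sibling groups, and computing birth intervals) could look roughly like the sketch below. The tuple layout, identifiers and dates are hypothetical toy values, and the handling of twins or of births outside the study window is not addressed here.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def build_clusters(pairs):
    """Split mothers into MoC (one livebirth) and MSib (two or more) clusters."""
    births = defaultdict(list)
    for mother_id, child_id, birth_date in pairs:
        births[mother_id].append((birth_date, child_id))

    moc, msib = {}, {}
    for mother_id, children in births.items():
        children.sort()  # order each mother's children by date of birth
        target = moc if len(children) == 1 else msib
        target[mother_id] = children
    return moc, msib


def birth_intervals_years(msib):
    """Intervals between consecutive births within each sibling cluster."""
    intervals = []
    for children in msib.values():
        for (earlier, _), (later, _) in zip(children, children[1:]):
            intervals.append((later - earlier).days / 365.25)
    return intervals


# Toy mother-baby pairs, invented purely for illustration.
pairs = [
    ("m1", "c1", date(2000, 1, 1)),
    ("m1", "c2", date(2003, 1, 1)),
    ("m2", "c3", date(2010, 6, 1)),
    ("m1", "c4", date(2005, 1, 1)),
]

moc, msib = build_clusters(pairs)
intervals = birth_intervals_years(msib)
print("MoC clusters:", len(moc), "MSib clusters:", len(msib))
print("Median birth interval (years):", round(median(intervals), 1))
```

Because a mother is classed as MoC only on the births observed in the data, a child whose sibling was born outside the capture window would be misclassified as an only-child, which is one example of the potential bias the abstract asks analysts to consider.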
Pub Date : 2024-02-12 DOI: 10.23889/ijpds.v9i1.2181
Rachel Burns, Sacha Wyke, Y. Boukari, Sirinivasa Vittal Katikireddi, D. Zenner, I. Campos-Matos, Katie Harron, Robert Aldridge
Introduction
Difficulties ascertaining migrant status in national data sources such as hospital records have limited large-scale evaluation of migrant healthcare needs in many countries, including England. Linkage of immigration data for migrants and refugees with National Health Service (NHS) hospital care data enables research into the relationship between migration and health for a large cohort of international migrants.
Objectives
We aimed to describe the linkage process and compare linkage rates between migrant sub-groups to evaluate potential bias in data on non-EU migrants and resettled refugees linked to Hospital Episode Statistics (HES) in England.
Methods
We used stepwise deterministic linkage to match records from migrants and refugees to a unique healthcare identifier indicating interaction with the NHS (linkage stage 1 to NHS Personal Demographic Services, PDS), and then to hospital records (linkage stage 2 to HES). We calculated linkage rates and compared linked and unlinked migrant characteristics for each linkage stage.
Results
Of the 1,799,307 unique migrant records, 1,134,007 (63%) linked to PDS and 451,689 (25%) linked to at least one hospital record between 01/01/2005 and 23/03/2020. Individuals on work, student, or working holiday visas were less likely to link to a hospital record than those on settlement and dependent visas and refugees. Migrants from the Middle East and North Africa and South Asia were four times more likely to link to at least one hospital record, compared to those from East Asia and the Pacific. Differences in age, sex, visa type, and region of origin between linked and unlinked samples were small to moderate.
Conclusion
This linked dataset represents a unique opportunity to explore healthcare use in migrants. However, lower linkage rates disproportionately affected individuals on shorter-term visas, so future studies of these groups may be more biased as a result. Increasing the quality and completeness of identifiers recorded in administrative data could improve data linkage quality.
{"title":"Linking migration and hospital data in England: linkage process and evaluation of bias","authors":"Rachel Burns, Sacha Wyke, Y. Boukari, Sirinivasa Vittal Katikireddi, D. Zenner, I. Campos-Matos, Katie Harron, Robert Aldridge","doi":"10.23889/ijpds.v9i1.2181","DOIUrl":"https://doi.org/10.23889/ijpds.v9i1.2181","journal":{"name":"International Journal of Population Data Science"},"publicationDate":"2024-02-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139842095"}
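Stepwise deterministic linkage, as named in the migration abstract's methods, generally means trying exact matches on successively less strict sets of identifiers and accepting the first unique match. The sketch below illustrates that general technique only: the abstract does not list the identifiers or steps actually used, so the match keys, field names and toy records are all assumptions. The final lines simply recompute the linkage rates from the counts the abstract reports.

```python
# Hypothetical match keys, ordered from strictest to most relaxed.
# The abstract does not list the identifiers used, so these are placeholders.
MATCH_STEPS = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
    ("surname", "dob", "postcode"),
]


def build_index(records, fields):
    """Map each complete key to a record id, or to None if the key is ambiguous."""
    index = {}
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if None in key:
            continue  # skip records with a missing identifier for this step
        index[key] = None if key in index else rec["id"]
    return index


def stepwise_link(source, target):
    """Return {source_id: target_id} using the first step with a unique exact match."""
    links = {}
    for fields in MATCH_STEPS:
        index = build_index(target, fields)
        for rec in source:
            if rec["id"] in links:
                continue  # already linked at a stricter step
            key = tuple(rec.get(f) for f in fields)
            if None in key:
                continue
            match = index.get(key)
            if match is not None:
                links[rec["id"]] = match
    return links


def linkage_rate(n_linked, n_total):
    """Proportion of records that were linked."""
    return n_linked / n_total


# Toy records, invented purely for illustration.
target = [
    {"id": "P1", "forename": "A", "surname": "X", "dob": "1990-01-01", "postcode": "AB1"},
    {"id": "P2", "forename": "B", "surname": "Y", "dob": "1985-05-05", "postcode": "CD2"},
]
source = [
    {"id": "M1", "forename": "A", "surname": "X", "dob": "1990-01-01", "postcode": "AB1"},
    {"id": "M2", "forename": "B", "surname": "Y", "dob": "1985-05-05", "postcode": None},
    {"id": "M3", "forename": "C", "surname": "Z", "dob": "2000-02-02", "postcode": "EF3"},
]
links = stepwise_link(source, target)
print(links)

# Rates recomputed from the counts reported in the abstract.
print(f"Linked to PDS: {linkage_rate(1_134_007, 1_799_307):.0%}")
print(f"Linked to HES: {linkage_rate(451_689, 1_799_307):.0%}")
```

In the toy data the second record has no postcode, so it can only link at a relaxed step. This mirrors the abstract's closing point that incomplete identifiers reduce linkage quality. The sketch does not prevent several source records from linking to the same target record, which a production linkage would need to resolve.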