
International Journal of Population Data Science: Latest Articles

Challenges in access, representativeness, and bias in smart financial data relating to income volatility and economic insecurity.
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2436
Nathan Bourne, Michael Spencer, Oliver Berry
Introduction & Background: Financial transaction data are highly valuable sources of digital footprint data for behavioural and economic research, but to achieve real impact we must carefully consider their limitations. Financial institutions hold a wealth of consumer data with untapped potential for community intelligence. These datasets combine excellent coverage with extremely granular information on consumer finances, income and spending, yet the institutions that hold them face great challenges in using the data for social good. Smart Data Foundry is a university-owned, non-profit organisation that facilitates safe access to these datasets for researchers and provides insights that help government bodies tackle today's major challenges, including the cost-of-living crisis and climate change.
Objectives & Approach: We will explore the opportunities these datasets offer for social and economic research. For example, using pseudonymised individual consumer banking data from NatWest Group, we have developed metrics for understanding income volatility and economic insecurity in collaboration with the Joseph Rowntree Foundation. We can also use these data to study consumer spending patterns and responses to economic changes such as interest rate rises and the net zero transition. We will assess the limitations of the data, including representativeness, bias and missing data, and describe methods and mitigations to account for these challenges. We also discuss the barriers to accessing this type of data, both in developing relationships with data partners and in addressing privacy and governance concerns.
Relevance to Digital Footprints: Individual-level customer transaction data provide a rich and novel form of digital footprint for behavioural and economic analyses. Financial institutions record every income or expenditure event, creating a uniquely valuable digital footprint. These records can reveal, for example, how different demographic groups respond to macroeconomic shocks and where financial distress is emerging, and can help us better understand the drivers and risks of financial vulnerability. In both aggregated and individual form, the data can add a further layer of understanding to trends seen in other sources, such as health or administrative data.
Conclusions & Implications: Having addressed the challenges of data access and data quality, we demonstrate that consumer banking data is an exceptionally valuable form of digital footprint data, capturing key information on consumer behaviour. We conclude with a call for further research to develop use cases of these data for social good.
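The abstract does not say how the income volatility metrics developed with the Joseph Rowntree Foundation are defined. As a purely illustrative sketch, assuming monthly income totals have already been derived from pseudonymised transactions, one widely used generic measure is the coefficient of variation (standard deviation divided by mean) of monthly income. The function name `income_cv` and the sample figures are hypothetical, and this is not the authors' metric.

```python
from statistics import mean, pstdev


def income_cv(monthly_income: list[float]) -> float:
    """Coefficient of variation of monthly income (hypothetical illustration).

    Uses the population standard deviation, so a perfectly stable income
    returns 0.0. Raises ValueError for empty input or non-positive mean income.
    """
    if not monthly_income:
        raise ValueError("need at least one month of income")
    avg = mean(monthly_income)
    if avg <= 0:
        raise ValueError("mean income must be positive")
    return pstdev(monthly_income) / avg


# Hypothetical monthly income totals (GBP) for two customers.
stable = [2000.0, 2000.0, 2000.0, 2000.0]
volatile = [1000.0, 3000.0, 1000.0, 3000.0]
print(round(income_cv(stable), 2))    # 0.0
print(round(income_cv(volatile), 2))  # 0.5
```

Higher values indicate more month-to-month variation. A production metric would likely also need to handle irregular pay cycles and missing months, which this sketch ignores.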
Citations: 0
What do secondary school pupils eat at school? The barriers experienced in collecting transactional data from canteen purchases.
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2415
A.S. Gilmour, Ruth Fairchild
Introduction & Background: Welsh secondary schools generally use cashless catering systems, and pupils pay for school food and drink using contactless cards or thumb- or fingerprint-based biometric scanning. Usefully, a digital footprint of school canteen purchases therefore already exists and is compiled continuously, with no fixed end date. Compared with other methods of recording dietary intake (e.g., questionnaires and food diaries), using these transactional data is an unobtrusive method of data collection with great potential. Although individual-level transactional data are being amassed, they remain unexploited by schools, local authorities and the Welsh Government alike. Obtaining these anonymised individual-level transactional data would provide immense insight into what pupils purchase throughout the school day.
Objectives & Approach: The Welsh School Meals (WSM) project aimed to investigate the feasibility of using secondary school canteen transactional data to better understand what pupils purchase during the school day and the nutritional quality of those purchases. Semi-structured interviews were conducted with a representative from each of the cashless system providers used in Welsh secondary schools (n=7). The WSM project then aimed to recruit nine secondary schools, with a methodological plan to: (i) liaise with head teachers; (ii) mine the data; (iii) interview catering managers and head teachers; (iv) facilitate nutritional analysis; (v) conduct focus groups with pupils; and (vi) co-produce case studies.
Relevance to Digital Footprints: School canteen transaction data are a form of digital footprint, and using them to understand the current landscape of pupils' food and drink choices during the school day will inform public health policymakers and practice.
Results: Despite trying numerous strategies, the WSM project encountered blockages that prevented data acquisition. The four key stumbling blocks were: (i) identifying data providers; (ii) identifying data owners; (iii) data sharing; and (iv) engaging stakeholders. Only the first barrier has been overcome; although school recruitment started in May 2023, the remaining three barriers have stalled all progress.
Conclusions & Implications: Exploiting existing cashless catering system technology to collect individual-level big data from school canteen transactions has enormous potential. However, the WSM project concluded that obtaining these data was not feasible.
Citations: 0
Development and use of a co-produced short mood survey to collect ground truth in digital footprints research
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2434
Nina H Di Cara, Oliver Davis, Claire Haworth
Introduction & Background: To use digital footprint data for mental health and well-being research, we often need to collect concurrent, high-quality ground-truth measures. Delivering frequent surveys to participants using ecological momentary assessment (EMA) is one way to collect such data. However, existing surveys tend to be long, are not focused on momentary states, or rely on rating images, which are not platform-agnostic. Here we present a five-item text-based survey, designed with participants and validated for use in EMA studies, that collects data on momentary changes in mood. We describe its methodological development and how it has been used to investigate music listening on Spotify as a digital footprint of mood.
Objectives & Approach: The survey is based on the circumplex model of affect. It was co-produced with a participant advisory group (N=5), who gave feedback on the length, content and delivery of the survey. It was then piloted with N=98 participants to assess its statistical validity and its congruence with the 20-item Positive and Negative Affect Schedule (PANAS). It was subsequently delivered to a wider sample (N=150) four times a day over a two-week period using an EMA app on participants' phones.
Relevance to Digital Footprints: EMA is an increasingly popular method for collecting ground truth to support the interpretation of digital footprint data. This newly developed and tested mood survey offers an opportunity to reduce participant burden when collecting mood data in EMA studies, supporting the collection of high-quality, high-time-resolution ground truth for digital footprints research.
Results: Together with participants, we selected four emotions spanning the arousal and valence axes, as well as rumination, which participants considered important to their music listening behaviour. Factor analysis of the pilot data showed that the questions represented two factors, positive and negative affect. Ratings of ‘cheerful’ and ‘relaxed’ on a 0-10 scale explained 44% of the variance in positive affect, and ratings of ‘worried’, ‘sad’ and ‘frustrated’ explained 40% of the variance in negative affect. Delivering the questionnaire to a wider student sample (N=150) four times per day for two weeks made it possible to assess typical response rates in a realistic EMA setting. On average, participants completed 3 of the 4 surveys each day.
Conclusions & Implications: The co-created short mood survey for collecting ground truth in digital footprint studies was validated across two independent samples and shown to achieve good response rates in a two-week study. Future testing on wider samples will provide opportunities to further validate the survey and assess its effectiveness across demographic groups and different sample types.
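The reported response rate can be made concrete with a little arithmetic: four prompts a day over two weeks is 4 × 14 = 56 scheduled prompts per participant, so completing 3 of 4 surveys a day on average corresponds to about 42 completed surveys, or 75% compliance. The sketch below computes compliance from per-participant completion counts; the function names and example counts are hypothetical, not taken from the study.

```python
PROMPTS_PER_DAY = 4  # from the study design: four surveys a day
STUDY_DAYS = 14      # two-week delivery period


def compliance(completed: int, prompts_per_day: int = PROMPTS_PER_DAY,
               days: int = STUDY_DAYS) -> float:
    """Fraction of scheduled EMA prompts one participant completed."""
    scheduled = prompts_per_day * days
    if not 0 <= completed <= scheduled:
        raise ValueError(f"completed must be between 0 and {scheduled}")
    return completed / scheduled


def mean_compliance(completed_counts: list[int]) -> float:
    """Average compliance across participants (hypothetical helper)."""
    return sum(compliance(c) for c in completed_counts) / len(completed_counts)


# Hypothetical completion counts for three participants.
print(compliance(42))                 # 0.75
print(mean_compliance([56, 42, 28]))  # 0.75
```

Averaging per-participant rates, as here, weights every participant equally; pooling all completed prompts would give the same answer only when everyone had the same number of scheduled prompts.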
Citations: 0
Foodinsecurity.london: Developing a food-insecurity prevalence map for London - a machine learning from food-sharing footprints
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2425
Gregor Milligan, Georgiana Nica-Avram, John Harvey, James Goulding
Introduction & Background: Policymakers' ability to positively transform food environments depends on robust empirical evidence to inform their decisions. At present, there are limited data on food insecurity in the UK that local authorities can use to inform interventions, owing to the prohibitive costs and logistical challenges of administering longitudinal surveys. This study builds on existing research and on a key pilot study developed in 2020 in partnership between Olio (a food-sharing app with 7 million registered users as of 2023), the University of Nottingham and Havering Council, which produced the world's first prototype map of food insecurity.
Objectives & Approach: Our approach applies machine learning methods to unprecedented food-acquisition behavioural data and open area-level deprivation statistics to model and predict individuals' experience of food insecurity across London. We used Olio's extensive user network to distribute 2,849 surveys asking respondents across London about their experiences of food insecurity. The survey was distributed online and adapted from the US Department of Agriculture (USDA) food security module. Respondents were asked whether, because they could not afford food or could not get access to food, they had (1) eaten smaller meals or skipped meals, (2) been hungry but unable to eat, or (3) not eaten for a whole day. Using the household-level rather than the individual-level version of the food insecurity module helped shed light on the experiences of vulnerable groups such as children.
Relevance to Digital Footprints: The survey responses provided ground truth about users' experiences of destitution. Deprivation metrics and digital footprint data, in the form of food-acquisition behavioural data, were then used in a Random Forest machine learning model to predict whether households were experiencing food insecurity, achieving high accuracy. Food-sharing data from almost 50,000 London-based users active on Olio's platform were then used to identify relevant food-seeking behaviours and to aggregate recognised instances of food insecurity at neighbourhood (Middle Layer Super Output Area, MSOA) level.
Conclusions & Implications: An extensive variable selection analysis was performed to identify and rank the socio-demographic characteristics and food-seeking behaviours most informative for describing food insecurity. The resulting SHAP (SHapley Additive exPlanations) values showed that a combination of food solicitation and the general deprivation of an area were important predictors of food insecurity.
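The aggregation step described above, rolling model-recognised instances of food insecurity up to neighbourhood level, can be sketched in plain Python. This assumes each active user already has an MSOA code and a binary prediction from the classifier; the records, the placeholder area labels and the function name `msoa_prevalence` are hypothetical, and the sketch does not reproduce the Random Forest model or the SHAP analysis.

```python
from collections import defaultdict


def msoa_prevalence(records: list[tuple[str, bool]]) -> dict[str, tuple[int, int, float]]:
    """Aggregate per-user predictions to MSOA level (hypothetical helper).

    records: (msoa_code, predicted_food_insecure) pairs, one per user.
    Returns {msoa_code: (n_users, n_predicted_insecure, share_insecure)}.
    """
    totals: dict[str, int] = defaultdict(int)
    insecure: dict[str, int] = defaultdict(int)
    for msoa, is_insecure in records:
        totals[msoa] += 1
        if is_insecure:
            insecure[msoa] += 1
    # Sorted by area code so the output order is deterministic.
    return {
        msoa: (n, insecure[msoa], insecure[msoa] / n)
        for msoa, n in sorted(totals.items())
    }


# Hypothetical predictions for five users in two placeholder areas.
predictions = [
    ("MSOA_A", True),
    ("MSOA_A", False),
    ("MSOA_A", False),
    ("MSOA_A", True),
    ("MSOA_B", False),
]
for msoa, (n, k, share) in msoa_prevalence(predictions).items():
    print(msoa, n, k, f"{share:.2f}")
# MSOA_A 4 2 0.50
# MSOA_B 1 0 0.00
```

A share computed from very few users, as for `MSOA_B` here, is unstable, so small areas would likely need extra care (for example, minimum user counts) before being mapped.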
Citations: 0
Understanding moderators of consent regarding the sharing of supermarket shopping data in ALSPAC.
Pub Date: 2024-06-10 DOI: 10.23889/ijpds.v9i4.2419
Romana Burgess, A. Skatova, Poppy Taylor
Introduction & Background: Shopping data are a valuable resource, offering insights into consumer behaviour and health. In recent years, there has been growing interest in using shopping data for health research. However, little is known about the characteristics of people who are willing to share their shopping data with researchers.
Objectives & Approach: This study aims to investigate the factors that influence individuals' decisions to consent to sharing their shopping data for research purposes. We will use data from a cohort study, the Avon Longitudinal Study of Parents and Children (ALSPAC), to address this question. We will draw on responses to a 2018 survey that asked ALSPAC participants about their use of supermarket loyalty cards, how acceptable they would find sharing these data with ALSPAC, and their views on potential privacy concerns. Of the 4,462 respondents, 65.4% reported holding at least one major UK supermarket or store loyalty card. Among these, 88.4% expressed potential willingness to share these data with ALSPAC for research purposes. Since then (as of around 2023), participants have explicitly either granted or withheld consent for the sharing of these data. Our analysis will consider factors such as biological gender, ethnicity, education, employment, socioeconomic status and anxiety as potential moderators of the consent process. We plan to use a mix of standard statistical methods, including regression modelling and correlation tests, to analyse sampling biases in the dataset.
Relevance to Digital Footprints: Our study will contribute to the growing literature on data linkage between cohort studies and digital footprint datasets.
Results: The findings will offer valuable insights into the factors influencing participants' consent decisions regarding data sharing. We will contribute to ongoing discussions about privacy and the ethical use of digital footprint data, informing the development of more inclusive consent processes in this field.
Conclusions & Implications: This study will inform consent practices in the realm of digital footprints, identifying and addressing potential barriers to consent and contributing to the ongoing discourse on the responsible and ethical use of shopping data.
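The reported percentages imply approximate group sizes, which help when judging how much data the planned moderator analyses will have to work with. Because 65.4% and 88.4% are rounded, the counts below are approximations derived from the abstract, not figures reported by the authors; the variable names are illustrative.

```python
RESPONDENTS = 4462       # 2018 ALSPAC survey respondents
SHARE_WITH_CARD = 0.654  # held at least one major UK supermarket/store loyalty card
SHARE_WILLING = 0.884    # of card holders, potentially willing to share with ALSPAC

# Approximate counts implied by the rounded percentages.
card_holders = RESPONDENTS * SHARE_WITH_CARD
willing = card_holders * SHARE_WILLING

print(round(card_holders))  # 2918
print(round(willing))       # 2580
```

In other words, roughly 2,580 people, about 58% of all 4,462 respondents (0.654 × 0.884 ≈ 0.578), expressed potential willingness to share loyalty card data in 2018.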
{"title":"Understanding moderators of consent regarding the sharing of supermarket shopping data in ALSPAC.","authors":"Romana Burgess, A. Skatova, Poppy Taylor","doi":"10.23889/ijpds.v9i4.2419","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2419","url":null,"abstract":"Introduction & BackgroundShopping data is a valuable resource, offering insights into consumer behaviour and health. In recent years, there has been a growing interest in using shopping data for the purposes of health research. However, little is known about the characteristics of those who are willing to share their shopping data with researchers. \u0000Objectives & ApproachThis study aims to investigate the factors that influence individuals' decisions to consent to sharing their shopping data for research purposes. We will leverage data from a cohort study – the Avon Longitudinal Study of Parents and Children (ALSPAC) – to address this question. \u0000We will draw upon the responses of a 2018 survey, which asked ALSPAC participants about their use of supermarket loyalty cards, their perceived acceptability of sharing this data with ALSPAC, and their perspectives on potential privacy concerns. Of the 4,462 respondents, 65.4% indicated ownership of at least one major UK supermarket or store loyalty card. Among these, 88.4% expressed a potential willingness to share this data with ALSPAC for research purposes. In the present day – around 2023 – participants have explicitly either granted or withheld consent for the sharing of this data. \u0000Our analysis approach will consider factors such as biological gender, ethnicity, education, employment, socioeconomic status, and anxiety as potential moderators to the consent process. We plan to employ a mix of standard statistical methods to analyse sampling biases in the dataset, including regression modelling and correlation tests. 
\u0000Relevance to Digital FootprintsOur study will contribute to the growing body of literature on data linkage between cohort studies and digital footprints datasets. \u0000ResultsThe findings from this study will offer valuable insights into the factors influencing participants' consent decisions regarding data sharing. We will contribute to ongoing discussions about privacy and the ethical use of digital footprints data, informing the development of more inclusive consent processes in this field. \u0000Conclusions & ImplicationsThis study will inform consent practices in the realm of digital footprints, identifying and addressing potential barriers to consent, and contributing to the ongoing discourse on the responsible and ethical use of shopping data.","PeriodicalId":507952,"journal":{"name":"International Journal of Population Data Science","volume":"5 19","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141363389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Digital footprints as means of measuring loneliness experience and embeddedness in social networks for designing digital mental health interventions
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2427
Bogna Liziniewicz, John Harvey, James Goulding, Liz Dowthwaite
Introduction & Background: Despite evidence that loneliness harms wellbeing, most studies focus on the experiences of older adults and students. Moreover, research on loneliness and digital footprints relies on demographic proxies rather than behavioural measures, and so gives an incomplete picture of how loneliness affects the general population.
Objectives & Approach: This project aims to use people's social media data (shared by participants from their Facebook, Twitter, or Reddit accounts) to examine experiences of loneliness and to guide the design of interventions that improve individual wellbeing. Screening participants for loneliness with the UCLA Loneliness Scale, and examining how these experiences differ across groups (adults aged 25 to 65; minority groups), will help us understand three things: how loneliness shapes social network structures; the dynamics within a person's social network; and the linguistic content of relationships. Language modelling and thematic analysis of the digital language data shared by participants, together with social network analysis mapped from individuals' digital interactions, will provide insight into digital wellbeing.
Traditional methods will be used alongside digital data analysis to address the limited representativeness of social media data, so that relationships formed in non-digital settings, and the loneliness associated with them, are also captured. In addition to sharing their digital footprints, participants will be surveyed and interviewed about their everyday offline and digital experiences of loneliness, and about their social network structures and dynamics. The interview and survey data will be analysed using thematic analysis of text data, predictive models of quantitative survey responses, and social network analysis of the relationships listed during the interview. From these data, we will predict loneliness outcomes from people's digital and offline behaviour and estimate correlations between loneliness experiences and social network dynamics.
Relevance to Digital Footprints: The project examines loneliness in both digital and offline settings, combining analysis of social media digital footprints with traditional survey- and interview-based methods. This approach will shed light on the similarities and differences between digital and real-world relationships, both in their social network structures and dynamics and in the loneliness experienced within them, since the two inevitably interact in everyday life. Including digital footprints data will make it possible to measure and predict the impact of loneliness on mental health and digital social behaviour, and to design tailored digital wellbeing interventions in the future.
Conclusions & Implications: The findings will provide a basis for designing innovative loneliness interventions that offer tailored wellbeing support to the public and to counsellors. The project also aims to raise awareness of loneliness. Including a range of experiences from different social groups will enable an inclusive, user-centred approach.
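The project plans social network analysis mapped from participants' digital interactions. Assuming those interactions can be reduced to pairs of account identifiers (a hypothetical format), a minimal sketch of two common ego-network measures, degree and density, might look like this; the project's actual network measures are not specified in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(interactions):
    """Build an undirected graph (adjacency sets) from (user, user) interaction pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return graph


def ego_degree(graph, ego):
    """Number of distinct contacts ("alters") the ego interacts with."""
    return len(graph.get(ego, ()))


def ego_density(graph, ego):
    """Share of possible ties among the ego's contacts that actually exist (0 to 1)."""
    alters = graph.get(ego, set())
    k = len(alters)
    if k < 2:
        return 0.0
    observed = sum(1 for u, v in combinations(alters, 2) if v in graph[u])
    return observed / (k * (k - 1) / 2)


# Hypothetical interaction log: participant "p1" interacts with a, b and c;
# a and b also interact with each other.
g = build_graph([("p1", "a"), ("p1", "b"), ("p1", "c"), ("a", "b")])
print(ego_degree(g, "p1"), ego_density(g, "p1"))
```

In the example, p1 has three contacts, and only one pair of them (a and b) interact, so the ego-network density is 1/3.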
Leveraging multiple digital footprint datasets to predict racial, sex-based, and sexual-orientation bias across US states
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2429
Raphael Derecki, Brian O'Shea, James Goulding
Introduction & Background: Racial, gender, and sexual-orientation biases are pervasive throughout society. Modern digital datasets can shed light on important societal variables and potential solutions. One contemporary theory that attempts to explain these biases is parasite-stress theory, an evolutionary psychology hypothesis proposing that higher infectious disease rates increase out-group bias. We present preliminary findings suggesting that disease rates are a meaningful geospatial predictor of multiple biases.
Objectives & Approach: We explored biases using geospatial analyses of multiple datasets of US participants: Project Implicit, the American National Election Studies (ANES), Google Trends, and Twitter/X. We included state-level variables to compare states and to identify the most important environmental-level predictors of bias. We built generalised linear and linear mixed-effects models, as well as general linear models. In the Project Implicit (n > 3,000,000) and ANES (n > 30,000) datasets, we assessed racial and sexual-orientation biases using explicit and implicit measures. In the Google Trends and Twitter/X datasets, we assessed racial and sex-based biases using per-state search and tweet scores. To analyse the biases, we included environmental-level variables, such as infectious disease rates (as developed by Thornhill and Fincher in 2014), and individual-level variables, such as political orientation.
Relevance to Digital Footprints: These preliminary findings are based on everyday online behaviour, including volunteered survey responses, searches, and posts. We attempt to address the pressing societal issue of bias by leveraging modern datasets. Our primary goal is to help policymakers by recommending cost-effective solutions that can improve several aspects of the population's quality of life.
Results: Infectious disease rate is the most consistently significant predictor of racial bias. In Google Trends data on anti-women terminology, infectious disease rates and population density are consistent predictors of bias. Finally, preliminary results suggest that higher levels of infectious disease increase homophobic bias.
Conclusions & Implications: Overall, as infectious disease rates in a state increase, levels of racial and sexist bias increase significantly. Consistent with parasite-stress theory, we argue that reducing infectious disease rates in an area could bring many benefits, including better physical and mental health and less of the bias that damages society.
Using social media metrics and linked survey data to understand survey behaviors
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2438
Tarek Al Baghal, Paulo Serôdio, Shujun Liu, Luke Sloan, C. Jessop
Introduction & Background: Linking social media and survey data at the individual level could add evidence to a wide range of research questions. To make these data openly available to others, social media data need to be converted into useful metrics that minimise disclosure risk while maximising utility. This research explores the linkage of Twitter data and survey data in the Understanding Society Innovation Panel, focusing on the use of non-disclosive metrics created from Twitter data alongside similarly anonymised survey data.
Objectives & Approach: The Innovation Panel asked respondents for consent to link their Twitter data to their survey responses, and data have been collected from the Twitter API. However, because Twitter data are unstructured, measures must be created that can be used jointly with the linked survey data. We have developed a framework for creating social media metrics that can be combined with survey data while removing any disclosive information, so that the data can be shared widely for maximum utility. The current research uses descriptive statistics to show what these metrics look like. We also begin to show how the metrics may be combined with survey data by including a set of them in logistic regression models predicting attrition and mental health measures.
Relevance to Digital Footprints: Social media is a prevalent part of social life and leaves a substantial digital footprint. However, these data have several limitations, including a lack of knowledge about who is producing them and an inability to relate them to a range of specific (and possibly higher-quality) measures for a representative sample of the population. Linkage to surveys addresses these problems and can open new research opportunities using digital footprint data.
Results: Although small sample sizes limit the power of some analyses, the methods developed illustrate ways to use this novel data source. The created metrics vary widely, and initial analysis shows that a set of user-level Twitter metrics is not significantly related to attrition. However, the number of accounts followed on Twitter and the number of user retweets are significantly related to higher levels of mental distress on the GHQ scale.
Conclusions & Implications: Overall, there is some evidence that social media data help to explain survey outcomes, perhaps more so for measurement outcomes. This study is an initial step towards using these curated, linked social media and survey data, and the same strategy could be applied to other social media networks, for example LinkedIn, particularly given the changes made to Twitter (now X).
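The abstract describes converting unstructured Twitter data into non-disclosive, user-level metrics, and reports results for the number of accounts followed and the number of retweets. The authors' framework is not detailed here, so the sketch below assumes one common statistical disclosure control step, top-coding, applied to simple counts; the input format, field names and the cap of 500 are all hypothetical.

```python
def user_metrics(tweets, following_count, cap=500):
    """Reduce one user's raw tweets to non-disclosive, user-level counts.

    tweets: list of dicts with at least an "is_retweet" flag (hypothetical format).
    Text, tweet IDs and timestamps are deliberately not carried over, and each
    count is top-coded at `cap` so that unusually large values cannot single
    out an account.
    """
    n_tweets = len(tweets)
    n_retweets = sum(1 for t in tweets if t.get("is_retweet", False))
    return {
        "n_tweets": min(n_tweets, cap),
        "n_retweets": min(n_retweets, cap),
        "n_following": min(following_count, cap),
    }


# Hypothetical raw records for one consenting panel member.
raw = [
    {"id": "1", "text": "hello", "is_retweet": False},
    {"id": "2", "text": "retweeted text", "is_retweet": True},
]
print(user_metrics(raw, following_count=1200))
```

Only the three top-coded counts are returned; tweet text and identifiers stay behind, which is what allows the metrics to sit alongside anonymised survey data.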
Data Resource Profile: ECHILD only-children and siblings (ECHILD-oCSib): a national cohort of linked health, education and social care data on mothers and children in England
Pub Date : 2024-06-06 DOI: 10.23889/ijpds.v8i6.2392
Qi Feng, Georgina Ireland, Ruth Gilbert, Katie Harron
Introduction: Sibling dynamics play a crucial role in individual development, health and wellbeing. We established a national birth cohort using administrative health, education and social care data in England, featuring clusters of mothers and their children (mothers and only-children, MoC; mothers and siblings, MSib).
Methods: From 13.6 million mother-baby pairs for births between April 1997 and January 2022 captured in Hospital Episode Statistics in England, we identified MoC and MSib clusters by finding livebirths linked to the same mother. We compared only-children and children with siblings by ethnicity, sociodemographic variables, and birth characteristics, and calculated birth intervals for children with siblings.
Results: We identified 4,086,648 MoC and 3,957,856 MSib clusters. Compared with only-children, children with siblings were more likely to be Asian, to live in more deprived areas, and to have younger mothers, but were less likely to be born overdue (≥42 weeks) or to have very low birth weight (<1,500 g). Children with siblings were also less likely than only-children to have been admitted to special neonatal care after birth. Among the MSib clusters, sibship size ranged from 2 to 15, with a mean of 2.4 children per mother. The median birth interval was 3.0 years.
Conclusion: ECHILD-oCSib, a national cohort of 4.1 million MoC and 4.0 million MSib clusters in England, is an important resource for investigating how maternal exposures, sibling dynamics, and their interplay affect individual development, health and wellbeing. Potential sources of bias should be considered when analysing these data.
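The Methods step of identifying MoC and MSib clusters amounts to grouping livebirths by a linked mother identifier and then summarising each group. A minimal sketch, assuming each birth is a hypothetical (mother_id, child_id, birth_date) tuple rather than actual HES fields, and ignoring any data-cleaning rules the authors may apply:

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def sibling_clusters(births):
    """Group livebirths by mother and summarise only-child and sibling clusters.

    births: iterable of (mother_id, child_id, birth_date) tuples (hypothetical format).
    Returns (n_only_child_clusters, n_sibling_clusters,
             mean_sibship_size, median_birth_interval_years).
    """
    by_mother = defaultdict(list)
    for mother_id, _child_id, birth_date in births:
        by_mother[mother_id].append(birth_date)

    n_only = 0
    sibship_sizes = []
    intervals = []
    for dates in by_mother.values():
        if len(dates) == 1:
            n_only += 1  # mother with an only child (MoC cluster)
            continue
        dates.sort()
        sibship_sizes.append(len(dates))  # mother with siblings (MSib cluster)
        # Gaps between consecutive births, in years of 365.25 days;
        # twins or other multiple births would give a gap of 0.
        intervals.extend((b - a).days / 365.25 for a, b in zip(dates, dates[1:]))

    return (
        n_only,
        len(sibship_sizes),
        mean(sibship_sizes) if sibship_sizes else 0.0,
        median(intervals) if intervals else 0.0,
    )


births = [
    ("m1", "c1", date(2000, 1, 1)),
    ("m2", "c2", date(2001, 1, 1)),
    ("m2", "c3", date(2004, 1, 1)),
    ("m3", "c4", date(2002, 1, 1)),
    ("m3", "c5", date(2004, 1, 1)),
    ("m3", "c6", date(2008, 1, 1)),
]
print(sibling_clusters(births))
```

Here m1 forms an only-child cluster, while m2 and m3 form sibling clusters of sizes 2 and 3, so the mean sibship size is 2.5.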
Linking migration and hospital data in England: linkage process and evaluation of bias
Pub Date : 2024-02-12 DOI: 10.23889/ijpds.v9i1.2181
Rachel Burns, Sacha Wyke, Y. Boukari, Srinivasa Vittal Katikireddi, D. Zenner, I. Campos-Matos, Katie Harron, Robert Aldridge
Introduction: Difficulties ascertaining migrant status in national data sources such as hospital records have limited large-scale evaluation of migrant healthcare needs in many countries, including England. Linking immigration data on migrants and refugees with National Health Service (NHS) hospital care data enables research into the relationship between migration and health in a large cohort of international migrants.
Objectives: We aimed to describe the linkage process and to compare linkage rates between migrant sub-groups, in order to evaluate potential bias in data on non-EU migrants and resettled refugees linked to Hospital Episode Statistics (HES) in England.
Methods: We used stepwise deterministic linkage to match migrant and refugee records first to a unique healthcare identifier indicating interaction with the NHS (linkage stage 1, to the NHS Personal Demographics Service, PDS), and then to hospital records (linkage stage 2, to HES). For each linkage stage, we calculated linkage rates and compared the characteristics of linked and unlinked migrants.
Results: Of the 1,799,307 unique migrant records, 1,134,007 (63%) linked to PDS and 451,689 (25%) linked to at least one hospital record between 1 January 2005 and 23 March 2020. Individuals on work, student, or working holiday visas were less likely to link to a hospital record than those on settlement or dependent visas and refugees. Migrants from the Middle East and North Africa, and from South Asia, were four times more likely to link to at least one hospital record than those from East Asia and the Pacific. Differences in age, sex, visa type, and region of origin between the linked and unlinked samples were small to moderate.
Conclusion: This linked dataset represents a unique opportunity to explore healthcare use among migrants. However, lower linkage rates disproportionately affected individuals on shorter-term visas, so future studies of these groups may be more prone to bias. Increasing the quality and completeness of identifiers recorded in administrative data could improve linkage quality.
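The stepwise deterministic linkage described in the Methods can be illustrated with a minimal Python sketch. The abstract does not say which identifiers were compared, in what order, or how ties were resolved. The match keys below (`first_name`, `surname`, `dob`, `sex`, `postcode`), the strict-to-loose step order, and the rule that ambiguous matches are left unlinked are therefore hypothetical assumptions, not the authors' procedure. The sketch also shows how the reported stage-level linkage rates follow from the record counts.

```python
from collections import defaultdict

# HYPOTHETICAL match steps, strictest first. The abstract does not list the
# identifiers actually used in the study; these field names are illustrative.
STEPS = [
    ("first_name", "surname", "dob", "sex", "postcode"),
    ("first_name", "surname", "dob", "sex"),
    ("surname", "dob", "sex"),
]


def match_key(record, fields):
    """Return a tuple of identifier values, or None if any value is missing."""
    values = tuple(record.get(f) for f in fields)
    return None if any(v in (None, "") for v in values) else values


def stepwise_link(source, target, steps=STEPS):
    """Deterministically link source record IDs to target record IDs.

    Each step requires exact agreement on all of its fields. A source record
    links at the first step where its key matches exactly one target record
    (assumed rule: ambiguous keys stay unlinked). Records linked at a stricter
    step are not re-examined at later, looser steps. One-to-one matching on
    the target side is not enforced in this sketch.
    """
    links = {}
    for fields in steps:
        # Index target records by this step's key.
        index = defaultdict(list)
        for t in target:
            k = match_key(t, fields)
            if k is not None:
                index[k].append(t["id"])
        for s in source:
            if s["id"] in links:
                continue  # already linked at a stricter step
            k = match_key(s, fields)
            candidates = index.get(k, []) if k is not None else []
            if len(candidates) == 1:
                links[s["id"]] = candidates[0]
    return links


def linkage_rate(n_linked, n_total):
    """Proportion of unique source records that linked."""
    return n_linked / n_total


if __name__ == "__main__":
    # Record counts reported in the abstract; both stages use all
    # 1,799,307 unique migrant records as the denominator.
    print(f"Stage 1 (PDS): {linkage_rate(1_134_007, 1_799_307):.1%}")  # 63.0%
    print(f"Stage 2 (HES): {linkage_rate(451_689, 1_799_307):.1%}")    # 25.1%
```

In this sketch, leaving ambiguous keys unlinked favours avoiding false matches over finding every true match. Records with missing or common identifier values are therefore more likely to stay unlinked at the looser steps.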
Linking migration and hospital data in England: linkage process and evaluation of bias
Pub Date : 2024-02-12 DOI: 10.23889/ijpds.v9i1.2181
Journal: International Journal of Population Data Science
Citations: 0