Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2426
V. Jenneson, F.L. Pontin, Emily Ennis, Alison Fildes, Michelle A. Morris
Introduction & Background
On 1 October 2022, new legislation came into force in England restricting the placement of some food and drink products high in fat, sugar and salt (HFSS). Products such as confectionery can no longer be placed at store entrances, at the ends of aisles, or at the checkout in large retail stores and their online equivalents.

Objectives & Approach
Our protocol sets out how daily sales and product data from multiple retailers will be used to evaluate the legislation’s success in relation to HFSS sales, product portfolios and equitability. Food and drink sales data from 18 months before and 12 months after the introduction of the policy will be obtained from multiple large UK retailers. Online sales are excluded.

Eligible stores were defined as supermarkets from our partner retailer brands with store areas larger than 280 square metres. From the eligible store sample, we selected 160 intervention stores (England) and 50 control stores (Scotland and Wales) from each partner retailer.

The sample provides equal store numbers across each decile of the Priority Places for Food Index (PPFI) from each retailer (n = 16), capturing food insecurity risk, and maximum coverage of store characteristics (store size) and store-area characteristics (urban/rural status).

A controlled interrupted time-series design will be used to estimate the effects of the policy, with stores from Scotland and Wales (where the legislation has not been implemented) acting as controls.

Relevance to Digital Footprints
This protocol sets out the first independent multiple-retailer analysis of the HFSS legislation, demonstrating how business digital footprints data can contribute to policy evaluation.

Results
Outcomes will include sales of HFSS products and changes to available product portfolios. We will explore whether the impacts of the legislation were equitable across stores in areas with different demographic characteristics, according to the English Indices of Multiple Deprivation and the PPFI.

Findings at the retailer and cross-retailer levels will inform sector-level insights regarding impact and potential next steps for policy and business practice.

Conclusions & Implications
Our conclusions will contribute to policy-relevant discussions around the effectiveness of government HFSS policy, with the potential to influence future decision-making across the UK Devolved Nations.
Title: Has HFSS legislation led to healthier food and beverage sales? The DIO-Food protocol – using supermarket sales data for policy evaluation
Journal: International Journal of Population Data Science
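The controlled interrupted time-series design named in the protocol above can be sketched as a segmented regression on the gap between intervention and control sales. This is a minimal illustration, not the authors' analysis plan: the monthly aggregation, the function names, and every number in the demo series are invented assumptions, and a real analysis would also need to handle seasonality and autocorrelation.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def controlled_its(intervention, control, n_pre):
    """Segmented regression on the intervention-minus-control series.

    Columns: intercept, time, post-policy indicator, time since policy.
    The last two coefficients are the control-adjusted level and slope changes.
    """
    diff = [a - b for a, b in zip(intervention, control)]
    design = []
    for t in range(len(diff)):
        post = 1.0 if t >= n_pre else 0.0
        since = float(t - n_pre) if t >= n_pre else 0.0
        design.append([1.0, float(t), post, since])
    b0, b1, b2, b3 = ols(design, diff)
    return {"baseline_gap": b0, "pre_trend_gap": b1,
            "level_change": b2, "slope_change": b3}


# Synthetic monthly HFSS sales (invented numbers): 18 months pre, 12 months post.
N_PRE, N_POST = 18, 12
months = range(N_PRE + N_POST)
control = [100 + 0.5 * t for t in months]
intervention = [120 + 0.5 * t - ((8 + 0.3 * (t - N_PRE)) if t >= N_PRE else 0)
                for t in months]

effect = controlled_its(intervention, control, N_PRE)
print(effect)
```

Differencing against the control series removes shocks common to both groups, so the level and slope changes are read as policy effects only under the assumption that England and the control nations would otherwise have followed parallel trends.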
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2435
Kuzivakwashe Makokoro, Gavin Long, John Harvey, Andrew Smith, Simon Welham, Evgeniya Lukinova, James Goulding
Introduction & Background
Food insecurity in England is rising, and low-income families require more support to reduce income inequalities. The government has introduced policies to address these issues, including targeted subsidies on healthy food through programmes such as the Healthy Start Scheme. Despite this, national uptake of the Healthy Start Scheme remains lower than the government target.

Objectives & Approach
Our study aims to predict uptake and take-up discrepancies at a local authority level, and to understand which measures contribute to the prediction, using anonymised supermarket loyalty card records for over 4 million customers alongside deprivation and food insecurity measures. We used a machine-learning approach combining transactional data, ONS Index of Deprivation datasets, neighbourhood statistics, and NHS Healthy Start Scheme uptake data. Regression models were used to predict the outcomes, whilst feature importance tools were used to evaluate the weight of each variable within the model.

Relevance to Digital Footprints
This study leverages transaction data from a UK retailer to understand lifestyle factors at a local authority level and assesses their usefulness in predicting the scheme’s uptake. Loyalty card transactional data can provide valuable insight into purchase behaviour linked to health and nutrition.

Results
The Linear and Ridge Regression models performed better than the other prediction models. Analysis of the measures revealed that, whilst deprivation and population-related measures contributed strongly to the prediction model, transactional data measures provided valuable insight into the shopping behaviours that contribute to model performance. Results suggested that areas with higher spending on fruits and vegetables and on high-calorie food were associated with higher predicted uptake in test data, whereas the converse held for high spend on fish.

Conclusions & Implications
Our study indicates that shopping data measures such as spend on fruits and vegetables, high-calorie food and fish, and products bought, can be used in prediction models for uptake and take-up discrepancy of the Healthy Start Scheme. This study highlights the complexity of understanding factors influencing public policy effectiveness and the need for tailored approaches in diverse urban contexts.
Title: Predicting Healthy Start Scheme Uptake using Deprivation and Food Insecurity Measures.
Journal: International Journal of Population Data Science
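To make the ridge regression and feature-importance step in the abstract above concrete, here is a small sketch that fits ridge coefficients on z-scored features and ranks them by absolute size. The abstract does not say which feature importance tools were used, so ranking by standardised coefficient is a stand-in; the feature names, the penalty `lam`, and all data values are invented for illustration.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def zscore(values):
    """Standardise a column to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def ridge(features, target, lam):
    """Ridge coefficients on z-scored features: (Z'Z + lam * I) beta = Z'y_centred."""
    names = list(features)
    z = [zscore(features[name]) for name in names]  # one column per feature
    mean_y = sum(target) / len(target)
    y = [t - mean_y for t in target]
    k = len(names)
    ztz = [[sum(p * q for p, q in zip(z[i], z[j])) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    zty = [sum(p * q for p, q in zip(z[i], y)) for i in range(k)]
    return dict(zip(names, solve(ztz, zty)))


# Synthetic local-authority data, invented purely for illustration.
features = {
    "deprivation": [10, 20, 30, 40, 50, 60, 70, 80],
    "fruit_veg_spend": [5, 3, 6, 2, 7, 4, 8, 1],
    "fish_spend": [2, 4, 1, 3, 3, 1, 4, 2],
}
uptake = [40 + 0.4 * d + 1.0 * f - 2.0 * s for d, f, s in zip(*features.values())]

ols_coefs = ridge(features, uptake, lam=0.0)    # lam = 0 reduces to linear regression
ridge_coefs = ridge(features, uptake, lam=1.0)  # lam > 0 shrinks the coefficients
ranking = sorted(ridge_coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
print(ranking)
```

Because the features are standardised before fitting, the coefficient magnitudes are comparable across measures with different units, which is what makes the ranking meaningful as a rough importance score.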
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2423
Keneuoe Maliehe, James Goulding, Salim Alam, Stuart Marsh
Introduction & Background
Methane (CH4) is a powerful greenhouse gas, leaving both a physical and digital footprint from natural (40%) and human (60%) sources. Its atmospheric concentration has increased from 722 ppb before the industrial age to ~1,922 ppb in recent times. Because of its global warming potential, measuring and monitoring CH4 is crucial to mitigating the impacts of climate change. However, large uncertainties exist in the “bottom-up” inventories reported to the United Nations Framework Convention on Climate Change (each a product of activity data, based on counts of components, equipment or throughput, and estimates of gas-loss rates per unit of activity for different land uses), making it difficult for policymakers to set emission reduction targets.

To address this, we employ causality-constrained machine learning (ML) to combine observations of different gases from the satellite-borne TROPOspheric Monitoring Instrument (which measures a digital footprint of human methane-generating behaviour) with outputs from chemical modelling. These are linked with datasets from the national statistics office, the meteorology office and a comprehensive survey on quality of life in the emission field, to improve bottom-up estimates of CH4 emissions at the Earth’s surface.

Objectives & Approach
The research uses mixed methods to collect and analyse both qualitative and quantitative data, supporting multidisciplinary processing strategies for monitoring CH4 emissions locally and regionally. It also assesses whether additional “digital footprint” variables, besides the well-known chemical sources and sinks, can be studied to improve our understanding of the CH4 budget.

We have conducted an “analytical inversion” of satellite observations of CH4 to obtain emission fluxes. These fluxes are the dependent variable for our ML model. They are combined with 22 independent variables (co-occurring trace gases, meteorological fields, land use, land cover, population, livestock, and data from a survey of quality of life from the Gauteng City-Region Observatory, covering a broad range of socio-economic, personal and political issues) and with near-real-time Earth observation data to develop a causality-constrained ML model for predicting CH4 fluxes.

Relevance to Digital Footprints
We make use not only of satellite imagery but also of socio-economic, demographic, and environmental data, repurposing them for environmental sustainability in the context of mitigating climate change. We are creating unique resources for documenting rapid changes in emissions.

Conclusions & Implications
This research will make important contributions to developing countries with limited resources. By helping policymakers identify the geographic regions that are major emitters and put mitigation measures in place, it will enable these countries to contribute to the global stocktake towards net-zero.
Title: Earth Observations, Digital Footprints and Machine-Learning: Greenhouse Gas Stocktaking for Climate Change Mitigation
Journal: International Journal of Population Data Science
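The two quantities at the centre of the methane abstract above can be written out as follows. The first line restates the bottom-up inventory as described in the abstract (activity data times a loss rate per unit of activity). The remaining lines give the textbook closed-form Bayesian solution that "analytical inversion" usually denotes in the atmospheric-inversion literature; the abstract does not state its own formulation, so every symbol here is an assumption: $x$ is the vector of surface fluxes, $x_A$ a prior estimate (for example the bottom-up inventory), $y$ the satellite observations, $K$ the Jacobian of a linear forward model, and $S_A$, $S_O$ the prior and observation error covariances.

```latex
% Bottom-up inventory: activity data A_i times emission factor EF_i, summed over source categories i
E_{\text{bottom-up}} = \sum_{i} A_i \, EF_i

% Linear forward model linking surface fluxes x to observations y, with Gaussian error
y = K x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, S_O)

% Closed-form (analytical) posterior estimate and its error covariance
\hat{x} = x_A + S_A K^{\top} \left( K S_A K^{\top} + S_O \right)^{-1} \left( y - K x_A \right)

\hat{S} = \left( K^{\top} S_O^{-1} K + S_A^{-1} \right)^{-1}
```

Under this reading, the posterior fluxes $\hat{x}$ are what would serve as the dependent variable of the ML model, with the 22 independent variables as predictors.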
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2430
Eszter Vigh, Angela Attwood, Anne Roudaut
Introduction & Background
There is an opportunity to engage light to moderate drinkers in alcohol reduction interventions as a preventative measure. In online grocery shopping, intervention development faces an added challenge in the form of deceptive patterns, which influence consumer behaviour in unhealthy ways, including automating behaviour and encouraging overconsumption.

Objectives & Approach
The objectives of this study are to: 1) identify deceptive patterns in the online grocery shopping context, 2) develop interventions which support healthier decision making in this context, and 3) apply those interventions to appropriate product categories. The first objective was addressed through heuristic analysis conducted across eleven major online grocery shopping platforms. The interventions were then developed using the Rapid Iterative Testing and Evaluation (RITE) method, which involved interviewing participants and iterating upon the interventions after every interview. Each interview was analysed using content analysis. When incorporating the interventions into the online grocery shopping environment, interviews were conducted to gain insight into the drinking and purchasing habits of consumers. These final interviews were then analysed with inductive thematic analysis.

Relevance to Digital Footprints
Digital Footprints underpin the entire intervention development space. The background of the project is built upon human shopping and interaction behaviour online when encountering deceptive patterns. These deceptive patterns have been established using mobile gaming micro-transaction data and online grocery shopping log-in and rewards data, among other data sources. Digital Footprints data can further support the findings from the thematic analysis by showing cultural and social trends around drinking (e.g., increased purchasing of seasonal beers and ciders in the summer and during sporting tournaments). The purpose of the drinking identified through those social and cultural trends helps gauge the appropriateness of proposed alcohol interventions. Beyond this, digital footprints data on engagement with health and wellness promoting applications (e.g., active users and app downloads) provides greater insight into the types of health messaging that garner attention, and can be used to inform how to approach those currently outside the health-engaged group. Digital footprints serve to attach larger societal trends to the smaller-scale interviews and thematic analysis conducted as part of the study.

Results
Initial findings have shown opportunities for nudging light to moderate drinkers who primarily consume beer, wine, or cider. Spirits have been identified as difficult to substitute because few low-alcohol spirit options are widely available on the consumer market via online grocery retailers.

Conclusions & Implications
Without significant change, costs to the National Health Service
Title: Alcohol Interventions on Online Grocery Shopping Platforms
Journal: International Journal of Population Data Science
Pub Date : 2024-06-10 DOI: 10.23889/ijpds.v9i4.2418
Shujun Liu, Luke Sloan, C. Jessop, Tarek Al Baghal, Paulo Serôdio
Introduction & Background
Uses and gratifications (U&G) theory posits that individuals’ engagement with social media is a deliberate effort to fulfill various needs, such as information seeking, entertainment, and networking. However, prior studies predominantly addressed whether individuals use social media to satisfy their needs, leaving a gap in understanding how individuals behave online to satisfy those needs. This study fills this gap by merging survey responses with actual Twitter activity to investigate how individuals behave online to satisfy distinct motivations, including (a) self-expression, (b) seeking entertainment, (c) business and working, (d) staying informed with news, and (e) networking. We also investigated how these online behaviors vary among individuals with different demographic features, including socio-economic class, gender, and age.

Objectives & Approach
Our research addressed these questions by linking survey responses with actual Twitter activity within the U.K. Participants were asked to provide survey responses on age, gender, socio-economic class, and motivations for using social media. They were also asked whether they had a Twitter account, whether they were willing to disclose their Twitter username, and, if agreeable, for the username itself. The survey continued until a total of 2,195 individuals had shared Twitter handles. Following the removal of accounts that were either suspended or nonexistent, the study proceeded with a final count of 1,915.

We collected each user’s Twitter metadata with the Twitter API, including tweet count, follower count, following count, and bio information, and linked each user’s metadata with their survey responses. To ensure respondents’ anonymity, survey, Twitter and linked data are stored separately and can only be accessed by a designated researcher.

Relevance to Digital Footprints
The study's approach of linking survey responses with actual Twitter activity offers detailed insight into the digital footprints left by users as they engage with social media to satisfy their diverse needs. By analyzing the behaviors associated with each motivation, this research illuminates the specific ways individuals curate their digital presence.

Results
Regression analysis indicated that individuals motivated by self-expression tend to tweet more (b = .28, SE = .06, p < .001), follow more accounts (b = .38, SE = .06, p < .001), gain more followers (b = .13, SE = .06, p = .035), and post bio details (b = .89, SE = .13, p < .001). Work and business motivation leads to posting bio information (b = .38, SE = .15, p = .012), while networking leads to following more accounts (b = .28, SE = .06, p < .001).

Socio-economic class moderated the associations between networking motivation and tweet count (b = -.25, SE = .09, p = .004), and between self-expression and tweet count (b = .20, SE = .08, p = .009). For individuals of higher socio-economic class, self-expression has a stronger effect on tweet count, whereas networking motivation has a weaker effect on tweet count
Title: Understanding Twitter Usage through Linked Data: An Analysis of Motivations and Online Behavior
Journal: International Journal of Population Data Science
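The moderation result in the Twitter abstract above can be read through simple slopes: in a regression with an interaction term, the effect of a motivation at a given moderator value is the main effect plus the interaction coefficient times that value. The sketch below uses the coefficients reported in the abstract, but it rests on two loud assumptions that the abstract does not confirm: that socio-economic class is coded so that one unit separates a lower from a higher group, and that the main effect (b = .28) and the interaction (b = .20) come from the same model. The normal-approximation p-value is likewise only a rough consistency check on the rounded figures.

```python
import math


def two_sided_p(b, se):
    """Two-sided p-value for the ratio b / se under a normal approximation."""
    z = abs(b / se)
    return math.erfc(z / math.sqrt(2))


def simple_slope(main_effect, interaction, moderator_value):
    """Effect of the predictor at a given moderator value: b_main + b_int * m."""
    return main_effect + interaction * moderator_value


# Coefficients as reported in the abstract (outcome: tweet count).
B_SELF_EXPRESSION = 0.28          # main effect of self-expression, SE = .06
B_SELF_EXPRESSION_X_CLASS = 0.20  # self-expression x socio-economic class, SE = .08

# ASSUMPTION: class coded 0 = lower, 1 = higher; both terms from one model.
slope_lower = simple_slope(B_SELF_EXPRESSION, B_SELF_EXPRESSION_X_CLASS, 0)
slope_higher = simple_slope(B_SELF_EXPRESSION, B_SELF_EXPRESSION_X_CLASS, 1)

print(f"self-expression slope, lower class:  {slope_lower:.2f}")
print(f"self-expression slope, higher class: {slope_higher:.2f}")
print(f"approx. p for b = .38, SE = .15:     {two_sided_p(0.38, 0.15):.3f}")
```

A positive interaction raises the slope for the higher group, matching the abstract's statement that self-expression has a stronger effect on tweet count at higher socio-economic class; the negative networking interaction (b = -.25) works the same way in the opposite direction.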
Pub Date : 2024-06-10DOI: 10.23889/ijpds.v9i4.2433
Paulo Matos Serodio, Tarek Al Baghal, Luke Sloan, Shujun Liu, C. Jessop
Introduction & Background: LinkedIn, with its extensive global network of over 900 million members across more than 200 countries, presents a unique repository for examining labour market dynamics, professional development, and the impact of social networking on employment opportunities. Despite its potential, LinkedIn's wealth of data on professional trajectories, skills, and labour market outcomes remains largely untapped in survey research due to challenges in data collection.
Objectives & Approach: This paper introduces a novel methodology for integrating LinkedIn data with survey responses, using data from the fourteenth wave of the Innovation Panel (IP14) of Understanding Society: The UK Household Longitudinal Study (UKHLS), conducted in 2021. In IP14, we probed the extent of LinkedIn usage among the UK population and assessed users' willingness to link their LinkedIn profiles with their survey responses. Those consenting to link their accounts were asked for specific details (their first and last names, employer, and job title) to enable profile identification on LinkedIn. Given the unavailability of a unique platform identifier and the cessation of LinkedIn's API, this information was crucial for matching profiles accurately.
We crafted a framework using PhantomBuster for ethical data extraction and a probabilistic string-matching technique to ensure precise linkage between survey responses and LinkedIn profiles. PhantomBuster, a cloud-based tool, efficiently scrapes dynamic content using JavaScript in a headless browser environment, sidestepping IP-related restrictions while adhering to website terms of service, and so streamlines the data collection process. Identified profiles were subjected to iterative probabilistic string matching, using respondent-provided metadata alongside supplementary data, to maximize the accuracy of matching the profiles to our survey participants.
Relevance to Digital Footprints: The described method advances digital footprint research in data collection and linkage. It automates the retrieval of vast online data sets; compiles information efficiently in an organized format; saves time and labour by mechanizing monotonous tasks; circumvents platform-imposed IP restrictions; and imposes fewer barriers to entry, as it requires less technical skill than other scraping tools such as Selenium.
Conclusions & Implications: This approach not only facilitates the precise identification and collection of LinkedIn profile data but also sets a precedent for ethical considerations in web scraping practices. By documenting this methodology, we aim to equip researchers with a scalable and replicable tool for future studies, enriching the analysis of labour market outcomes and the interplay between formal education, informal training, and professional success through the integration of LinkedIn and survey data.
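The matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field weights, the 0.8 acceptance threshold, and the use of Python's standard-library `difflib.SequenceMatcher` as the string-similarity measure are all assumptions made for illustration, since the abstract does not report them.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; the abstract does not state how fields are combined.
WEIGHTS = {"name": 0.5, "employer": 0.3, "job_title": 0.2}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(respondent: dict, profile: dict) -> float:
    """Weighted average of the per-field similarities."""
    return sum(w * similarity(respondent[f], profile[f]) for f, w in WEIGHTS.items())


def best_match(respondent: dict, candidates: list, threshold: float = 0.8):
    """Return (profile, score) for the best candidate.

    The profile is None when there are no candidates or when the best
    score falls below the (assumed) acceptance threshold.
    """
    if not candidates:
        return None, 0.0
    scored = [(match_score(respondent, c), c) for c in candidates]
    score, profile = max(scored, key=lambda pair: pair[0])
    return (profile, score) if score >= threshold else (None, score)
```

An iterative version would rerun `best_match` on the unmatched respondents with additional fields or a relaxed threshold, sending borderline scores to manual review.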
{"title":"Augmenting Surveys with Social Media Data: A Probabilistic Framework for LinkedIn Data Linkage.","authors":"Paulo Matos Serodio, Tarek Al Baghal, Luke Sloan, Shujun Liu, C. Jessop","doi":"10.23889/ijpds.v9i4.2433","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2433","url":null,"abstract":"Introduction & BackgroundLinkedIn, with its extensive global network of over 900 million members across more than 200 countries, presents a unique repository for examining labour market dynamics, professional development, and the impact of social networking on employment opportunities. Despite its potential, LinkedIn's wealth of data on professional trajectories, skills, and labour market outcomes remains largely untapped in survey research due to challenges in data collection. \u0000Objectives & ApproachThis paper introduces a novel methodology for integrating LinkedIn data with survey responses using data from the fourteenth wave of the Innovation Panel (IP14) of Understanding Society: The UK Household Longitudinal Study (UKHLS), conducted in 2021. In IP14, we probed the extent of LinkedIn usage among the UK population and assessed users' willingness to link their LinkedIn profiles with their survey responses. Those consenting to link their accounts were asked for specific details — namely their first and last names, employer, and job title — to enable profile identification on LinkedIn. Faced with the unavailability of a unique platform identifier and the cessation of LinkedIn’s API, this information was crucial for matching profiles accurately. \u0000We crafted a framework using PhantomBuster for ethical data extraction and a probabilistic string-matching technique to ensure precise linkage between survey responses and LinkedIn profiles. PhantomBuster, a cloud-based tool, efficiently scrapes dynamic content using JavaScript in a headless browser environment, sidestepping IP-related restrictions while adhering to website terms of service. 
It streamlines the data collection process. Identified profiles were subjected to an iterative probabilistic string matching, using respondent-provided metadata alongside supplementary data, to maximize the accuracy of matching the profiles to our survey participants. \u0000Relevance to Digital FootprintsThe described method advances digital footprint research in data collection and linkage. It automates the retrieval of vast online data sets; compiles information efficiently in an organized format; saves time and labour by mechanizing monotonous tasks; circumvents platform-imposed IP restrictions; and imposes fewer barriers to entry as it requires less technical skill than other scraping tools like Selenium. \u0000Conclusions & ImplicationsThis approach not only facilitates the precise identification and collection of LinkedIn profile data but also sets a precedent for ethical considerations in web scraping practices. By documenting this methodology, we aim to equip researchers with a scalable and replicable tool for future studies, enriching the analysis of labour market outcomes and the interplay between formal education, informal training, and professional success through the integration of LinkedIn and survey data.","PeriodicalId":507952,"journal":{"name":"International Journal of Population Data Science","volume":"107 51","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141362074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-10DOI: 10.23889/ijpds.v9i4.2417
Sophie V. Eastwood, Michele Orini, Andrew Wong, Scott T Chiesa, Joshua King-Robson, Jonathan Scott, Nishi Chaturvedi
Introduction & Background: Epochs of hyperglycaemia and hypoglycaemia may each increase the risk of common chronic diseases and impair both cognitive and physical function, even in people without diabetes. Older people may have a greater frequency of adverse glycaemic excursions, partly due to disordered autonomic function and sleep quality. Data for older, non-diabetic people are, however, scant.
Objectives & Approach: (1) To describe blood glucose variability (completed) and (2) to describe its socio-demographic and lifestyle correlates (planned) in a predominantly non-diabetic cohort of older adults. Participants were recruited during 2021-2023 from an English birth cohort (the 1946 National Survey for Health and Development Study). They wore a continuous glucose monitor (FreeStyle Libre, Abbott), which measured circulating glucose four times per hour, for seven days. Summary statistics and time outside range (4.4-7.8 mmol/L) were calculated. Further information on glycaemic excursions and day-to-day variability will be obtained using the R "iglu" package. For all CGM summary and excursion measures, future analyses will investigate associations with HbA1c, socio-demographics, body composition, physical activity, diet and alcohol use. Results will be stratified by sleep/wake periods estimated from simultaneous actigraphy (Philips Actiwatch Spectrum Plus). Sensitivity analyses will exclude people taking hypo- or hyperglycaemic medications and those with diabetes.
Relevance to Digital Footprints: Derived summary measures can be used by future studies to give insights into glycaemic variability as a population-level risk factor. This work will bring together multiple data sources, i.e. CGM, actigraphy and baseline cohort data.
Results: Participants were aged 75-76 years, 45% were female and 10% had diagnosed diabetes; median (IQR) BMI was 26.8 (24.6-29.2) kg/m². CGM data were collected from 308 participants, for a median (IQR) of 6.9 (6.7-7.6) days. Average glucose over the recording period was 5.7 mmol/L (5.3-6.2 mmol/L), standard deviation was 1.0 mmol/L (0.8-1.3 mmol/L), time outside range was 12.8% (6.2-24.7%), and 16% of participants spent ≥1 hour/day above and ≥1 hour/day below range.
Conclusions & Implications: CGM was feasible for this cohort of older adults, and demonstrated high levels of time outside range for a predominantly non-diabetic group. Future analysis will determine whether enhanced characterisation of glycaemic variability is a potentially more accurate tool for predicting future disease risk than isolated glucose measurements.
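The summary measures reported above (average glucose, standard deviation, and time outside the 4.4-7.8 mmol/L range) can be sketched for a single participant as follows. This is an illustrative calculation, not the study's code: the function name `cgm_summary` is hypothetical, the 15-minute interval follows from "four times per hour", and treating readings exactly at 4.4 or 7.8 mmol/L as in range is an assumption.

```python
from statistics import mean, stdev

LOW, HIGH = 4.4, 7.8        # target range in mmol/L, as stated in the abstract
MINUTES_PER_READING = 15    # four readings per hour


def cgm_summary(readings):
    """Summarise one participant's glucose readings (mmol/L, at least two values)."""
    n = len(readings)
    below = sum(1 for g in readings if g < LOW)   # boundary values count as in range (assumed)
    above = sum(1 for g in readings if g > HIGH)
    days = n * MINUTES_PER_READING / (60 * 24)    # length of the recording in days
    return {
        "mean": mean(readings),
        "sd": stdev(readings),
        "pct_outside_range": 100 * (below + above) / n,
        "hours_below_per_day": below * MINUTES_PER_READING / 60 / days,
        "hours_above_per_day": above * MINUTES_PER_READING / 60 / days,
    }
```

A participant would fall into the "≥1 hour/day above and ≥1 hour/day below range" group when both `hours_above_per_day` and `hours_below_per_day` are at least 1.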
{"title":"Continuous glucose monitoring (CGM) for 308 older-age participants in an English birth cohort: variability and correlates","authors":"Sophie V. Eastwood, Michele Orini, Andrew Wong, Scott T Chiesa, Joshua King-Robson, Jonathan Scott, Nishi Chaturvedi","doi":"10.23889/ijpds.v9i4.2417","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2417","url":null,"abstract":"Introduction & BackgroundEpochs of hyperglycaemia and hypoglycaemia may each increase risk of common chronic diseases and impair both cognitive and physical function even in people without diabetes. Older people may have greater frequency of adverse glycaemic excursions, partly due to disordered autonomic function and sleep quality. Data for older, non-diabetic people are however scant. \u0000Objectives & Approach1) To describe blood glucose variability (completed) and 2) its socio-demographic and lifestyle correlates in a predominantly non-diabetic cohort of older adults (planned). Participants were recruited during 2021-2023 from an English birth cohort (the 1946 National Survey for Health and Development Study). They wore a continuous glucose monitor (Freestyle libre Abbott), which measured circulating glucose four times/hour, for seven days. Summary statistics and time outside range (4.4-7.8mmol/L) were calculated. Further information on glycaemic excursions and day-to-day variability will be gleaned using the R “iglu” package. For all CGM summary and excursion measures, future analyses will investigate: associations with HbA1c, socio-demographics, body composition, physical activity, diet and alcohol use. Results will be stratified by sleep/ wake time periods estimated from simultaneous actigraphy (Philips Actiwatch Spectrum Plus). Sensitivity analyses will exclude people taking hypo/ hyperglycaemic medications and those with diabetes. 
\u0000Relevance to Digital FootprintsDerived summary measures can be used by future studies to give insights into glycaemic variability as a population-level risk factor. This work will bring together multiple data sources, i.e. from CGM, actigraphy and baseline cohort data. \u0000ResultsParticipants were aged 75-76 years, 45% female and 10% had diagnosed diabetes; median (IQR) BMI was 26.8 (24.6-29.2) kg/m2. CGM data from 308 participants was collected, for a median (IQR) of 6.9 (6.7-7.6) days. Average glucose over the recording period was 5.7mmol/L (5.3-6.2mmol/L), standard deviation was 1.0mmol/L (0.8-1.3mmol/L), time outside range was 12.8% (6.2-24.7%) and 16% of participants spent ≥1 hour/day above and ≥1 hour/day below range. \u0000Conclusions & ImplicationsCGM was feasible for this cohort of older adults, and demonstrated high levels of time outside range for a predominantly non-diabetic group. Future analysis will determine whether enhanced characterisation of glycaemic variability is a potentially more accurate tool for predicting future disease risk than isolated glucose measurements.","PeriodicalId":507952,"journal":{"name":"International Journal of Population Data Science","volume":" October","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141364401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-10DOI: 10.23889/ijpds.v9i4.2437
Daniel Joinson, Oliver Davis, Edwin Simpson
Introduction & Background: An estimated 4.95 billion people used social media in 2023, with the average user active on around seven platforms for over two hours per day. This widespread use generates abundant digital footprint data on interactions with social media. These data can be collected continuously and reflect the real behaviour of users in naturalistic settings. These strengths have led researchers to propose the use of social media data in digital phenotyping, where digital footprints are used to quantify and predict health conditions. Mental health assessment in particular could benefit, as existing approaches, such as self-report questionnaires and inpatient assessment, are unable to perform the real-time monitoring that digital phenotyping could potentially achieve.
Digital phenotyping models for mental health require careful consideration of which aspects of social media data to include. Including all the data users generate could result in models that are overfitted and difficult to explain. Studies are required that explore the relationship between specific aspects of social media data, such as the time course of expressed emotion, and gold-standard measures of mental health.
Objectives & Approach: With participants' consent, we linked Twitter data to self-reported measures of mental health from the Avon Longitudinal Study of Parents and Children. We performed sentiment analysis using three different approaches (LIWC, VADER and RoBERTa) to estimate the amount, variability and instability of positive and negative emotional content in each participant's Tweets over a one-year period. We explored the association between these measures of emotion expression and self-reported scores of depressive symptoms, anxiety symptoms and wellbeing. The mental health measures are the Short Mood and Feelings Questionnaire, the Generalized Anxiety Disorder 7 (GAD-7) and the Warwick-Edinburgh Mental Wellbeing Scale.
Relevance to Digital Footprints: Our research is highly relevant to digital footprint research, as it uses digital footprint data (i.e. Twitter data) to predict mental health outcomes.
Conclusions & Implications: The results of our analysis will inform the development of digital-footprint-based phenotyping for mental health that could one day provide information to supplement clinical assessments.
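The three per-participant measures named above (amount, variability and instability of emotional content) can be sketched from a time-ordered series of sentiment scores. The abstract does not define the measures, so the operationalisation below is an assumption: amount as the mean, variability as the sample standard deviation, and instability as the root mean squared successive difference (RMSSD). The function name `emotion_dynamics` is hypothetical, and the scores are assumed to come from whichever sentiment tool is used.

```python
from math import sqrt
from statistics import mean, stdev


def emotion_dynamics(scores):
    """Summarise chronologically ordered sentiment scores (at least two values)."""
    # Differences between each score and the one immediately before it.
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return {
        "amount": mean(scores),                                # average level
        "variability": stdev(scores),                          # spread, ignoring order
        "instability": sqrt(mean([d * d for d in diffs])),     # RMSSD, order-sensitive
    }
```

The distinction matters because two participants can have the same mean and standard deviation but very different instability: `[0, 1, 0, 1]` swings on every Tweet, whereas `[0, 0, 1, 1]` changes only once.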
{"title":"The dynamics of emotion expression on Twitter and mental health in a UK longitudinal study","authors":"Daniel Joinson, Oliver Davis, Edwin Simpson","doi":"10.23889/ijpds.v9i4.2437","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2437","url":null,"abstract":"Introduction & BackgroundAn estimated 4.95 billion people used social media in 2023, with the average user active on around seven platforms for over two hours per day. This widespread use leads to abundant digital footprint data around interactions with social media. These data can be collected continuously and reflect real behaviour of users in naturalistic settings. These strengths have led researchers to propose the use of social media data in digital phenotyping, where digital footprints can be used to quantify and predict health conditions. Mental health assessment in particular could benefit, as existing approaches, such as self-report questionnaires and inpatient assessment, are unable to perform the real-time monitoring that digital phenotyping could potentially achieve. \u0000Digital phenotyping models for mental health require careful consideration of what aspects of social media data to include. Including all data users generate could result in models that are overfitted and difficult to explain. Studies are required that explore the relationship between specific aspects of social media data, such as the time course of expressed emotion, and gold-standard measures of mental health. \u0000Objectives & ApproachWith participants’ consent, we linked Twitter data to self-reported measures of mental health from the Avon Longitudinal Study of Parents and Children. We performed sentiment analysis using three different approaches—LIWC, VADER and RoBERTa—to estimate the amount, variability and instability of positive and negative emotional content in each participant’s Tweets over a one-year period. 
We explored the association between these measures of emotion expression and self-reported scores of depressive symptoms, anxiety symptoms and wellbeing. These mental health measures are the Short Mood and Feelings Questionnaire, the Generalized Anxiety 7 and the Warwick Edinburgh Mental Wellbeing Scale. \u0000Relevance to Digital FootprintsOur research is highly relevant to digital footprint research, as it involves the use of digital footprint data (i.e. Twitter data) to predict mental health outcomes. \u0000Conclusions & ImplicationsThe results of our analysis will inform the development of digital footprint based phenotyping for mental health that could one day provide information to supplement clinical assessments.","PeriodicalId":507952,"journal":{"name":"International Journal of Population Data Science","volume":" 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141365487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-10DOI: 10.23889/ijpds.v9i4.2420
Neo Poon, Claire Haworth, Elizabeth Dolan, A. Skatova
Introduction & Background: Chronic pain is considered a priority in healthcare and a threat to well-being across the globe; it is thus crucial to accurately measure national levels of pain conditions and their impacts on workplace productivity and well-being.
Chronic pain has traditionally been studied in isolation, with either self-reported survey data or standalone shopping records. The former are limited in scale and can be marred by response biases, while the latter lack "ground truths": what research teams can measure is usually the purchase patterns of pain relief products, but neither the severity nor the types of pain conditions.
Objectives & Approach: Data donation tools offer a novel approach to studying chronic pain by linking the two aspects and establishing statistical relationships between medicine consumption and the multiple facets of pain experience. In a survey, we asked participants (N = 953) to share their loyalty card data with us, which is made possible by the data portability tool provided by Tesco (the largest supermarket chain in the United Kingdom) under the General Data Protection Regulation (GDPR). Based on questions adopted from popular inventories used in health research (e.g., EQ5D Health States, ONS4 Well-being, WEMWBS scales), we also asked participants to report the details of their pain conditions, hours of employment, and both general and mental health states. This allowed us to associate chronic pain, both subjective and objective (i.e., reflected by medicine consumption), with its economic and personal consequences. Data collection was conducted via research panel providers and should thus approximate national representativeness.
Relevance to Digital Footprints: This work links digital footprints data donated by individuals to self-reported survey data, and develops an infrastructure for these data to be collected and safely stored.
Conclusions & Implications: One key value of this project is to pioneer a measure of chronic pain that can be applied, in future analytic work, to transactional records that are much bigger in scale. Our research team has access to an array of different digital footprints data, including longitudinal transactional data provided by a major pharmacy chain (~20 million customers and ~429 million baskets). To use these data alongside regional workplace productivity measures and well-being data released by the Office for National Statistics, a metric must be defined to extract the prevalence of chronic pain from shopping data; this metric is informed by the patterns found by the data donation project.
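The linkage idea can be sketched as follows: derive a simple purchase-based indicator from donated loyalty card records and join it to each participant's survey answers. This is a hedged illustration rather than the project's metric, which the abstract says is still to be defined. The product categories in `PAIN_RELIEF`, the record layout `(participant_id, basket_id, product_category)`, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical product categories treated as pain relief purchases.
PAIN_RELIEF = {"ibuprofen", "paracetamol"}


def pain_purchase_rate(transactions):
    """Share of each participant's baskets that contain a pain relief product.

    transactions: iterable of (participant_id, basket_id, product_category).
    """
    baskets = defaultdict(set)
    pain_baskets = defaultdict(set)
    for participant_id, basket_id, category in transactions:
        baskets[participant_id].add(basket_id)
        if category in PAIN_RELIEF:
            pain_baskets[participant_id].add(basket_id)
    return {pid: len(pain_baskets[pid]) / len(baskets[pid]) for pid in baskets}


def link(rates, survey):
    """Inner join of purchase rates and survey answers on participant ID."""
    return {
        pid: {"pain_purchase_rate": rates[pid], **survey[pid]}
        for pid in rates.keys() & survey.keys()
    }
```

Participants who donated shopping data but have no survey record (or the reverse) are dropped by the inner join, so the linked sample can be smaller than either source.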
{"title":"Studying Health and Illness Experience using Linked Data (SHIELD): Empowering customers to donate shopping data for chronic pain research","authors":"Neo Poon, Claire Haworth, Elizabeth Dolan, A. Skatova","doi":"10.23889/ijpds.v9i4.2420","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2420","url":null,"abstract":"Introduction & BackgroundChronic pain is considered a priority in healthcare and a threat to well-being across the globe, it is thus crucial to accurately measure the national levels of pain conditions and their impacts on workplace productivity and well-being.\u0000Chronic pain has traditionally been studied in isolation with either self-reported survey data or standalone shopping records. The former are limited in scale and can be marred by response biases, while the latter lack ‘ground truths’: what research teams can measure are usually the purchase patterns of pain relief products, but neither the severity nor types of pain conditions.\u0000Objectives & ApproachData donation tools offer a novel approach to study chronic pain by linking the two aspects and establish statistical relationships between medicine consumptions and the multiple facets of pain experience. In a survey, we asked participants (N = 953) to share their loyalty card data with us, which is made possible with the data portability tool provided by Tesco (i.e., the largest supermarket chain in the United Kingdom) as part of the General Data Protection Regulation (GDPR). Based on questions adopted from popular inventories used in health research (e.g., EQ5D Health States, ONS4 Well-being, WEMWBS scales), we also asked participants to report the details of their pain conditions, hours of employment, and both general and mental health states. This allowed us to associate chronic pain - both subjective and objective (i.e., reflected by medicine consumption) - with its economic and personal consequences. 
Data collection was conducted via research panel providers, thus should approximate national representativeness.\u0000Relevance to Digital FootprintsThis work links digital footprints data donated by individuals to self-reported survey data, also develops an infrastructure for these data to be collected and safely stored.\u0000Conclusions & ImplicationsOne key value of this project is to pioneer a measure of chronic pain that can be applied to transactional records that are much bigger in scale in future analytic works. Our research team has access to an array of different digital footprints data, including longitudinal transactional data provided by a major pharmacy chain (~20 million customers and ~429 million baskets). In order to utilise these data to associate them with regional workplace productivity measures and well-being data released by the Office for National Statistics, a metric must be defined to extract the prevalence of chronic pain from shopping data, which is informed by the patterns found by the data donation project.","PeriodicalId":507952,"journal":{"name":"International Journal of Population Data Science","volume":" 42","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141366174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-10DOI: 10.23889/ijpds.v9i4.2436
Nathan Bourne, Michael Spencer, Oliver Berry
Introduction & Background: Financial transaction data are highly valuable sources of digital footprints data for behavioural and economic research, but to properly create impact we must closely consider their limitations.
Financial institutions hold a wealth of consumer data with untapped potential for community intelligence. These datasets combine excellent coverage with extremely granular information on consumer finances, income and spending, yet these institutions face great challenges in leveraging the data for social good. Smart Data Foundry is a university-owned, non-profit organisation that facilitates safe access to these datasets for researchers and provides insights to enable government bodies to tackle today's major challenges, including the cost-of-living crisis and climate change.
Objectives & Approach: We will explore the opportunities afforded by these datasets for social and economic research. For example, using pseudonymised individual consumer banking data from NatWest Group, we have developed metrics for understanding income volatility and economic insecurity in collaboration with the Joseph Rowntree Foundation. We can also use these data to study consumer spending patterns and responses to economic changes such as interest rate rises and the net zero transition. We will assess the limitations of the data, including issues of representativeness, bias, and missing data, and describe methods and mitigations to account for these challenges. We also discuss the barriers to accessing this type of data, in terms of both relationship development with data partners and privacy and governance concerns.
Relevance to Digital Footprints: Individual-level customer transaction data provide a rich and novel form of digital footprint for behavioural and economic analyses. Every point of income or expenditure is recorded in a uniquely valuable digital footprint by financial institutions. These records can provide a variety of insights, such as responses to macroeconomic shocks across demographic groups and emerging areas of financial distress, and can help us better understand the drivers and risks of financial vulnerability. In both aggregated and individual form, the data can provide an additional layer of understanding for trends we may see in other data, such as health or administrative data.
Conclusions & Implications: Having addressed the challenges of data access and data quality, we demonstrate that consumer banking data are an incredibly valuable form of digital footprints data, capturing key information on consumer behaviour. We conclude with a call for further research to develop use cases of these data for social good.
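As a rough illustration of what an income volatility metric could look like, the sketch below computes the coefficient of variation of one account's monthly income. The abstract does not describe the metrics developed with the Joseph Rowntree Foundation, so this choice of measure and the function name `income_volatility` are assumptions for illustration only.

```python
from statistics import mean, pstdev


def income_volatility(monthly_income):
    """Coefficient of variation of monthly income totals for one account.

    Returns None when average income is zero, because the ratio is undefined.
    """
    average = mean(monthly_income)
    if average == 0:
        return None
    # Population standard deviation relative to the average income level.
    return pstdev(monthly_income) / average
```

For example, an account receiving 2,000 every month scores 0.0, while one alternating between 1,000 and 3,000 has the same average income but scores 0.5.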
{"title":"Challenges in access, representativeness, and bias in smart financial data relating to income volatility and economic insecurity.","authors":"Nathan Bourne, Michael Spencer, Oliver Berry","doi":"10.23889/ijpds.v9i4.2436","DOIUrl":"https://doi.org/10.23889/ijpds.v9i4.2436","url":null,"abstract":"Introduction & BackgroundFinancial transaction data are highly valuable sources of digital footprints data for behavioural and economic research, but to properly create impact we must closely consider their limitations. \u0000Financial institutions hold a wealth of consumer data with untapped potential for community intelligence. These datasets combine excellent coverage with extremely granular information on consumer finances, income and spending, yet these institutions face great challenges in leveraging this data for social good. Smart Data Foundry is a university-owned, non-profit organisation that facilitates safe access to these datasets for researchers and provides insights to enable government bodies to tackle today's major challenges including the cost-of-living crisis and climate change. \u0000Objectives & ApproachWe will explore the opportunities afforded by these datasets for social and economic research. For example, using pseudonymised individual consumer banking data from NatWest Group, we have developed metrics for understanding income volatility and economic insecurity in collaboration with the Joseph Rowntree Foundation. We can also use these data to study consumer spending patterns and responses to economic changes such as interest rate rises and the net zero transition. We will assess the limitations of the data including issues of representativeness, bias, and missing data, and describe methods and mitigations to account for these challenges. We also discuss the barriers to accessing this type of data, in both relationship development with data partners, and privacy and governance concerns. 
\u0000Relevance to Digital FootprintsIndividual level customer transaction data provides a rich and novel form of digital footprint for behavioural and economic analyses. Every point of income or expenditure is recorded in a uniquely valuable digital footprint by financial institutions. These can provide a variety of insights, such as responses to macroeconomic shocks across demographic sets, emerging areas of financial distress, and help us better understand the drivers and risks of financial vulnerability. In both its aggregated and individual form, the data can provide an additional layer of understanding for trends we may see in other data, such as health or administrative data. \u0000Conclusions & ImplicationsHaving addressed the challenges of data access and data quality, we demonstrate that consumer banking data is an incredibly valuable form of digital footprints data, capturing key information on consumer behaviour. We conclude with a call for further research to develop use cases of this data for social good.","PeriodicalId":507952,"journal":{"name":"International Journal of Population Data Science","volume":"122 49","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141361623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}