Although it has become easier for individuals to track their personal health data (e.g., heart rate, step count, and nutrient intake), a wide gap remains between collecting data and generating meaningful summaries that help users understand what their data means. With a better understanding of their data, users can act on the new information and work toward their health goals. We aim to bridge this gap by mining the data for interesting behavioral findings that may hint at a user’s tendencies. Our focus is on improving the explainability of temporal personal health data via a set of informative summary templates, or “protoforms.” These protoforms span both evaluation-based summaries, which help users evaluate their health goals, and pattern-based summaries, which explain their implicit behaviors. In addition to individual-level summaries, the protoforms are also designed for population-level summaries. We apply our approach to generate both univariate and multivariate summaries from real user health data and show that the summaries our system generates are both interesting and useful.
{"title":"A Framework for Generating Summaries from Temporal Personal Health Data","authors":"Jon Harris, Ching-Hua Chen, Mohammed J. Zaki","doi":"10.1145/3448672","DOIUrl":"https://doi.org/10.1145/3448672","journal":{"name":"ACM transactions on computing for healthcare","volume":"2 1","pages":"1 - 43"},"publicationDate":"2021-07-15","publicationTypes":"Journal Article"}
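To make the idea of a summary template concrete, the following is a minimal sketch of an evaluation-style protoform for the entry above. It is not the authors' system: the quantifier bands, the template wording, and the names `QUANTIFIERS`, `quantifier_for`, and `goal_summary` are all assumptions made for illustration, and crisp thresholds stand in for whatever quantifier model the paper actually uses.

```python
from typing import Sequence

# Hypothetical quantifier bands: (upper bound on the fraction, phrase).
QUANTIFIERS = [
    (0.1, "almost none of"),
    (0.4, "some of"),
    (0.6, "about half of"),
    (0.9, "most of"),
    (1.0, "almost all of"),
]


def quantifier_for(fraction: float) -> str:
    """Map a fraction in [0, 1] to a linguistic quantifier."""
    for upper, phrase in QUANTIFIERS:
        if fraction <= upper:
            return phrase
    return QUANTIFIERS[-1][1]


def goal_summary(values: Sequence[float], goal: float, attribute: str, window: str) -> str:
    """Fill the template 'On <quantifier> the days in <window>, you reached your <attribute> goal.'"""
    if not values:
        raise ValueError("values must not be empty")
    # Fraction of days on which the tracked value met or exceeded the goal.
    fraction = sum(v >= goal for v in values) / len(values)
    return f"On {quantifier_for(fraction)} the days in {window}, you reached your {attribute} goal."


# Made-up daily step counts for one week (5 of 7 days meet the goal).
steps = [11200, 9800, 10400, 12050, 7600, 10100, 10900]
print(goal_summary(steps, goal=10000, attribute="step count", window="the past week"))
# On most of the days in the past week, you reached your step count goal.
```

Separating the quantifier lookup from the template keeps the wording reusable across attributes, which is the property that makes a template a protoform rather than a one-off sentence.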
Wenlong Wu, J. Keller, M. Skubic, M. Popescu, Kari R. Lane
The rapid aging of the population worldwide requires increased attention from healthcare providers and the entire society. For the elderly to live independently, many health issues related to old age, such as frailty and risk of falling, need increased attention and monitoring. When monitoring daily routines for older adults, it is desirable to detect the early signs of health changes before serious health events, such as hospitalizations, happen so that timely and adequate preventive care may be provided. By deploying multi-sensor systems in homes of the elderly, we can track trajectories of daily behaviors in a feature space defined using the sensor data. In this article, we investigate a methodology for tracking the evolution of the behavior trajectories over long periods (years) using high-dimensional streaming clustering and provide very early indicators of changes in health. If we assume that habitual behaviors correspond to clusters in feature space and diseases produce a change in behavior, albeit not highly specific, tracking trajectory deviations can provide hints of early illness. Retrospectively, we visualize the streaming clustering results and track how the behavior clusters evolve in feature space with the help of two dimension-reduction algorithms: Principal Component Analysis and t-distributed Stochastic Neighbor Embedding. Moreover, our tracking algorithm in the original high-dimensional feature space generates early health warning alerts if a negative trend is detected in the behavior trajectory. We validated our algorithm on synthetic data and tested it on a pilot dataset of four TigerPlace residents monitored with a collection of motion, bed, and depth sensors over 10 years. We used the TigerPlace electronic health records to understand the residents’ behavior patterns and to evaluate the health warnings generated by our algorithm. 
The results obtained on the TigerPlace dataset show that most of the warnings produced by our algorithm can be linked to health events documented in the electronic health records, providing strong support for a prospective deployment of the approach.
{"title":"Early Detection of Health Changes in the Elderly Using In-Home Multi-Sensor Data Streams","authors":"Wenlong Wu, J. Keller, M. Skubic, M. Popescu, Kari R. Lane","doi":"10.1145/3448671","DOIUrl":"https://doi.org/10.1145/3448671","journal":{"name":"ACM transactions on computing for healthcare","volume":"2 1","pages":"1 - 23"},"publicationDate":"2021-07-01","publicationTypes":"Journal Article"}
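The trajectory-tracking idea in the entry above can be illustrated with a heavily simplified sketch. It replaces streaming clustering with a single fixed baseline centroid and flags a day when the distance from that centroid trends upward over a sliding window. The function names, the window length, and the slope threshold are all assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def slope(ys: Sequence[float]) -> float:
    """Least-squares slope of ys against 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def drift_alerts(days: Sequence[Sequence[float]], baseline_days: int = 5,
                 window: int = 4, threshold: float = 0.5) -> List[int]:
    """Return indices of days on which distance from the baseline centroid trends upward."""
    dim = len(days[0])
    # Baseline centroid: mean feature vector of the first `baseline_days` days.
    centroid = [sum(d[j] for d in days[:baseline_days]) / baseline_days for j in range(dim)]
    distances = [euclidean(d, centroid) for d in days]
    alerts = []
    # Only windows that lie entirely after the baseline period are evaluated.
    for t in range(baseline_days + window - 1, len(days)):
        recent = distances[t - window + 1 : t + 1]
        if slope(recent) > threshold:
            alerts.append(t)
    return alerts


# Made-up two-feature days: five stable days, then a steady drift away.
drifting = [[0.0, 0.0]] * 5 + [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
print(drift_alerts(drifting))  # [8]
```

A real deployment would let the clusters themselves evolve and would need a rule for what counts as a negative trend; the fixed centroid here only shows the alerting step.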
Amir Hosein Afandizadeh Zargari, S. A. H. Aqajari, Hadi Khodabandeh, A. Rahmani, Fadi J. Kurdahi
Photoplethysmography (PPG) is a simple and inexpensive optical technique widely used in healthcare to extract valuable health-related information, e.g., heart rate variability, blood pressure, and respiration rate. PPG signals can easily be collected continuously and remotely using portable wearable devices. However, these measuring devices are vulnerable to motion artifacts caused by daily life activities. The most common ways to eliminate motion artifacts use extra accelerometer sensors, which suffer from two limitations: (i) high power consumption, and (ii) the need to integrate an accelerometer into the wearable device (a component that certain wearables do not otherwise require). This paper proposes a low-power, non-accelerometer-based method for removing PPG motion artifacts that is more accurate than existing methods. We use a Cycle Generative Adversarial Network (CycleGAN) to reconstruct clean PPG signals from noisy PPG signals. Our novel machine-learning-based technique achieves a 9.5-fold improvement in motion artifact removal compared to the state of the art without using extra sensors such as an accelerometer, which leads to a 45% improvement in energy efficiency.
{"title":"An Accurate Non-accelerometer-based PPG Motion Artifact Removal Technique using CycleGAN","authors":"Amir Hosein Afandizadeh Zargari, S. A. H. Aqajari, Hadi Khodabandeh, A. Rahmani, Fadi J. Kurdahi","doi":"10.1145/3563949","DOIUrl":"https://doi.org/10.1145/3563949","journal":{"name":"ACM transactions on computing for healthcare","volume":"4 1","pages":"1 - 14"},"publicationDate":"2021-06-22","publicationTypes":"Journal Article"}
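For readers unfamiliar with CycleGAN, the standard objective from the original CycleGAN formulation is sketched below, mapped onto the setting of the entry above. The mapping is an assumption: here x denotes a noisy PPG window, y a clean one, G translates noisy to clean, F translates clean to noisy, and D_X and D_Y are the discriminators for the two domains. The abstract does not say whether the authors use exactly these terms or how they weight them.

```latex
\begin{align}
\mathcal{L}_{\text{cyc}}(G, F) &= \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\text{GAN}}(G, D_Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X)
  + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
\end{align}
```

The cycle-consistency term is what allows training without paired noisy and clean recordings of the same heartbeat: a signal translated to the other domain and back should return to itself.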
Colm Sweeney, C. Potts, E. Ennis, Raymond R. Bond, M. Mulvenna, S. O’neill, M. Malcolm, L. Kuosmanen, C. Kostenius, A. Vakaloudis, G. Mcconvey, Robin Turkington, D. Hanna, H. Nieminen, A. Vartiainen, A. Robertson, M. McTear
The objective of this study was to understand the attitudes of professionals who work in mental health regarding the use of conversational user interfaces, or chatbots, to support people’s mental health and wellbeing. This study involves an online survey to measure the awareness and attitudes of mental healthcare professionals and experts. The findings from this survey show that more than half of the participants in the survey agreed that there are benefits associated with mental healthcare chatbots (65%, p < 0.01). The perceived importance of chatbots was also relatively high (74%, p < 0.01), with more than three-quarters (79%, p < 0.01) of respondents agreeing that mental healthcare chatbots could help their clients better manage their own health, yet chatbots are overwhelmingly perceived as not adequately understanding or displaying human emotion (86%, p < 0.01). Even though the level of personal experience with chatbots among professionals and experts in mental health has been quite low, this study shows that where they have been used, the experience has been mostly satisfactory. This study has found that as years of experience increased, there was a corresponding increase in the belief that healthcare chatbots could help clients better manage their own mental health.
{"title":"Can Chatbots Help Support a Person’s Mental Health? Perceptions and Views from Mental Healthcare Professionals and Experts","authors":"Colm Sweeney, C. Potts, E. Ennis, Raymond R. Bond, M. Mulvenna, S. O’neill, M. Malcolm, L. Kuosmanen, C. Kostenius, A. Vakaloudis, G. Mcconvey, Robin Turkington, D. Hanna, H. Nieminen, A. Vartiainen, A. Robertson, M. McTear","doi":"10.1145/3453175","DOIUrl":"https://doi.org/10.1145/3453175","journal":{"name":"ACM transactions on computing for healthcare","volume":"2 1","pages":"1 - 15"},"publicationDate":"2021-05-26","publicationTypes":"Journal Article"}
In today’s connected society, many people rely on mHealth and self-tracking (ST) technology to help them adopt healthier habits with a focus on breaking their sedentary lifestyle and staying fit. However, there is little evidence that such technological interventions are effective, and there are no standardized methods to evaluate their impact on people’s physical activity and health. This work aims to help ST practitioners and researchers by providing systematic guidelines and a framework for designing and evaluating technological interventions to facilitate health behavior change and user engagement, focusing on increasing physical activity and decreasing sedentariness. To this end, we conduct a literature review of 129 papers between 2008 and 2022, which identifies the core ST design principles and their efficacy, as well as the most comprehensive list to date of user engagement evaluation metrics for ST. Based on the review’s findings, we propose PAST SELF, a framework to guide the design and evaluation of ST technology that has potential applications in industrial and scientific settings. Finally, to support researchers and practitioners, we complement this article with an open corpus and an online, adaptive exploration tool for the PAST SELF data.
{"title":"14 Years of Self-Tracking Technology for mHealth—Literature Review: Lessons Learned and the PAST SELF Framework","authors":"Sofia Yfantidou, Pavlos Sermpezis, A. Vakali","doi":"10.1145/3592621","DOIUrl":"https://doi.org/10.1145/3592621","journal":{"name":"ACM transactions on computing for healthcare","volume":"4 1","pages":"1 - 43"},"publicationDate":"2021-04-23","publicationTypes":"Journal Article"}
Electronic consent (e-consent) has the potential to solve many of the problems of paper-based consent approaches. Existing approaches, however, face challenges regarding privacy and security. This literature review aims to provide an overview of privacy and security challenges and requirements proposed by papers discussing e-consent implementations, as well as the manner in which state-of-the-art solutions address them. We conducted a systematic literature search using ACM Digital Library, IEEE Xplore, and PubMed Central. We included papers providing comprehensive discussions of one or more technical aspects of e-consent systems. Thirty-one papers met our inclusion criteria. Two distinct topics were identified, the first being discussions of e-consent representations and the second being implementations of e-consent in data sharing systems. The main challenge for e-consent representations is gathering the requirements for a “valid” consent. For the implementation papers, many provided some requirements but none provided a comprehensive overview. Blockchain is identified as a solution to transparency and trust issues in traditional client-server systems, but several challenges hinder it from being applied in practice. E-consent has the potential to grant data subjects control over their data. However, there is no agreed-upon set of security and privacy requirements that must be addressed by an e-consent platform. Therefore, security- and privacy-by-design techniques should be an essential part of the development lifecycle for such a platform.
{"title":"Security and Privacy Requirements for Electronic Consent","authors":"Stef Verreydt, Koen Yskout, W. Joosen","doi":"10.1145/3433995","DOIUrl":"https://doi.org/10.1145/3433995","journal":{"name":"ACM transactions on computing for healthcare","volume":"2 1","pages":"1 - 24"},"publicationDate":"2021-03-21","publicationTypes":"Journal Article"}
To combat the ongoing Covid-19 pandemic, many new approaches have been proposed to automate contact tracing, the process of finding infected people. A special focus was put on preserving the privacy of users. As a base technology, Bluetooth Low Energy has the most promising properties, so this survey focuses on automated contact tracing techniques that use it. We define multiple classes of methods and identify two major groups: systems that rely on a server for finding new infections and systems that distribute this process. Existing approaches are systematically classified regarding security and privacy criteria.
{"title":"A Survey of Automatic Contact Tracing Approaches Using Bluetooth Low Energy","authors":"Leonie Reichert, Samuel Brack, B. Scheuermann","doi":"10.1145/3444847","DOIUrl":"https://doi.org/10.1145/3444847","journal":{"name":"ACM transactions on computing for healthcare","volume":"2 1","pages":"1 - 33"},"publicationDate":"2021-03-17","publicationTypes":"Journal Article"}
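The second group of systems in the entry above, those that distribute the matching process, can be sketched generically as follows. This is not any specific protocol from the survey: the key length, the 96 slots per day, the `slot-<i>` label, and the function names `ephemeral_ids` and `exposed` are assumptions chosen for illustration. Each phone broadcasts short-lived identifiers derived from a secret daily key; an infected user publishes the daily key, and every other phone re-derives the identifiers and compares them with what it observed, so no server learns who met whom.

```python
import hashlib
import hmac
import os
from typing import Iterable, List, Set


def ephemeral_ids(daily_key: bytes, slots: int = 96) -> List[bytes]:
    """Derive one 16-byte broadcast identifier per time slot from a daily key."""
    return [
        hmac.new(daily_key, f"slot-{i}".encode(), hashlib.sha256).digest()[:16]
        for i in range(slots)
    ]


def exposed(observed: Set[bytes], published_keys: Iterable[bytes], slots: int = 96) -> bool:
    """Check locally whether any observed identifier derives from a published key."""
    return any(
        eid in observed
        for key in published_keys
        for eid in ephemeral_ids(key, slots)
    )


# Hypothetical encounter: Bob's phone overheard one of Alice's identifiers.
alice_key = os.urandom(32)  # daily keys should be random and kept on the device
bob_observed = {ephemeral_ids(alice_key)[10]}

# Alice tests positive and publishes her daily key; Bob matches on his own phone.
print(exposed(bob_observed, [alice_key]))
```

In a server-based design the comparison would instead run centrally on uploaded observations, which is the privacy trade-off the survey's classification is organized around.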
Mental state assessment by analysing user-generated content is a field that has recently attracted considerable attention. Today, many people are increasingly utilising online social media platforms to share their feelings and moods. This provides a unique opportunity for researchers and health practitioners to proactively identify linguistic markers or patterns that correlate with mental disorders such as depression, schizophrenia or suicide behaviour. This survey describes and reviews the approaches that have been proposed for mental state assessment and identification of disorders using online digital records. The presented studies are organised according to the assessment technology and the feature extraction process conducted. We also present a series of studies which explore different aspects of the language and behaviour of individuals suffering from mental disorders, and discuss various aspects related to the development of experimental frameworks. Furthermore, ethical considerations regarding the treatment of individuals’ data are outlined. The main contributions of this survey are a comprehensive analysis of the proposed approaches for online mental state assessment on social media, a structured categorisation of the methods according to their design principles, lessons learnt over the years and a discussion on possible avenues for future research.
{"title":"A Survey of Computational Methods for Online Mental State Assessment on Social Media","authors":"E. A. Ríssola, D. Losada, F. Crestani","doi":"10.1145/3437259","DOIUrl":"https://doi.org/10.1145/3437259","journal":{"name":"ACM transactions on computing for healthcare","volume":"2 1","pages":"1 - 31"},"publicationDate":"2021-03-17","publicationTypes":"Journal Article"}
J. Minot, N. Cheney, Marc E. Maier, Danne C. Elbers, C. Danforth, P. Dodds
Medical systems in general, and patient treatment decisions and outcomes in particular, can be affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is a growing interest in building algorithmic fairness into processes impacting patient care. Much of the work addressing this question has focused on biases encoded in language models, which are statistical estimates of the relationships between concepts derived from distant reading of corpora. Building on this work, we investigate how differences in gender-specific word frequency distributions and language models interact with regard to bias. We identify and remove gendered language from two clinical-note datasets and describe a new debiasing procedure using BERT-based gender classifiers. We show minimal degradation in health condition classification tasks for low to medium levels of dataset bias removal via data augmentation. Finally, we compare the bias semantically encoded in the language models with the bias empirically observed in health records. This work outlines an interpretable approach for using data augmentation to identify and reduce biases in natural language processing pipelines.
{"title":"Interpretable Bias Mitigation for Textual Data: Reducing Genderization in Patient Notes While Maintaining Classification Performance","authors":"J. Minot, N. Cheney, Marc E. Maier, Danne C. Elbers, C. Danforth, P. Dodds","doi":"10.1145/3524887","DOIUrl":"https://doi.org/10.1145/3524887","journal":{"name":"ACM transactions on computing for healthcare","volume":"240 1","pages":"1 - 41"},"publicationDate":"2021-03-10","publicationTypes":"Journal Article"}
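The step of removing gendered language from clinical notes, described in the entry above, can be illustrated with a minimal masking sketch. The term list, the `[GENDERED]` placeholder, and the function name `mask_gendered` are assumptions for illustration. The paper selects and removes terms by its own procedure, which the abstract does not detail, so a short hand-written list stands in for it here.

```python
import re

# Hypothetical, deliberately small term list; a real one would be far longer.
GENDERED_TERMS = [
    "he", "she", "him", "her", "his", "hers",
    "man", "woman", "male", "female", "mr", "mrs", "ms",
]

# Word boundaries keep substrings such as "her" inside "mother" untouched.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, GENDERED_TERMS)) + r")\b",
    re.IGNORECASE,
)


def mask_gendered(note: str, token: str = "[GENDERED]") -> str:
    """Replace every listed gendered term in the note with a placeholder token."""
    return PATTERN.sub(token, note)


note = "Mr. Smith reports that he feels well; his mother is female."
print(mask_gendered(note))
# [GENDERED]. Smith reports that [GENDERED] feels well; [GENDERED] mother is [GENDERED].
```

Masking with a visible token rather than deleting the word keeps the edit inspectable, which fits the interpretability aim stated in the abstract. Varying how many terms are masked would be one way to produce different levels of removal.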
Md Momin Al Aziz, Shahin Kamali, N. Mohammed, Xiaoqian Jiang
Digitization of healthcare records has contributed to a large volume of functional scientific data that can help researchers understand the behaviour of many diseases. However, the privacy implications of this data, particularly genomics data, have surfaced recently, as the collection, dissemination, and analysis of human genomics data are highly sensitive. There have been multiple privacy attacks that rely on the uniqueness of the human genome to reveal the presence of a participant or a certain group in a dataset. Therefore, the current data sharing policies have ruled out any public dissemination and adopted precautionary measures prior to genomics data release, which hinders timely scientific innovation. In this article, we investigate an approach that releases only the statistics from genomic data rather than the whole dataset and propose a generalized Differentially Private mechanism for Genome-wide Association Studies (GWAS). Our method provides a quantifiable privacy guarantee by adding noise to the intermediate outputs while ensuring satisfactory accuracy of the private results. Furthermore, the proposed method offers multiple adjustable parameters that the data owners can set based on the optimal privacy requirements. These parameters act as equalizers that balance the privacy and utility of the GWAS. The method also incorporates the Online Bin Packing technique [1], which further bounds the privacy loss so that it grows linearly with the number of open bins and scales with the incoming queries. Finally, we implemented and benchmarked our approach using seven different GWAS studies to test the performance of the proposed methods. The experimental results demonstrate that for 1,000 arbitrary online queries, our algorithms are more than 80% accurate with reasonable privacy loss and outperform the state-of-the-art approaches on multiple studies (i.e., EigenStrat, LMM, TDT).
{"title":"Online Algorithm for Differentially Private Genome-wide Association Studies","authors":"Md Momin Al Aziz, Shahin Kamali, N. Mohammed, Xiaoqian Jiang","doi":"10.1145/3431504","DOIUrl":"https://doi.org/10.1145/3431504","url":null,"abstract":"Digitization of healthcare records has contributed to a large volume of functional scientific data that can help researchers understand the behaviour of many diseases. However, the privacy implications of this data, particularly genomics data, have surfaced recently, as the collection, dissemination, and analysis of human genomics data are highly sensitive. There have been multiple privacy attacks that rely on the uniqueness of the human genome to reveal the presence of a participant or a certain group in a dataset. Therefore, the current data sharing policies have ruled out any public dissemination and adopted precautionary measures prior to genomics data release, which hinders timely scientific innovation. In this article, we investigate an approach that releases only the statistics from genomic data rather than the whole dataset and propose a generalized Differentially Private mechanism for Genome-wide Association Studies (GWAS). Our method provides a quantifiable privacy guarantee by adding noise to the intermediate outputs while ensuring satisfactory accuracy of the private results. Furthermore, the proposed method offers multiple adjustable parameters that the data owners can set based on the optimal privacy requirements. These parameters act as equalizers that balance the privacy and utility of the GWAS. The method also incorporates the Online Bin Packing technique [1], which further bounds the privacy loss so that it grows linearly with the number of open bins and scales with the incoming queries. Finally, we implemented and benchmarked our approach using seven different GWAS studies to test the performance of the proposed methods. The experimental results demonstrate that for 1,000 arbitrary online queries, our algorithms are more than 80% accurate with reasonable privacy loss and outperform the state-of-the-art approaches on multiple studies (i.e., EigenStrat, LMM, TDT).","PeriodicalId":72043,"journal":{"name":"ACM transactions on computing for healthcare","volume":" ","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2021-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3431504","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46636696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}