Bias Analysis in Healthcare Time Series (BAHT) Decision Support Systems from Meta Data.

Journal of Healthcare Informatics Research (IF 5.9, Q1, Computer Science) · Pub Date: 2023-06-19 · eCollection Date: 2023-06-01 · DOI: 10.1007/s41666-023-00133-6

Sagnik Dakshit, Sristi Dakshit, Ninad Khargonkar, Balakrishnan Prabhakaran

Journal of Healthcare Informatics Research, volume 7, issue 2, pages 225-253. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290973/pdf/

Citations: 0

Abstract



Bias is one of the main obstacles to the widespread acceptance of deep learning-based decision support systems in healthcare. Bias in its many forms occurs in the datasets used to train and test deep learning models and is amplified when the models are deployed in the real world, leading to challenges such as model drift. Recent advances in deep learning have led to the deployment of automated healthcare diagnosis decision support systems in hospitals and in telemedicine through IoT devices. Research has focused primarily on developing and improving these systems, leaving a gap in the analysis of their fairness. The domain of FAccT ML (fairness, accountability, and transparency) addresses the analysis of such deployable machine learning systems. In this work, we present a framework for bias analysis in healthcare time series (BAHT) signals such as electrocardiogram (ECG) and electroencephalogram (EEG). BAHT provides a graphical, interpretive analysis of bias in the training and testing datasets in terms of protected variables, together with an analysis of bias amplification by the trained supervised learning model, for time series healthcare decision support systems. We thoroughly investigate three prominent time series ECG and EEG healthcare datasets used for model training and research. We show that the extensive presence of bias in the datasets leads to potentially biased or unfair machine learning models. Our experiments also demonstrate amplification of the identified bias, with an observed maximum of 66.66%. We investigate the effect of model drift due to unanalyzed bias in datasets and algorithms. Bias mitigation, though prudent, is a nascent area of research. We present experiments and analyze the most widely accepted bias mitigation strategies: under-sampling, oversampling, and the use of synthetic data to balance the dataset through augmentation. Healthcare models, datasets, and bias mitigation strategies should be properly analyzed to ensure fair, unbiased delivery of service.
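The abstract names under-sampling and oversampling as mitigation strategies but gives no procedure. As a rough illustration only, and not the authors' BAHT implementation, the sketch below counts records per value of a protected attribute and then equalizes group sizes by random under-sampling or random oversampling. The record layout, the `sex` field, and the function names `group_counts` and `rebalance` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def group_counts(records, key):
    """Count records per value of a protected attribute (hypothetical field name)."""
    return Counter(r[key] for r in records)


def rebalance(records, key, mode="under", seed=0):
    """Return a new list in which every group under `key` has the same size.

    mode="under": randomly keep only as many records per group as the smallest group has.
    mode="over":  randomly duplicate records in smaller groups up to the largest group's size.
    """
    rng = random.Random(seed)

    # Bucket the records by protected-attribute value.
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)

    sizes = [len(members) for members in groups.values()]
    if mode == "under":
        target = min(sizes)
    elif mode == "over":
        target = max(sizes)
    else:
        raise ValueError("mode must be 'under' or 'over'")

    balanced = []
    for members in groups.values():
        if len(members) >= target:
            # Sample without replacement (keeps everything when sizes already match).
            balanced.extend(rng.sample(members, target))
        else:
            # Keep all originals and add random duplicates (sampling with replacement).
            extra = rng.choices(members, k=target - len(members))
            balanced.extend(members + extra)
    return balanced


# Toy metadata, invented for illustration: 8 records in one group, 2 in the other.
records = [{"id": i, "sex": "M"} for i in range(8)] + \
          [{"id": i, "sex": "F"} for i in range(8, 10)]

under = rebalance(records, "sex", mode="under")
over = rebalance(records, "sex", mode="over")
```

Under-sampling discards data from the larger group, while oversampling repeats minority records and may encourage overfitting to them; the third strategy the abstract mentions, synthetic data augmentation, is not covered by this sketch.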


Source journal

Journal of Healthcare Informatics Research (Computer Science: Computer Science Applications)

CiteScore

13.60

Self-citation rate

1.70%

Publication volume

About the journal: Journal of Healthcare Informatics Research serves as a publication venue for innovative technical contributions highlighting analytics, systems, and human factors research in healthcare informatics. The journal is concerned with the application of computer science principles, information science principles, information technology, and communication technology to address problems in healthcare and everyday wellness. It highlights the most cutting-edge technical contributions in computing-oriented healthcare informatics. The journal covers three major tracks: (1) analytics: data analytics, knowledge discovery, and predictive modeling; (2) systems: building healthcare informatics systems (e.g., architecture, framework, design, engineering, and application); (3) human factors: understanding users or context, interface design, health behavior, and user studies of healthcare informatics applications.

Topics include but are not limited to:
· healthcare software architecture, framework, design, and engineering
· electronic health records
· medical data mining
· predictive modeling
· medical information retrieval
· medical natural language processing
· healthcare information systems
· smart health and connected health
· social media analytics
· mobile healthcare
· medical signal processing
· human factors in healthcare
· usability studies in healthcare
· user-interface design for medical devices and healthcare software
· health service delivery
· health games
· security and privacy in healthcare
· medical recommender systems
· healthcare workflow management
· disease profiling and personalized treatment
· visualization of medical data
· intelligent medical devices and sensors
· RFID solutions for healthcare
· healthcare decision analytics and support systems
· epidemiological surveillance systems and intervention modeling
· consumer and clinician health information needs, seeking, sharing, and use
· semantic Web, linked data, and ontology
· collaboration technologies for healthcare
· assistive and adaptive ubiquitous computing technologies
· statistics and quality of medical data
· healthcare delivery in developing countries
· health systems modeling and simulation
· computer-aided diagnosis