Healthy Bio-Core: A Framework for Selection of Homogeneous Healthy Biomedical Multivariate Time Series Employing Classification Performance.

Impact factor: 6.7 | Zone 2 (Medicine) | JCR Q1 (Computer Science, Information Systems) | IEEE Journal of Biomedical and Health Informatics | Pub Date: 2025-03-03 | DOI: 10.1109/JBHI.2025.3546844
Abhidnya Patharkar, Firas Al-Hindawi, Teresa Wu
Citations: 0

Abstract

In biomedical datasets for disease detection, data typically fall into two classes: healthy and diseased. The diseased cohort often exhibits inherent heterogeneity due to clinical subtyping. Although the healthy cohort is presumed to be homogeneous, it contains heterogeneities arising from inter-subject variation, which affect the effectiveness of classification. To address this issue, we propose a novel methodology for multivariate time series data that identifies a homogeneous sub-cohort of healthy samples, referred to as the 'Healthy Bio-Core' (HBC). Using the HBC improves the discriminative capacity of classification models. The selection process for the HBC combines dynamic time warping (DTW) with the accuracy of the ROCKET (RandOm Convolutional KErnel Transform) classifier, treating the entire time series as a single instance. Empirical results indicate that using the HBC improves classification performance compared with using the complete healthy dataset. We validate this approach with three classifiers: HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles), MUSE (Multi-variate Unsupervised Symbols and Derivatives), and DTW-NN (DTW with Nearest Neighbor), assessing metrics such as accuracy, precision, recall, and F1-score. Because our approach relies on DTW, it is limited to cases where a DTW path can be identified; otherwise, another distance metric must be used. Currently, the efficiency depends on the classifier used. Future studies might investigate combining different classifiers for HBC sample selection and devise a method to synthesize their outcomes. Moreover, the assumption that the dataset is predominantly healthy may not hold in contexts with significant noise. Notwithstanding these limitations, our approach yields significant improvements in classification, with average accuracy increases of 5.49%, 14.28%, and 6.16% for the sepsis, gait, and EMO pain datasets, respectively.
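The abstract names DTW and a DTW-based nearest-neighbor classifier (DTW-NN) but gives no implementation details. As a rough illustration of those two building blocks only, the sketch below computes a textbook DTW distance between two multivariate series and uses it for 1-nearest-neighbor prediction. It is not the authors' HBC selection procedure: the function names, the squared-Euclidean local cost, and the toy data are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two multivariate series (illustrative, not the paper's code).

    Each series is a list of time steps; each time step is a tuple of channel
    values. The local cost is the squared Euclidean distance between time steps
    (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current step
                cost[i][j - 1],      # b advances, a repeats its current step
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])


def dtw_1nn_predict(train_series, train_labels, query):
    """Return the label of the training series closest to `query` under DTW."""
    distances = [dtw_distance(series, query) for series in train_series]
    nearest = distances.index(min(distances))
    return train_labels[nearest]


# Toy two-channel data, invented for illustration only.
train_series = [
    [(0.0, 1.0), (0.1, 1.1), (0.2, 1.2)],  # hypothetical "healthy" sample
    [(5.0, 9.0), (5.5, 9.5), (6.0, 9.9)],  # hypothetical "diseased" sample
]
train_labels = ["healthy", "diseased"]
query = [(0.0, 1.0), (0.1, 1.1), (0.1, 1.1), (0.2, 1.2)]  # same shape, stretched in time
predicted = dtw_1nn_predict(train_series, train_labels, query)
```

Because DTW allows a time step to be matched more than once, the stretched query aligns with the first training series at zero cost even though the two series have different lengths.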

Source journal
IEEE Journal of Biomedical and Health Informatics (Computer Science, Information Systems; Computer Science, Interdisciplinary Applications)
CiteScore: 13.60
Self-citation rate: 6.50%
Articles published: 1151
About the journal: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.