Healthy Bio-Core: A Framework for Selection of Homogeneous Healthy Biomedical Multivariate Time Series Employing Classification Performance.
Authors: Abhidnya Patharkar, Firas Al-Hindawi, Teresa Wu
Journal: IEEE Journal of Biomedical and Health Informatics (published 2025-03-03)
DOI: 10.1109/JBHI.2025.3546844 (https://doi.org/10.1109/JBHI.2025.3546844)
In biomedical datasets for disease detection, data typically fall into two classes: healthy and diseased. The diseased cohort often exhibits inherent heterogeneity due to clinical subtyping. Although the healthy cohort is presumed to be homogeneous, it also contains heterogeneity arising from inter-subject variation, which affects the effectiveness of classification. To address this issue, we propose a novel methodology for multivariate time series data that identifies a homogeneous sub-cohort of healthy samples, referred to as the 'Healthy Bio-Core' (HBC). Using the HBC improves the discriminative capacity of classification models. The selection process for HBC integrates dynamic time warping (DTW) and the accuracy of the ROCKET (RandOm Convolutional KErnel Transform) classifier, treating the entire time series as a single instance. Empirical results indicate that using HBC improves classification performance compared with using the complete healthy dataset. We substantiate this approach with three classifiers: HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles), MUSE (Multi-variate Unsupervised Symbols and Derivatives), and DTW-NN (DTW with Nearest Neighbor), assessing metrics such as accuracy, precision, recall, and F1-score. Because our approach relies on DTW, it is limited to cases where a DTW path can be identified; otherwise, another distance metric must be used. Currently, the efficiency depends on the classifier used. Future studies might investigate combining different classifiers for HBC sample selection and devise a method to synthesize their outcomes. Moreover, the assumption that the dataset is predominantly healthy may not hold in contexts with significant noise. Notwithstanding these limitations, our approach yields significant improvements in classification, with average accuracy increases of 5.49%, 14.28%, and 6.16% for the sepsis, gait, and EMO pain datasets, respectively.
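The abstract does not spell out how DTW and ROCKET accuracy are combined, so the following is only a minimal sketch of the DTW ingredient, not the authors' selection procedure. It assumes each sample is a list of time steps with one value per channel, uses a Euclidean local cost between time steps, and ranks healthy samples by their mean DTW distance to the other healthy samples. The function names `dtw_distance` and `rank_by_mean_dtw` and the ranking heuristic itself are hypothetical, introduced purely for illustration. The classifier-accuracy step is omitted.

```python
import math
from typing import List, Sequence

# One multivariate series: a sequence of time steps, each a sequence of channel values.
Series = Sequence[Sequence[float]]


def dtw_distance(a: Series, b: Series) -> float:
    """Classic DTW between two multivariate series (Euclidean cost per time step)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its time step
                cost[i][j - 1],      # b advances, a repeats its time step
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_by_mean_dtw(healthy: List[Series]) -> List[int]:
    """Hypothetical heuristic: sample indices sorted from most to least central."""
    k = len(healthy)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(healthy[i], healthy[j])
    mean_dist = [sum(row) / (k - 1) for row in dist]
    return sorted(range(k), key=lambda idx: mean_dist[idx])


# Toy data (made up): three time-warped copies of one pattern and one outlier.
healthy_samples = [
    [[0, 0], [1, 1], [2, 2]],
    [[0, 0], [0, 0], [1, 1], [2, 2]],
    [[0, 0], [1, 1], [2, 2], [2, 2]],
    [[5, 5], [6, 6], [7, 7]],
]
ranking = rank_by_mean_dtw(healthy_samples)
```

In this toy example the first three samples differ only by repeated time steps, so DTW treats them as identical, and the fourth sample lands at the end of `ranking`. A selection scheme in the spirit of the abstract would then need some rule, such as classifier accuracy on candidate subsets, to decide how many of the most central samples to keep; that rule is not specified here.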
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.