Selection of number of clusters and warping penalty in clustering functional electrocardiogram.

Statistics in Medicine. Published: 2024-11-20 (Epub: 2024-09-09). DOI: 10.1002/sim.10192
Impact factor 1.8; Zone 4 (Medicine); JCR Q3, Mathematical & Computational Biology
Wei Yang, Harold I Feldman, Wensheng Guo
Journal article, pages 4913-4927.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499710/pdf/
Citations: 0

Abstract

Clustering functional data aims to identify distinct functional patterns over the entire domain, but this can be challenging because phase variability distorts the observed patterns. Curve registration can remove this variability, but choosing the appropriate degree of warping flexibility is difficult. Curve registration also requires a target to which each functional object is aligned, typically the cross-sectional mean of the functional objects within the same cluster. However, this mean is unknown before clustering. Furthermore, there is a trade-off between warping flexibility and the number of resulting clusters: removing more phase variability through curve registration leaves less variation in the functional data, resulting in fewer clusters. Thus, the optimal number of clusters and the warping flexibility cannot be uniquely identified from the functional data alone. We propose to use external information to resolve this identification issue. We define a cross-validated Kullback-Leibler information criterion to select the number of clusters and the warping penalty. The criterion is derived from the predictive classification likelihood, based on the joint distribution of the functional data and an external variable, and it penalizes uncertainty in cluster membership. We evaluate our method through simulation and apply it to electrocardiographic data collected in the Chronic Renal Insufficiency Cohort study. We identify two distinct clusters of electrocardiogram (ECG) profiles: compared with the normal ECG profiles in the first cluster, the second cluster exhibits ST-segment depression, an indication of cardiac ischemia.
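The abstract does not give the criterion's formula, so the following is only a generic sketch of the classification-likelihood idea, not the authors' implementation. It assumes that, for each candidate number of clusters K and warping penalty, a mixture model has already been fitted on the training folds (curve registration and fitting are not shown), and that for every held-out subject the per-cluster joint log density of its curve and external variable is available as `log_joint`. The names `log_sum_exp`, `classification_loglik` and `select_by_cv` are hypothetical. The classification log-likelihood equals the ordinary mixture log-likelihood minus the entropy of the posterior membership probabilities, so uncertain membership is penalized. Averaging held-out scores over folds follows the standard practice of using cross-validated predictive log-likelihood to estimate, up to an additive constant, the negative Kullback-Leibler divergence between the true and fitted distributions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def classification_loglik(log_joint: List[List[float]]) -> float:
    """Entropy-penalized (classification) log-likelihood of held-out subjects.

    Hypothetical input: log_joint[i][k] = log(pi_k) + log f_k(x_i) + log g_k(y_i),
    i.e. the log mixing weight of cluster k plus the log densities of subject i's
    registered curve x_i and external variable y_i under cluster k, with all
    parameters assumed to be estimated on the training folds only.
    """
    total = 0.0
    for row in log_joint:
        log_mix = log_sum_exp(row)                   # log mixture density of subject i
        post = [math.exp(v - log_mix) for v in row]  # posterior membership probabilities
        entropy = -sum(p * math.log(p) for p in post if p > 0.0)
        # Equals sum_k post[k] * row[k]: the log-likelihood minus the membership entropy.
        total += log_mix - entropy
    return total


def select_by_cv(fold_scores: Dict[Tuple[int, float], List[float]]) -> Tuple[int, float]:
    """Return the (K, warping penalty) pair with the highest mean held-out score."""
    return max(fold_scores, key=lambda key: sum(fold_scores[key]) / len(fold_scores[key]))


if __name__ == "__main__":
    # One held-out subject equally compatible with both clusters: the entropy
    # penalty log(2) exactly offsets the gain from mixing, leaving -1.0.
    print(round(classification_loglik([[-1.0, -1.0]]), 6))  # -1.0
    # Illustrative (made-up) per-fold scores for three candidate settings.
    scores = {(1, 0.1): [-10.0, -12.0], (2, 0.1): [-8.0, -9.0], (2, 1.0): [-9.5, -9.0]}
    print(select_by_cv(scores))  # (2, 0.1)
```

In this sketch the external variable enters only through the g_k term of `log_joint`; that is what lets it break the tie between "more warping, fewer clusters" and "less warping, more clusters", since both settings may fit the curves about equally well.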

Source journal
Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)
CiteScore: 3.40
Self-citation rate: 10.00%
Articles published: 334
Review time: 2-4 weeks
About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies in which creative use or technical generalization of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.
Latest articles in this journal
New Quadratic Discriminant Analysis Algorithms for Correlated Audiometric Data.
A Modified Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization.
A Brief Introduction on Latent Variable Based Ordinal Regression Models With an Application to Survey Data.
Estimands and Cumulative Incidence Function Regression in Clinical Trials: Some New Results on Interpretability and Robustness.
Heterogeneous Mediation Analysis for Cox Proportional Hazards Model With Multiple Mediators.