Linear Transformations and the k-Means Clustering Algorithm: Applications to Clustering Curves

IF 1.8 | Zone 4 (Mathematics) | Q1 Statistics & Probability | American Statistician | Pub Date: 2007-02-01 | DOI: 10.1198/000313007X171016
Thaddeus Tarpey
Citations: 85

Abstract


Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L(2) metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
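
The abstract does not spell out which linear transformation yields the L2 metric. One standard candidate, assumed here rather than taken from the paper, is the Cholesky factor of the basis Gram matrix: if a curve is written as f(t) = b(t)'β and G is the matrix of integrals of b_i(t) b_j(t), then the squared L2 distance between two fitted curves is (β1 − β2)' G (β1 − β2), so Euclidean k-means on L'β with G = LL' should behave like k-means under the L2 metric on the curves. The sketch below illustrates this with a monomial basis on [0, 1] and simulated data; the function names, the simulated groups, and the small Lloyd-style `kmeans` helper are all hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def monomial_gram(degree, a=0.0, b=1.0):
    """Gram matrix G[i, j] = integral of t**i * t**j over [a, b]."""
    powers = np.arange(degree + 1)
    s = powers[:, None] + powers[None, :] + 1
    return (b**s - a**s) / s


def l2_transform(coefs, gram):
    """Map each coefficient row beta to L.T @ beta, where gram = L @ L.T."""
    L = np.linalg.cholesky(gram)
    return coefs @ L


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd iterations with random initial centers; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers


# Simulated curves (hypothetical data): two groups of noisy quadratics on [0, 1].
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 7)
X = np.vander(t, 3, increasing=True)  # design matrix with columns 1, t, t^2
true_beta = np.array([[0.0, 1.0, -1.0], [0.5, -1.0, 1.0]])
group = np.repeat([0, 1], 20)
Y = true_beta[group] @ X.T + rng.normal(scale=0.3, size=(40, len(t)))

# Least-squares regression coefficients, one row per curve.
coefs = np.linalg.lstsq(X, Y.T, rcond=None)[0].T

G = monomial_gram(2)
z = l2_transform(coefs, G)

labels_coef, _ = kmeans(coefs, k=2)  # Euclidean metric on the raw coefficients
labels_l2, _ = kmeans(z, k=2)        # Euclidean on z, i.e. L2 metric on the fitted curves

# Check the identity for one pair of curves by integrating (f1 - f2)^2 exactly.
d = coefs[0] - coefs[1]
antideriv = P.polyint(P.polymul(d, d))
l2_sq = P.polyval(1.0, antideriv) - P.polyval(0.0, antideriv)
print(np.isclose(l2_sq, np.sum((z[0] - z[1]) ** 2)))
```

Because k-means is not invariant to linear transformations, `labels_coef` and `labels_l2` need not agree; comparing them on real data is one way to see how much the choice of basis matters.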

Source journal
American Statistician (Mathematics: Statistics & Probability)
CiteScore: 3.50
Self-citation rate: 5.60%
Articles published: 64
Review time: >12 weeks
Journal description: The American Statistician (TAS), published quarterly by the American Statistical Association, carries general-interest articles about current national and international statistical problems and programs; interesting and fun articles of a general nature about statistics and its applications; and articles on the teaching of statistics. TAS contains timely articles organized into the following sections: Statistical Practice, General, Teacher's Corner, History Corner, Interdisciplinary, Statistical Computing and Graphics, Reviews of Books and Teaching Materials, and Letters to the Editor.