Mining Approximate Order Preserving Clusters in the Presence of Noise.

Mengsheng Zhang, Wei Wang, Jinze Liu
Proceedings. International Conference on Data Engineering, vol. 2008, pp. 160-168. Published 2008-04-07.
DOI: https://doi.org/10.1109/ICDE.2008.4497424
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916184/pdf/nihms132004.pdf

Abstract

Subspace clustering has attracted great attention due to its capability of finding salient patterns in high-dimensional data. Order-preserving subspace clusters have proven important in high-throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression patterns can be represented by consistent orderings of attributes. Existing order-preserving cluster models require that all objects in a cluster have an identical attribute order, without deviation. However, real data are noisy due to the limitations of measurement technology and experimental variability, which prevents these strict models from revealing true clusters corrupted by noise. In this paper, we study the problem of revealing order-preserving clusters in the presence of noise. We propose a noise-tolerant model called the approximate order-preserving cluster (AOPC). Instead of requiring that all objects in a cluster have an identical attribute order, we require that (1) at least a certain fraction of the objects have an identical attribute order, and (2) the other objects in the cluster may deviate from the consensus order by up to a certain fraction of attributes. We also propose an algorithm to mine AOPCs. Experiments on gene expression data demonstrate the efficiency and effectiveness of our algorithm.
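The two AOPC conditions can be illustrated with a small membership check. The sketch below is not the mining algorithm from the paper, and the abstract does not say how deviation from the consensus order is measured. As an assumption, deviation is taken here to be the minimum number of attributes that must be dropped so that the remaining attributes follow the consensus order. The function names and the two threshold parameters (`min_exact_fraction`, `max_deviation_fraction`) are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def attribute_order(values: Sequence[float]) -> List[int]:
    """Return attribute indices sorted by ascending value (ties broken by index)."""
    return sorted(range(len(values)), key=lambda i: (values[i], i))


def deviation(order: Sequence[int], consensus: Sequence[int]) -> int:
    """Assumed deviation measure: the minimum number of attributes to drop
    so that the remaining ones appear in the same order as in `consensus`.

    Computed as len(consensus) minus the length of the longest increasing
    subsequence of consensus ranks, read in the object's own order.
    """
    rank = {attr: pos for pos, attr in enumerate(consensus)}
    tails: List[int] = []  # tails[k] = smallest tail of an increasing run of length k + 1
    for attr in order:
        r = rank[attr]
        k = bisect_left(tails, r)
        if k == len(tails):
            tails.append(r)
        else:
            tails[k] = r
    return len(consensus) - len(tails)


def is_aopc(
    rows: Sequence[Sequence[float]],
    consensus: Sequence[int],
    min_exact_fraction: float,
    max_deviation_fraction: float,
) -> bool:
    """Check the two conditions from the abstract for a candidate cluster.

    `rows` holds each object's values on the cluster's attributes only, and
    `consensus` is a permutation of range(number of attributes).
    """
    devs = [deviation(attribute_order(row), consensus) for row in rows]
    exact = sum(d == 0 for d in devs)
    allowed = max_deviation_fraction * len(consensus)
    # (1) enough objects follow the consensus order exactly
    # (2) every object deviates on at most the allowed number of attributes
    return exact >= min_exact_fraction * len(rows) and all(d <= allowed for d in devs)


# Toy data: three objects over four attributes; the third object has
# attribute 2 out of place relative to the consensus order 0 < 1 < 2 < 3.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [0.1, 0.5, 0.9, 1.2],
    [2.0, 3.0, 1.0, 4.0],
]
consensus = [0, 1, 2, 3]
accepted = is_aopc(rows, consensus, min_exact_fraction=0.6, max_deviation_fraction=0.25)
```

Under this assumed measure, the third object deviates on one of four attributes, so the candidate is accepted when up to a quarter of the attributes may deviate and rejected under a tighter bound. A strict order preserving model corresponds to setting `max_deviation_fraction` to 0.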
