A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics.

Tonny J Oyana
{"title":"A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics.","authors":"Tonny J Oyana","doi":"10.1155/2010/746021","DOIUrl":null,"url":null,"abstract":"<p><p>The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm that was designed to increase the overall efficiency of the original k-means clustering technique-the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2010-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2010/746021","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"EURASIP journal on bioinformatics & systems biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1155/2010/746021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2010/6/27 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm that was designed to increase the overall efficiency of the original k-means clustering technique-the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种用于疾病发现和视觉分析的新型FES-k-Means聚类算法。
本研究的中心目的是进一步评估新算法的性能质量。该研究为该算法提供了额外的证据,该算法旨在提高原始k-means聚类技术的整体效率-快速,高效和可扩展的k-means算法(FES-k-means)。FES-k-means算法使用一种混合方法,该方法包括增强最近邻查询的k-d树数据结构、原始k-means算法和Mashor提出的自适应率。用两个真实数据集和一个合成数据集对该算法进行了测试。它在所有三个数据集上使用了两次:一次是通过创新的MIL-SOM方法训练的数据,然后是实际未训练的数据,以评估其能力。这种在聚类之前进行数据训练的两步方法为知识发现和数据挖掘提供了坚实的基础,否则仅用聚类方法是无法做到的。这种方法的好处是,运行时比较数据显示,它以更快的速度产生与原始k-means方法相似的聚类;它还提供了对具有疾病机制发现意义的大型地理空间数据的有效分析。从疾病机制发现的角度来看,假设在芝加哥市发现的血铅水平升高的线性模式可能与城市供水管道在空间上有关。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
From protein-protein interactions to protein co-expression networks: a new perspective to evaluate large-scale proteomic data. On biometric systems: electrocardiogram Gaussianity and data synthesis. BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition. Review of stochastic hybrid systems with applications in biological systems modeling and analysis. Bayesian inference for biomarker discovery in proteomics: an analytic solution.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1