A Robust and Scalable Solution for Interpolative Multidimensional Scaling with Weighting

Yang Ruan, G. Fox
2013 IEEE 9th International Conference on e-Science, published 2013-10-22
DOI: 10.1109/eScience.2013.30 (https://doi.org/10.1109/eScience.2013.30)
Citations: 18

Abstract

Advances in modern bio-sequencing techniques have led to a proliferation of raw genomic data, creating an unprecedented opportunity for data mining. To analyze such large-volume, high-dimensional scientific data, many high-performance dimension reduction and clustering algorithms have been developed. Among the known algorithms, we use Multidimensional Scaling (MDS) to reduce the dimension of the original data and Pairwise Clustering to classify the data. We have shown that interpolative MDS, an online technique for real-time streaming in Big Data, can be applied to get better performance on massive data. However, the SMACOF MDS approach is only directly applicable to cases where all pairwise distances are used and where the weight is one for each term. In this paper, we propose a robust and scalable MDS and interpolation algorithm using the Deterministic Annealing technique to solve problems with either missing distances or a non-trivial weight function. We compare our method to three state-of-the-art techniques. Experiments on three common types of bioinformatics datasets show that the precision of our algorithms is better than that of the other algorithms, and that the weighted solutions have a lower computational time cost as well.
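
The abstract does not spell out the objective being minimized, so the following is only a minimal sketch of the standard weighted raw stress used in MDS, sum over i<j of w_ij * (d_ij(X) - delta_ij)^2, and not the authors' algorithm. It illustrates why weights matter: setting w_ij = 0 is one common way to represent a missing distance, and all weights equal to one recovers the unweighted case. The function name `weighted_stress` and the toy data are assumptions made for illustration.

```python
import math


def weighted_stress(points, delta, weights):
    """Weighted raw stress: sum over i<j of w_ij * (d_ij(X) - delta_ij)**2.

    points  : list of coordinate tuples (the low-dimensional embedding X)
    delta   : symmetric matrix of target dissimilarities
    weights : symmetric matrix of non-negative weights (0 = distance missing)
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = weights[i][j]
            if w == 0:
                continue  # a zero weight removes the pair from the objective
            d = math.dist(points[i], points[j])  # Euclidean distance in the embedding
            total += w * (d - delta[i][j]) ** 2
    return total


# Toy example (hypothetical data): three points embedded in 2D.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
delta = [[0.0, 3.0, 4.0],
         [3.0, 0.0, 6.0],
         [4.0, 6.0, 0.0]]

# All weights equal to one: the case the abstract says plain SMACOF handles directly.
ones = [[1.0] * 3 for _ in range(3)]
stress_all = weighted_stress(points, delta, ones)

# Treat the (1, 2) dissimilarity as missing by giving it zero weight.
masked = [row[:] for row in ones]
masked[1][2] = masked[2][1] = 0.0
stress_masked = weighted_stress(points, delta, masked)

print(stress_all, stress_masked)
```

In this toy case only the pair (1, 2) is mismatched (embedded distance 5 against a target of 6), so masking it drives the stress to zero while the all-ones weighting does not.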