A Robust and Scalable Solution for Interpolative Multidimensional Scaling with Weighting
Yang Ruan, G. Fox
2013 IEEE 9th International Conference on e-Science, 2013-10-22
DOI: 10.1109/eScience.2013.30
Advances in modern bio-sequencing techniques have led to a proliferation of raw genomic data that creates an unprecedented opportunity for data mining. To analyze such large-volume, high-dimensional scientific data, many high-performance dimension reduction and clustering algorithms have been developed. Among the known algorithms, we use Multidimensional Scaling (MDS) to reduce the dimension of the original data and pairwise clustering to classify the data. We have shown that interpolative MDS, which is an online technique for real-time streaming in Big Data, can be applied to get better performance on massive data. However, the SMACOF MDS approach is only directly applicable to cases where all pairwise distances are used and where the weight is one for each term. In this paper, we propose a robust and scalable MDS and interpolation algorithm using the Deterministic Annealing technique to solve problems with either missing distances or a non-trivial weight function. We compare our method to three state-of-the-art techniques. Experiments on three common types of bioinformatics datasets show that the precision of our algorithms is better than that of the other algorithms, and that the weighted solution has a lower computational time cost as well.
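To make the role of the weights concrete, the following is a minimal sketch of the standard weighted stress objective that MDS methods minimize, namely the sum over pairs i < j of w_ij * (d_ij(X) - delta_ij)^2. It is not the algorithm proposed in the paper and involves no Deterministic Annealing. The function name `weighted_stress`, the use of `None` to mark a missing distance, and the toy data are assumptions made for illustration only.

```python
import math


def weighted_stress(points, delta, weights):
    """Weighted raw stress: sum over i < j of w_ij * (d_ij(X) - delta_ij)^2.

    A missing dissimilarity (delta[i][j] is None) contributes nothing,
    which is equivalent to giving that pair a weight of zero.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if delta[i][j] is None:
                continue  # missing distance: skip the term entirely
            d_ij = math.dist(points[i], points[j])  # Euclidean distance in the embedding
            total += weights[i][j] * (d_ij - delta[i][j]) ** 2
    return total


# Hypothetical toy data: three embedded points forming a 3-4-5 right triangle.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# Target dissimilarities; the (1, 2) entry is missing.
delta = [
    [0.0, 3.0, 5.0],
    [3.0, 0.0, None],
    [5.0, None, 0.0],
]

# All weights equal to one: the case the abstract describes as directly
# applicable to SMACOF, apart from the missing entry.
ones = [[1.0] * 3 for _ in range(3)]

# A non-trivial weight function: down-weight the (0, 2) pair.
half = [row[:] for row in ones]
half[0][2] = half[2][0] = 0.5

print(weighted_stress(points, delta, ones))
print(weighted_stress(points, delta, half))
```

In this toy case only the (0, 2) pair is mismatched (embedded distance 4 against a target of 5), so halving its weight halves the stress. Treating a missing distance as a zero-weight term is what allows a single objective to cover both cases named in the abstract.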