EKSS: An efficient approach for similarity search
S. Gupta, A. Dwivedi, R. Issac, S. K. Agrawal
2012 International Conference on Communication, Information & Computing Technology (ICCICT), published 2012-12-31
DOI: 10.1109/ICCICT.2012.6398194 (https://doi.org/10.1109/ICCICT.2012.6398194)

Similarity search in large multidimensional data is a crucial task in data mining. It involves both subsequence matching and whole-sequence matching. In this paper, we present an approach that considers on how many dimensions a data point is similar to the query point, the average distance between the data point and the query point over those dimensions, and efficiency in time and space as the data size grows dramatically. The proposed approach selects its input parameters dynamically, covers both subsequence matching and whole-sequence matching, and suppresses the impact of high dissimilarity in a few dimensions. It can thus help improve the performance of existing data analysis technologies in fields such as financial market analysis, medical diagnosis, and scientific and engineering database analysis, where tremendous amounts of data are generated.
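
The abstract names two quantities used to judge a data point: the number of dimensions on which it is similar to the query, and the average distance over those dimensions. The sketch below illustrates one way such a ranking could work; it is not the EKSS algorithm itself, whose details the abstract does not give. The per-dimension threshold `eps`, the function names, and the tie-breaking rule (more matching dimensions first, then smaller average distance) are all assumptions made for illustration. The dynamic parameter selection mentioned in the abstract is not modeled here; `eps` is simply fixed by the caller.

```python
from typing import List, Sequence, Tuple


def match_profile(point: Sequence[float], query: Sequence[float],
                  eps: float) -> Tuple[int, float]:
    """Return (matching dimension count, mean distance over those dimensions).

    A dimension "matches" when the absolute difference is at most eps
    (eps is a hypothetical parameter, not taken from the paper).
    """
    diffs = [abs(p - q) for p, q in zip(point, query)]
    matched = [d for d in diffs if d <= eps]
    if not matched:
        # No similar dimension at all: treat the average distance as infinite.
        return 0, float("inf")
    return len(matched), sum(matched) / len(matched)


def rank_by_partial_match(data: Sequence[Sequence[float]],
                          query: Sequence[float],
                          eps: float,
                          top_k: int) -> List[Tuple[int, int, float]]:
    """Rank points by matching-dimension count (descending), then by
    average distance over the matching dimensions (ascending).

    Returns up to top_k tuples of (index, match count, average distance).
    """
    scored = []
    for idx, point in enumerate(data):
        count, avg = match_profile(point, query, eps)
        scored.append((idx, count, avg))
    scored.sort(key=lambda item: (-item[1], item[2]))
    return scored[:top_k]


if __name__ == "__main__":
    query = [1.0, 2.0, 3.0, 4.0]
    data = [
        [1.0, 2.0, 3.0, 100.0],  # identical on three dimensions, one outlier
        [1.5, 2.5, 3.5, 4.5],    # close on every dimension
        [9.0, 9.0, 9.0, 9.0],    # far on every dimension
    ]
    for idx, count, avg in rank_by_partial_match(data, query, eps=0.5, top_k=3):
        print(idx, count, avg)
```

In this toy data, the first point agrees with the query on three of four dimensions but has one extreme value. Under plain Euclidean distance it would rank last (distance 96, versus about 13.2 for the third point), whereas the count-based ranking places it second. This is the sense in which a high dissimilarity in a few dimensions is suppressed.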