Pattern Matching with Wildcard Gaps Based on Cross List
Jun-yan Zhang, Chenhui Yang
2013 Sixth International Symposium on Computational Intelligence and Design, 2013-10-28
DOI: 10.1109/ISCID.2013.152 (https://doi.org/10.1109/ISCID.2013.152)
Pattern matching is fundamental to applications such as text retrieval, string query, and biological sequence analysis, so effective algorithms for this kind of matching are in great demand. In this paper, a wildcard is defined to match any single character in a sequence. Multiple consecutive wildcards form a gap, and the length of a flexible gap is arbitrary. We design the CLPM algorithm, which uses a cross list index structure to perform pattern matching with flexible wildcard gaps. A preprocessing algorithm initializes the cross list so as to reduce the search space. In the CLPM algorithm, effective intervals are defined and computed from the start positions of each sub-pattern in each string, which helps to obtain the matching result set. Moreover, approximate pattern matching is converted to short exact pattern matching. Comparative experiments are carried out on the DBLP title data set. The results show that the CLPM algorithm performs better than other algorithms in the same field.
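To make the problem setting concrete, the following is a minimal sketch of matching sub-patterns separated by flexible wildcard gaps, driven by the start positions of each sub-pattern. It is not the CLPM algorithm and does not build a cross list; the function names `find_starts` and `match_with_gaps`, the representation of a gap as a `(lo, hi)` pair of allowed lengths, and the right-to-left filtering are all assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right


def find_starts(text, sub):
    """Return every start position of `sub` in `text`, overlaps included."""
    starts = []
    i = text.find(sub)
    while i != -1:
        starts.append(i)
        i = text.find(sub, i + 1)
    return starts


def match_with_gaps(text, subs, gaps):
    """Return start positions of subs[0] from which the whole pattern matches.

    subs -- list of non-empty sub-patterns (hypothetical input format)
    gaps -- gaps[k] = (lo, hi) is the allowed number of wildcard characters
            between subs[k] and subs[k + 1] (an assumed gap model)
    """
    if not subs or len(gaps) != len(subs) - 1:
        raise ValueError("need one gap between each pair of sub-patterns")

    # Index step: sorted start positions of each sub-pattern in the text.
    positions = [find_starts(text, s) for s in subs]

    # Work right to left. `valid` holds the sorted starts of subs[k + 1]
    # from which the rest of the pattern can still be matched.
    valid = positions[-1]
    for k in range(len(subs) - 2, -1, -1):
        lo, hi = gaps[k]
        keep = []
        for p in positions[k]:
            end = p + len(subs[k])
            # The next sub-pattern must start inside [end + lo, end + hi].
            left = bisect_left(valid, end + lo)
            right = bisect_right(valid, end + hi)
            if left < right:
                keep.append(p)
        valid = keep
    return valid


if __name__ == "__main__":
    # "ab", then 1 to 2 wildcards, then "cd"
    print(match_with_gaps("abxxcdabcd", ["ab", "cd"], [(1, 2)]))
```

Each start position is checked against an interval of admissible positions for the next sub-pattern, which loosely mirrors the idea of computing intervals from sub-pattern start positions; the binary search on sorted position lists is a design choice of this sketch, not something stated in the paper.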