Extending Cross Motif Search with Heuristic Data Mining
Authors: Teo Argentieri, V. Cantoni, M. Musci
Venue: 2017 28th International Workshop on Database and Expert Systems Applications (DEXA)
Publication date: 2017-08-01
DOI: 10.1109/DEXA.2017.28
Citations: 3
Abstract
In previous work we presented Cross Motif Search (CMS), an MP/MPI parallel tool for extracting geometrical motifs from the secondary structure of proteins. We showed that the algorithm can retrieve previously unknown motifs, thanks to its innovative approach based on the generalized Hough transform. We also presented MotifVisualizer, a GUI for CMS introduced to improve software usability and to encourage collaboration with the biology community. In this paper we address the main shortcoming of CMS: using a simple approach based on heuristic data mining, we show how candidate motifs can be classified according to their statistical significance in the data set. We also present two extensions to MotifVisualizer: one that integrates the new data-mining functions into the GUI, and one that makes test data sets easier to retrieve.
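The abstract does not say how CMS applies the generalized Hough transform (GHT). The sketch below illustrates only the general voting idea behind a GHT and is not the CMS algorithm. It assumes each structure is reduced to typed points, for example secondary-structure-element midpoints labeled `'H'` (helix) or `'E'` (strand), and it searches over translations only. The element representation and the names `build_r_table`, `ght_vote`, `best_match` and `cell` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation (not from CMS): (sse_type, x, y, z),
# e.g. the midpoint of a helix ('H') or a strand ('E').
Element = Tuple[str, float, float, float]
Cell = Tuple[int, int, int]


def build_r_table(motif: List[Element]) -> Dict[str, List[Tuple[float, float, float]]]:
    """R-table: for each element type, the displacement vectors from each
    motif element of that type to the motif's reference point (its centroid)."""
    n = len(motif)
    cx = sum(e[1] for e in motif) / n
    cy = sum(e[2] for e in motif) / n
    cz = sum(e[3] for e in motif) / n
    table: Dict[str, List[Tuple[float, float, float]]] = {}
    for kind, x, y, z in motif:
        table.setdefault(kind, []).append((cx - x, cy - y, cz - z))
    return table


def ght_vote(motif: List[Element], structure: List[Element], cell: float = 1.0) -> Counter:
    """Each structure element looks up the R-table entries for its own type
    and casts one vote per entry for a candidate reference point,
    quantized to a grid with spacing `cell` (translation-only accumulator)."""
    r_table = build_r_table(motif)
    acc: Counter = Counter()
    for kind, x, y, z in structure:
        for dx, dy, dz in r_table.get(kind, []):
            key = (round((x + dx) / cell), round((y + dy) / cell), round((z + dz) / cell))
            acc[key] += 1
    return acc


def best_match(motif: List[Element], structure: List[Element], cell: float = 1.0) -> Tuple[Cell, int]:
    """Return the accumulator peak (grid cell, vote count). A peak equal to
    len(motif) suggests every motif element has a consistently placed partner."""
    peak = ght_vote(motif, structure, cell).most_common(1)
    if not peak:
        raise ValueError("no votes: motif and structure share no element types")
    return peak[0]
```

For example, a three-element motif embedded in a structure with a shift of (10, 5, 2), plus one unrelated strand, gives a clear peak of 3 votes at the shifted centroid:

```python
motif = [("H", 0, 0, 0), ("E", 3, 0, 0), ("H", 0, 3, 0)]
structure = [("H", 10, 5, 2), ("E", 13, 5, 2), ("H", 10, 8, 2), ("E", -7, 1, 1)]
print(best_match(motif, structure))  # ((11, 6, 2), 3)
```

Because this toy accumulator covers translations only, a rotated copy of the motif would not produce a single peak. Real protein structures would need a rotation-invariant parametrization, which this sketch does not attempt.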