SDAF: Symbolic data classification using variations in frequent terms
M. Mahfouz, Y. El-Sonbaty, M. Ismail
2016 Fourth International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC), 25 July 2016
DOI: 10.1109/JEC-ECC.2016.7518982
Citations: 1
Abstract
Symbolic data classification is of great importance for classifying the massive, high-dimensional data found in domains such as bioinformatics and web mining. Unlike the classical case, the feature values (events) of symbolic data are generally not single values but lists of values, intervals or, more generally, distributions. This study proposes a symbolic classification algorithm that exploits distinctive variations in the frequencies of the frequent one- and two-item-sets extracted from each class sample. Events and event pairs that have sufficiently high support in one class and very low support in the others are identified, and a symbolic profile of each class is built from these item-sets. Each incoming pattern is compared to the profile of every class, and the class that achieves maximum similarity with the object is selected. An experimental study on two standard datasets shows that the proposed algorithm uses only a small subset of event pairs to build each class profile and achieves accuracy comparable to variants of the state-of-the-art SVM at lower computational complexity.
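The profile-based scheme summarized above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' SDAF implementation. It assumes each symbolic object is encoded as a set of (feature, value) events, with a list-valued feature contributing one event per value and intervals or distributions discretized beforehand. The names `build_profiles`, `classify`, `min_sup` and `max_other`, the threshold values, and the weighted-overlap similarity are all hypothetical choices, since the abstract does not specify the paper's thresholds or similarity measure.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

# Hypothetical encoding: an event is a (feature name, value) pair.
Event = Tuple[str, Hashable]
Itemset = FrozenSet[Event]
Profile = Dict[Itemset, float]


def itemsets(obj: Set[Event]) -> Set[Itemset]:
    """All one- and two-event itemsets contained in one symbolic object."""
    singles = {frozenset([e]) for e in obj}
    pairs = {frozenset(p) for p in combinations(obj, 2)}
    return singles | pairs


def class_support(objs: List[Set[Event]]) -> Profile:
    """Support of each itemset: fraction of the class's objects containing it."""
    counts: Counter = Counter()
    for obj in objs:
        counts.update(itemsets(obj))
    return {s: c / len(objs) for s, c in counts.items()}


def build_profiles(data: Dict[str, List[Set[Event]]],
                   min_sup: float = 0.5,    # hypothetical "high support" threshold
                   max_other: float = 0.1,  # hypothetical "very low support" threshold
                   ) -> Dict[str, Profile]:
    """Keep itemsets frequent in one class and rare in every other class."""
    supports = {label: class_support(objs) for label, objs in data.items()}
    profiles: Dict[str, Profile] = {}
    for label, sup in supports.items():
        profiles[label] = {
            s: v for s, v in sup.items()
            if v >= min_sup and all(other.get(s, 0.0) <= max_other
                                    for other_label, other in supports.items()
                                    if other_label != label)
        }
    return profiles


def classify(obj: Set[Event], profiles: Dict[str, Profile]) -> str:
    """Return the class whose profile is most similar to the object.

    Assumed similarity: support-weighted fraction of the profile's
    itemsets that also occur in the object.
    """
    present = itemsets(obj)

    def similarity(profile: Profile) -> float:
        total = sum(profile.values())
        if total == 0:
            return 0.0
        return sum(w for s, w in profile.items() if s in present) / total

    return max(profiles, key=lambda label: similarity(profiles[label]))


# Toy two-class example (invented data, for illustration only).
data = {
    "A": [{("color", "red"), ("shape", "round")},
          {("color", "red"), ("shape", "round"), ("size", "small")}],
    "B": [{("color", "blue"), ("shape", "square")},
          {("color", "blue"), ("shape", "square"), ("size", "small")}],
}
profiles = build_profiles(data)
print(classify({("color", "red"), ("shape", "round")}, profiles))  # A
```

In the toy data the event `("size", "small")` occurs in half of each class, so it fails the `max_other` test and is left out of both profiles. Only events and pairs that separate the classes survive, in the spirit of the selective profiles the abstract describes.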