Effect of different feature types on age based classification of short texts
Avar Pentel
2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA), published 2015-07-06
DOI: 10.1109/IISA.2015.7388069 (https://doi.org/10.1109/IISA.2015.7388069)
Citation count: 5
Abstract
This study compares the effect of three feature types on age-based classification of short texts, averaging 85 words per author. Alongside the widely used word and character n-grams, text readability features are proposed as an alternative. By readability features we mean relative ratios of text elements, such as characters per word and words per sentence. Models were built with Support Vector Machines, Logistic Regression, and Bayesian algorithms. The most effective features were the readability features and character n-grams. The model built with a Support Vector Machine on the combined feature set yielded an F-score of 0.968. An age prediction application was built using a model based on readability features.
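As an illustration of the two feature types the abstract singles out, the following minimal sketch computes two readability ratios (characters per word, words per sentence) and character n-gram counts for a short text. The function names, the regex-based tokenization, the English-only word pattern, and the choice of trigrams are assumptions made for this example; the abstract does not specify the exact feature definitions or preprocessing used in the study.

```python
import re
from collections import Counter


def readability_features(text):
    """Return two relative ratios of text elements (an illustrative subset)."""
    # Rough heuristic: split on runs of sentence-ending punctuation.
    # This is an assumption, not a full sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: words are runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_sentences = len(sentences)
    n_chars = sum(len(w) for w in words)
    return {
        "chars_per_word": n_chars / n_words if n_words else 0.0,
        "words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
    }


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    lowered = text.lower()
    return Counter(lowered[i:i + n] for i in range(len(lowered) - n + 1))


sample = "I like cats. Do you like dogs?"
features = readability_features(sample)
trigrams = char_ngrams(sample)
print(features)
print(trigrams.most_common(3))
```

Ratios like these stay comparable across texts of different lengths, which may be one reason they suit very short samples; the resulting values could be concatenated with n-gram counts to form a combined feature vector for a classifier.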