Question identification on Turkish tweets
Zeynep Banu Ozger, B. Diri, Canan Girgin
2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proceedings, 2014-06-23
DOI: 10.1109/INISTA.2014.6873608 (https://doi.org/10.1109/INISTA.2014.6873608)
Citations: 1
Abstract
Question identification is a task in Natural Language Processing and Information Extraction. The aim of this work is to detect Turkish tweets that contain question expressions. The application has three stages: pre-processing the data set to remove unnecessary data such as retweets, determining candidate tweets with a rule-based method, and extracting the tweets that actually contain questions using Conditional Random Fields. For this purpose, one million tweets were collected and labeled. Tweets are an ungrammatical type of data. According to the results, the developed model was largely successful on tweets. Additionally, this is the first study on identifying questions in Turkish tweets.
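
The first two stages described in the abstract (dropping retweets, then selecting candidates with rules) can be illustrated with a minimal Python sketch. The abstract does not list the actual rules, so everything here is an assumption made for illustration: the `RT @` prefix test, the cue lists `QUESTION_WORDS` and `QUESTION_PARTICLES`, and the function names are all hypothetical. The third stage, the Conditional Random Fields classifier, is not shown.

```python
import re

# Hypothetical cue lists (not taken from the paper): a few common Turkish
# question words and the bare forms of the question particle.
QUESTION_WORDS = {"ne", "neden", "niye", "nasıl", "kim", "nerede", "hangi", "kaç"}
QUESTION_PARTICLES = {"mı", "mi", "mu", "mü"}


def turkish_lower(text):
    """Lowercase with Turkish dotted/dotless i handled before str.lower()."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def is_retweet(text):
    """Assumed pre-processing rule: a retweet starts with 'RT @'."""
    return text.lstrip().startswith("RT @")


def is_candidate(text):
    """Assumed rule: a question mark or any cue token marks a candidate."""
    if "?" in text:
        return True
    tokens = re.findall(r"\w+", turkish_lower(text))
    return any(t in QUESTION_WORDS or t in QUESTION_PARTICLES for t in tokens)


def select_candidates(tweets):
    """Stage 1 (drop retweets) followed by stage 2 (rule-based filter)."""
    return [t for t in tweets if not is_retweet(t) and is_candidate(t)]
```

In a pipeline of this shape, the rule-based filter is deliberately permissive: it only narrows the data set, and the later classifier decides which candidates actually contain questions.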