A Cascade Approach for Gender Prediction from Texts in Portuguese Language
João Pedro Moreira de Morais, L. Merschmann
Proceedings of the Brazilian Symposium on Multimedia and the Web, 2022-11-07
DOI: https://doi.org/10.1145/3539637.3557057
Abstract
Author profiling is a prominent research area in which computational approaches are proposed to predict authors’ characteristics from their texts. Commonly analyzed characteristics include gender, age, personality traits, and occupation. The task is of growing importance, with applications in areas such as forensics, marketing, and e-commerce. Although extensive research has addressed this task for some widely used languages (e.g., English), studies involving the Portuguese language still leave considerable room for improvement. This work therefore proposes and evaluates a cascading approach that combines a weighted lexical approach, a heuristic, and a classifier to predict gender using only textual content written in Portuguese. The proposed approach takes into account both the specificities of the Portuguese language and the domain characteristics of the texts. The results show that exploiting these language specificities and domain characteristics can improve performance on the gender prediction task.
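The abstract does not specify how the three components are chained, so the following is only a minimal sketch of one plausible cascade, not the authors' implementation. It assumes that each earlier stage may abstain and pass the text on, and that the classifier acts as the final fallback. The lexicon entries and weights, the threshold, the "sou/estou + word ending in -a/-o" heuristic, and every function name are hypothetical, chosen only to illustrate how gender-marked Portuguese forms (such as "obrigada" versus "obrigado") could feed such a pipeline.

```python
import re
from typing import Callable, List, Optional

# Hypothetical weighted lexicon: positive weights lean "female", negative lean "male".
# Words and weights are illustrative placeholders, not values from the paper.
LEXICON = {
    "obrigada": 2.0,
    "obrigado": -2.0,
    "cansada": 1.0,
    "cansado": -1.0,
}

# Hypothetical heuristic pattern: a first-person copula followed by one word.
SELF_REFERENCE = re.compile(r"\b(?:sou|estou)\s+(\w+)")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexical_stage(text: str, threshold: float = 1.5) -> Optional[str]:
    """Sum lexicon weights; decide only if the score clears the (assumed) threshold."""
    score = sum(LEXICON.get(token, 0.0) for token in tokenize(text))
    if score >= threshold:
        return "female"
    if score <= -threshold:
        return "male"
    return None  # abstain and let the next stage decide


def heuristic_stage(text: str) -> Optional[str]:
    """Crude assumed heuristic: gender ending of the word after 'sou' or 'estou'."""
    for match in SELF_REFERENCE.finditer(text.lower()):
        word = match.group(1)
        if word.endswith("a"):
            return "female"
        if word.endswith("o"):
            return "male"
    return None  # abstain


def predict_gender(text: str, classifier: Callable[[str], str]) -> str:
    """Run the stages in order; the classifier is used only if both earlier stages abstain."""
    for stage in (lexical_stage, heuristic_stage):
        label = stage(text)
        if label is not None:
            return label
    return classifier(text)
```

Here `classifier` stands in for any trained model wrapped as a function from text to label. Ordering the stages this way lets cheap, high-precision rules handle the clear cases and reserves the learned model for texts without explicit gender markers.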