M. Francisco, Miguel-Ángel Benítez-Castro, E. Hidalgo-Tenorio, J. Castro
Title: A semi-supervised algorithm for detecting extremism propaganda diffusion on social media
Journal: Pragmatics and Society (Language & Linguistics)
Published: 2022-07-21
DOI: https://doi.org/10.1075/ps.21009.fra
Citations: 1
Abstract
Extremist online networks reportedly tend to use Twitter and other Social Networking Sites (SNS) to issue
propaganda and recruitment statements. Traditional machine learning models may encounter problems in such a context, owing
to the peculiarities of microblogging sites and the manner in which these networks interact (both among themselves and with
other networks). Moreover, state-of-the-art approaches have focused on non-transparent techniques that cannot be audited, so
although they are top-performing techniques, it is impossible to check whether the models are actually fair. In this
paper, we present a semi-supervised methodology that uses our Discriminatory Expressions algorithm for feature
selection to detect expressions that are biased towards extremist content (Francisco and
Castro 2020). With the help of human experts, the relevant expressions are filtered and then used to retrieve further
extremist content, so that each iteration yields a more relevant and accurate set of expressions. These discriminatory expressions
have been shown to produce less complex models that are easier to comprehend, and thus improve model transparency. We
present close to 70 expressions discovered with this method, alongside validation tests of the
algorithm in several different contexts.
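
The iterative loop described in the abstract (score candidate expressions, let an expert filter them, retrieve more content with the accepted ones, repeat) can be illustrated with a minimal sketch. This is not the Discriminatory Expressions algorithm of Francisco and Castro (2020): the scoring below is a stand-in that assumes a simple smoothed ratio of document frequencies over single words, the human expert is modelled as a callback, and every name (`candidate_expressions`, `expand_lexicon`, `expert_accepts`) and the neutral toy data are hypothetical.

```python
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; a real pipeline would normalise tweets properly.
    return text.lower().split()


def candidate_expressions(target_docs: list[str], background_docs: list[str],
                          min_ratio: float = 2.0, top_k: int = 10) -> list[str]:
    """Rank words by how over-represented they are in the target documents.

    Stand-in score (an assumption, not the paper's measure): add-one smoothed
    ratio of per-class document frequencies.
    """
    target_df = Counter(t for d in target_docs for t in set(tokenize(d)))
    background_df = Counter(t for d in background_docs for t in set(tokenize(d)))
    scored = []
    for tok, df in target_df.items():
        target_rate = (df + 1) / (len(target_docs) + 1)
        background_rate = (background_df[tok] + 1) / (len(background_docs) + 1)
        ratio = target_rate / background_rate
        if ratio >= min_ratio:
            scored.append((ratio, tok))
    # Highest ratio first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tok for _, tok in scored[:top_k]]


def expand_lexicon(seed_docs: list[str], background_docs: list[str],
                   unlabeled_docs: list[str],
                   expert_accepts: Callable[[str], bool],
                   rounds: int = 3) -> set[str]:
    """Semi-supervised loop: score, expert-filter, retrieve, repeat."""
    labeled = list(seed_docs)
    pool = list(unlabeled_docs)
    lexicon: set[str] = set()
    for _ in range(rounds):
        candidates = candidate_expressions(labeled, background_docs)
        # Only expressions the expert approves enter the lexicon.
        accepted = {c for c in candidates if c not in lexicon and expert_accepts(c)}
        if not accepted:
            break  # nothing new was approved, so stop iterating
        lexicon |= accepted
        # Retrieve further documents that contain any accepted expression.
        retrieved = [d for d in pool if lexicon & set(tokenize(d))]
        labeled.extend(retrieved)
        pool = [d for d in pool if d not in retrieved]
    return lexicon


# Neutral toy data: "alpha" and "beta" play the role of target expressions.
seed = ["join alpha now", "alpha calls today"]
background = ["weather is nice today", "join the club now", "calls for rain"]
unlabeled = ["alpha and beta rally", "beta rally tomorrow", "nice weather tomorrow"]

result = expand_lexicon(seed, background, unlabeled,
                        expert_accepts=lambda e: e in {"alpha", "beta"})
print(sorted(result))  # prints ['alpha', 'beta']
```

In this toy run the first round surfaces only "alpha"; retrieving the document that contains it exposes "beta" in the second round, which is the kind of iterative expansion the abstract describes. The expert callback is what keeps co-occurring but irrelevant words (here "and" and "rally") out of the lexicon.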