Crowdsourcing ratings for single lexical items
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.5-61
Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Elena Volodina, David Alfter, Therese Lindström Tiedemann
In this study, we investigate theoretical and practical issues connected to differentiating between core and peripheral vocabulary at different levels of linguistic proficiency using statistical approaches combined with crowdsourcing. We also investigate whether crowdsourced rankings by second language learners can be used for assigning levels to unseen vocabulary. The study is performed on Swedish single-word items. The four hypotheses we examine are: (1) there is core vocabulary for each proficiency level, but only up to CEFR level B2 (upper-intermediate); (2) core vocabulary shows more systematicity in its behavior and usage, whereas peripheral items behave more idiosyncratically; (3) given truly core items (also called anchor items) for each level, we can place any new unseen item in relation to the identified core items through a series of comparative judgment tasks, thereby assigning a “target” level to a previously unseen item; and (4) non-experts will perform on par with experts in a comparative judgment setting. The hypotheses have been largely confirmed. In relation to (1) and (2), our results show that there seems to be some systematicity in core vocabulary for early to mid levels (A1-B1), while we find less systematicity for higher levels (B2-C1). In relation to (3), we suggest crowdsourcing word rankings using comparative judgment with known anchor words as a method to assign a “target” level to unseen words. With regard to (4), we confirm previous findings that non-experts, in our case language learners, can be used effectively for linguistic annotation tasks in a comparative judgment setting.
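The idea in hypothesis (3), placing an unseen word relative to anchor words through pairwise judgments, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the majority-vote rule, the function name `estimate_level`, and the placeholder anchor names are all assumptions made for illustration only.

```python
from collections import defaultdict

# CEFR levels covered in the abstract, ordered from easiest to hardest.
LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def estimate_level(judgments, anchor_levels):
    """Assign a target level to one unseen word.

    judgments: list of (anchor_word, new_word_is_harder) pairs, one per
        crowd vote comparing the unseen word with an anchor.
    anchor_levels: dict mapping each anchor word to its known level.
    Hypothetical rule: walk up the scale and stop at the first level whose
    anchors are NOT judged easier than the new word by a majority of votes.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    harder = defaultdict(int)
    total = defaultdict(int)
    for anchor, new_is_harder in judgments:
        level = anchor_levels[anchor]
        total[level] += 1
        harder[level] += int(new_is_harder)
    for level in LEVELS:
        # Levels without any votes carry no evidence and are skipped.
        if total[level] and harder[level] / total[level] <= 0.5:
            return level
    # The new word was judged harder than every anchor that was voted on.
    return LEVELS[-1]


# Placeholder anchors (not real Swedish items) and made-up votes.
anchor_levels = {"anchor_a1": "A1", "anchor_a2": "A2", "anchor_b1": "B1"}
judgments = [
    ("anchor_a1", True), ("anchor_a1", True),
    ("anchor_a2", True), ("anchor_a2", False), ("anchor_a2", True),
    ("anchor_b1", False), ("anchor_b1", False), ("anchor_b1", True),
]
print(estimate_level(judgments, anchor_levels))
```

A production setup would more likely fit a pairwise-comparison model over all votes rather than apply a per-level majority threshold; the threshold is used here only because it is easy to follow.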
Crowdsourcing and language learning habits and practices in Turkey, Bosnia and Herzegovina, the Republic of North Macedonia and Poland in the pre-pandemic and pandemic periods
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.132-183
Çiler Hatipoğlu, Nihada Delibegović Džanić, Elżbieta Gajek, Lina Miloshevska
The popularity of online crowdsourcing platforms was slowly increasing among language learners before the pandemic, but COVID-19 changed educational systems worldwide. This study aims to uncover whether, and if so how, the attitudes and habits of language learners concerning the use of crowdsourcing materials in Turkey, Bosnia and Herzegovina, the Republic of North Macedonia and Poland changed during the pandemic. To compare crowdsourcing tool usage before and during COVID-19, the cross-culturally appropriate questionnaire utilised in the pre-COVID-19 period was used again. The collected data were analysed qualitatively and quantitatively to identify the differences between the periods. The study’s findings showed that the shift from face-to-face to online learning significantly affected the development of crowdsourcing platforms worldwide and their employment in the studied countries. The results also demonstrated that a combination of factors, such as reduced interactions with teachers and peers, an increase in workload, and a lack of support on the part of institutions, led to students taking responsibility for their learning. The number and characteristics of the popular platforms varied from country to country, since the expectations placed on students varied.
Application of crowdsourcing in education on the example of eTwinning
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.184-201
Elżbieta Gajek
While eTwinning is focused on facilitating collaboration among schools in Europe and beyond, the participation of over one million teachers from 44 countries makes the program an extensive educational crowdsourcing activity. In this paper, the program, which structures the related pedagogical approaches and practices, is analyzed and discussed in light of crowdsourcing principles. Teachers and students participate in the program voluntarily. All collaborative activities, material production and publication of results that take place online and emphasize language learning fulfil the characteristics of the effective use of crowdsourcing in education. Two kinds of analyses are undertaken: a global analysis of the program features and a local analysis of selected projects. The global analysis relates crowdsourcing practices to the eTwinning activities. The local analysis is based on the outstanding projects submitted for evaluation for national awards in Poland, further exemplified by activities and references to the public sites of the projects. The aim of the text is to show that teachers may effectively use crowdsourcing in educational practice even when not primarily focused on its application.
Data preparation in crowdsourcing for pedagogical purposes
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.62-100
Tanara Zingano Kuhn, Špela Arhar Holdt, Iztok Kosem, Carole Tiberius, Kristina Koppel, R. Zviel-Girshin
One way to stimulate the use of corpora in language education is by making pedagogically appropriate corpora, labeled with different types of problems (sensitive content, offensive language, structural problems). However, manually labeling corpora is extremely time-consuming and a better approach should be found. We thus propose a combination of two approaches to the creation of problem-labeled pedagogical corpora of Dutch, Estonian, Slovene and Brazilian Portuguese: the use of games with a purpose and of crowdsourcing for the task. We conducted initial experiments to establish the suitability of the crowdsourcing task, and used the lessons learned to design the Crowdsourcing for Language Learning (CrowLL) game in which players identify problematic sentences, classify them, and indicate problematic excerpts. The focus of this paper is on data preparation, given the crucial role that such a stage plays in any crowdsourcing project dealing with the creation of language learning resources. We present the methodology for data preparation, offering a detailed presentation of source corpora selection, pedagogically oriented GDEX configurations, and the creation of lemma lists, with a special focus on common and language-dependent decisions. Finally, we offer a discussion of the challenges that emerged and the solutions that have been implemented so far.
EnetCollect – European Network for Combining Language Learning with Crowdsourcing Techniques (COST Action CA16105)
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.202-225
Lionel Nicolas, V. Lyding
This article reviews the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect), an extensive network project created to foster research and innovation (R&I) on the combination of crowdsourcing and language learning. Accordingly, we explain how it began, introduce its overall logic and organization, and discuss its achievements in terms of both (1) creating a new R&I community through a concluded large network project, and (2) fostering R&I on a high-potential and mostly unexplored subject. We also discuss the challenges involved and the lessons learned, both in orchestrating and leading a new R&I community and in the efforts of enetCollect members as they explored the many facets of such a versatile enterprise.
Learning languages from parallel corpora
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.101-131
Johannes Graën
This work describes a blueprint for an application that generates language learning exercises from parallel corpora. Word alignment and parallel structures allow for the automatic assessment of sentence pairs in the source and target languages, while users of the application continuously improve the quality of the data through their interactions, thus crowdsourcing parallel language learning material. Through triangulation, their assessments can be transferred to language pairs other than the original ones if multiparallel corpora are used as a source. Several challenges need to be addressed for such an application to work, and we discuss three of them here. First, the question of how adequate learning material can be identified in corpora has received some attention in the last decade, and we detail what the structure of parallel corpora implies for that selection. Second, we consider which types of exercises can be generated automatically from parallel corpora such that they foster learning and keep learners motivated. Third, we highlight the potential of employing users, that is, both teachers and learners, as crowdsourcers to help improve the material.
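The triangulation step mentioned in the abstract can be pictured with a small Python sketch. It is only one plausible reading, not the application's actual design: the rating scale in [0, 1], the rule that an inferred pair inherits the weaker of the two ratings along a pivot language, and the names `triangulate` and `min_score` are all assumptions.

```python
from itertools import combinations


def triangulate(ratings, min_score=0.5):
    """Infer ratings for unrated language pairs of a multiparallel tuple.

    ratings: dict mapping (tuple_id, lang_a, lang_b) to a crowd score in
        [0, 1] for that sentence pair (hypothetical scale).
    Returns a dict of inferred scores for pairs that were never rated but
    are connected through a pivot language within the same tuple.
    """
    # Ratings are treated as symmetric: (a, b) and (b, a) share a score.
    sym = {}
    for (tid, a, b), score in ratings.items():
        sym[(tid, a, b)] = score
        sym[(tid, b, a)] = score

    inferred = {}
    for tid in {key[0] for key in ratings}:
        langs = sorted({l for (t, a, b) in ratings if t == tid for l in (a, b)})
        for a, c in combinations(langs, 2):
            if (tid, a, c) in sym:
                continue  # already rated directly
            # A path a-p-c is only as reliable as its weaker link.
            via_pivot = [
                min(sym[(tid, a, p)], sym[(tid, p, c)])
                for p in langs
                if p not in (a, c) and (tid, a, p) in sym and (tid, p, c) in sym
            ]
            if via_pivot and max(via_pivot) >= min_score:
                inferred[(tid, a, c)] = max(via_pivot)
    return inferred


# Made-up ratings for one sentence tuple available in three languages.
ratings = {(1, "de", "en"): 0.9, (1, "en", "fr"): 0.8}
print(triangulate(ratings))
```

Taking the minimum along the pivot path is a deliberately conservative choice: the inferred German-French pair is never rated higher than either pair it was derived from.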
Crowdsourcing for language learning and linguistic resource creation
Pub Date: 2022-12-29 DOI: 10.4312/slo2.0.2022.2.1-4
Lionel Nicolas, V. Lyding
The current special issue of the journal Slovenščina 2.0 focuses on the newly explored combination of crowdsourcing, language learning and linguistic resource creation. It contains five articles and one project report, providing insightful discussions on several aspects of this combination, as well as results which help us understand its versatile potential and the challenges to address in order to better exploit it.
Conference on Language Technologies and Digital Humanities 2022
Pub Date: 2022-12-21 DOI: 10.4312/slo2.0.2022.1.131-135
David Bordon
A report from the Language Technologies and Digital Humanities (JTDH) 2022 conference, held in September 2022 at the Faculty of Social Sciences, University of Ljubljana. The conference was organised by the Slovenian Language Technologies Society (SDJT), together with the Centre for Language Resources and Technologies of the University of Ljubljana (CJVT), the Institute of Contemporary History (INZ), and the research infrastructures CLARIN.SI and DARIAH-SI.
33rd European Summer School in Logic, Language and Information, ESSLLI 2022
Pub Date: 2022-12-21 DOI: 10.4312/slo2.0.2022.1.126-130
M. Brglez
Emotion analysis in socially unacceptable discourse
Pub Date: 2022-12-21 DOI: 10.4312/slo2.0.2022.1.1-22
Jasmin Franza, Bojan Evkoski, Darja Fišer
Texts often express the writer’s emotional state, and emotion information has been shown to have potential for hate speech detection and analysis. In this work, we present a methodology for quantitative analysis of emotion in text. We define a simple yet effective metric for the overall emotional charge of a text, based on the NRC Emotion Lexicon and Plutchik’s eight basic emotions. Using this methodology, we investigate the emotional charge of content with socially unacceptable discourse (SUD), a distinct and potentially harmful type of text that is spreading on social media. We experiment with the proposed method on a corpus of Facebook comments, resulting in four datasets in two languages, English and Slovene, and two discussion topics, LGBT+ rights and the European migrant crisis. We reveal that SUD content is significantly more emotional than non-SUD comments. Moreover, we show differences in the expression of emotions depending on the language, topic, and target of the comments. Finally, to underpin the findings of the quantitative investigation of emotions, we perform a qualitative analysis of the corpus, exploring in more detail the most frequent emotional words of each emotion for all four datasets. The qualitative analysis shows that the source of emotions in SUD texts heavily depends on the topic of discussion, with substantial overlaps between languages.
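A lexicon-based emotional charge of the kind the abstract describes could look roughly like the Python sketch below. The abstract does not give the formula, so the definition used here (per-emotion share of tokens, summed over Plutchik's eight emotions) is an assumption, and `TOY_LEXICON` is a hand-made stand-in, not an excerpt from the NRC Emotion Lexicon.

```python
import re
from collections import Counter

# Plutchik's eight basic emotions.
PLUTCHIK = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical word-to-emotion associations for illustration only.
TOY_LEXICON = {
    "hate": {"anger", "disgust", "fear", "sadness"},
    "threat": {"anger", "fear"},
    "love": {"joy"},
    "hope": {"anticipation", "joy", "surprise", "trust"},
}


def emotion_profile(text, lexicon):
    """Return (per-emotion token share, overall emotional charge).

    Each token contributes one count to every emotion it is associated
    with; counts are normalised by the number of tokens in the text.
    The charge is the sum of the eight shares (an assumed definition).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    n = len(tokens)
    shares = {e: (counts[e] / n if n else 0.0) for e in PLUTCHIK}
    return shares, sum(shares.values())


shares, charge = emotion_profile("I hate this threat", TOY_LEXICON)
print(shares["anger"], charge)
```

Normalising by text length keeps long and short comments comparable, which matters when contrasting groups of comments such as SUD and non-SUD.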