Developing an Incivility Dictionary for German Online Discussions – a Semi-Automated Approach Combining Human and Artificial Knowledge
Authors: Anke Stoll, L. Wilms, Marc Ziegele
Journal: Communication Methods and Measures (JCR Q1, Communication; impact factor 6.3)
Published: 2023-02-05
DOI: https://doi.org/10.1080/19312458.2023.2166028
Citations: 1
Abstract
Incivility in online discussions has become an important issue in political communication research. Instruments and tools for the automated analysis of uncivil content, however, are rare, especially for non-English user-generated text. In this study, we present a) an extensive dictionary (DIKI - Diktionär für Inzivilität, English: Dictionary for Incivility) to detect incivility in German-language online discussions, and b) a semi-automated two-step approach that combines manual content analysis with automated keyword collection using a pre-trained word embedding model. We show that DIKI clearly outperforms comparable dictionaries that have been used as alternative instruments to measure incivility (e.g., the LIWC) as well as basic machine learning approaches to text classification. Further, we provide evidence that pre-trained word embeddings can fruitfully be employed in the explorative phase of creating dictionaries. Still, the manual evaluation of DIKI confirms that detecting complex and context-dependent forms of incivility remains challenging and that constant updates would be needed to maintain performance. Finally, the detailed documentation of the development and evaluation process of DIKI may serve as a guideline for further research. We therefore provide DIKI as a freely available instrument that will also be usable through a web interface for drag-and-drop data analysis (diki.limitedminds.org).
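The two-step idea described in the abstract (manually identified seed terms, then automated keyword collection from word embeddings) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use DIKI's actual entries: the vectors in `TOY_VECTORS` are invented for illustration, the function names `suggest_candidates` and `count_matches` are hypothetical, and the similarity threshold of 0.95 is an arbitrary assumption. A real workflow would presumably load a pre-trained German embedding model and have human coders review every suggested candidate.

```python
import math
import re

# Toy 3-dimensional "embeddings", invented for illustration only.
# A real workflow would load a pre-trained German word embedding model.
TOY_VECTORS = {
    "idiot":    [0.90, 0.10, 0.00],
    "dummkopf": [0.85, 0.15, 0.05],
    "trottel":  [0.80, 0.20, 0.10],
    "danke":    [0.00, 0.90, 0.30],
    "argument": [0.10, 0.30, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest_candidates(seeds, vectors, threshold=0.95):
    """Automated step: propose words whose vector is close to any seed term.

    The result is a candidate list for human review, not a final dictionary.
    """
    known_seeds = [s for s in seeds if s in vectors]
    candidates = set()
    for word, vec in vectors.items():
        if word in seeds:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            candidates.add(word)
    return sorted(candidates)


def count_matches(text, dictionary):
    """Count tokens of a comment that appear in the dictionary (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for tok in tokens if tok in dictionary)


if __name__ == "__main__":
    seeds = {"idiot"}  # hypothetical seed term from manual content analysis
    candidates = suggest_candidates(seeds, TOY_VECTORS)
    # Assumes a human reviewer accepted every candidate; in practice some are rejected.
    dictionary = seeds | set(candidates)
    print(candidates)
    print(count_matches("Du Trottel, danke für dein Argument!", dictionary))
```

Plain token matching of this kind cannot see context, which is consistent with the abstract's caveat that complex and context-dependent forms of incivility remain hard to detect with a dictionary.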
About the journal:
Communication Methods and Measures aims to achieve several goals in the field of communication research. Firstly, it aims to bring attention to and showcase developments in both qualitative and quantitative research methodologies to communication scholars. This journal serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches.
Additionally, Communication Methods and Measures seeks to improve research design and analysis practices by publishing concrete suggestions for doing so. It aims to introduce new methods of measurement that are valuable to communication scientists or that enhance existing methods. The journal encourages submissions that focus on methods for enhancing research design and theory testing, employing both quantitative and qualitative approaches.
Furthermore, the journal is open to articles devoted to exploring the epistemological aspects relevant to communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods and articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.
In summary, Communication Methods and Measures strives to advance the field of communication research by showcasing and discussing innovative methodologies, improving research practices, and introducing new measurement methods.