In recent years there has been a growing interest in online auditing of information flow over social networks, with the goal of monitoring undesirable effects such as misinformation and fake news. Most previous work on the subject focuses on the binary problem of classifying information as fake or genuine. Nonetheless, in many practical scenarios, the multi-class/label setting is of particular importance. For example, a social media platform may want to distinguish between "true", "partly-true", and "false" information. Accordingly, in this paper, we consider the problem of online multiclass classification of information flow. To that end, driven by empirical studies on information flow over real-world social media networks, we propose a probabilistic information flow model over graphs. The learning task is then to detect the label of the information flow, with the goal of minimizing a combination of the classification error and the detection time. For this problem, we propose two detection algorithms: the first is based on the well-known multiple sequential probability ratio test, while the second is a novel graph-neural-network-based sequential decision algorithm. For both algorithms, we prove several strong statistical guarantees. We also construct a data-driven algorithm for learning the proposed probabilistic model. Finally, we test our algorithms on two real-world datasets and show that they outperform other state-of-the-art misinformation detection algorithms in terms of detection time and classification error.
{"title":"Sequential Classification of Misinformation","authors":"Daniel Toma, Wasim Huleihel","doi":"arxiv-2409.04860","DOIUrl":"https://doi.org/arxiv-2409.04860","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-07"}
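The abstract above names the multiple sequential probability ratio test (MSPRT) as the basis of its first detector. As a rough illustration only, and not the authors' algorithm or their graph-based model, the following Python sketch applies a posterior-threshold stopping rule to i.i.d. categorical observations. The function name `msprt`, the uniform prior, the threshold value, and the toy likelihood table `LIKELIHOODS` are all assumptions made for this example.

```python
import math

# Hypothetical per-class observation models (assumption, not from the paper):
# symbol 0 = "plain reshare", symbol 1 = "skeptical reply".
# Index 0 = "true", 1 = "partly-true", 2 = "false".
LIKELIHOODS = [
    {0: 0.8, 1: 0.2},
    {0: 0.5, 1: 0.5},
    {0: 0.2, 1: 0.8},
]


def msprt(observations, likelihoods, threshold=0.95):
    """Posterior-threshold MSPRT sketch.

    Returns (decided_class, samples_used). The class is None when the
    observations run out before any posterior reaches the threshold.
    Assumes every likelihood is strictly positive and a uniform prior.
    """
    k = len(likelihoods)
    log_post = [math.log(1.0 / k)] * k
    n = 0
    for n, x in enumerate(observations, start=1):
        # Bayes update in log space for each hypothesis.
        log_post = [lp + math.log(lik[x]) for lp, lik in zip(log_post, likelihoods)]
        # Normalise with the log-sum-exp trick to avoid underflow.
        m = max(log_post)
        log_norm = m + math.log(sum(math.exp(lp - m) for lp in log_post))
        log_post = [lp - log_norm for lp in log_post]
        # Stop as soon as one class is sufficiently probable.
        best = max(range(k), key=lambda i: log_post[i])
        if log_post[best] >= math.log(threshold):
            return best, n
    return None, n
```

Calling `msprt([1] * 10, LIKELIHOODS)` feeds ten "skeptical reply" symbols to the test. Raising `threshold` trades a longer detection time for a lower error probability, which is the trade-off the abstract describes.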
Scale-free networks are ubiquitous in social, biological and technological networked systems. Dynamic scale-free networks and their synchronization are important for understanding and predicting the behavior of such systems. In this research, computational experiments were conducted to understand whether scale-free properties are sustained during synchronization in dynamic scale-free networks. Two synchronization schemes were implemented: synchronization based on node states with a coupling configuration matrix, and synchronization based on node states with network centralities. In the experiments, dynamic scale-free networks were generated with a network generation algorithm and analyzed to understand how their phases fluctuate away from scale-free properties during synchronization.
{"title":"Sustainability of Scale-Free Properties in Synchronizations of Dynamic Scale-Free Networks","authors":"Rakib Hassan Pran","doi":"arxiv-2409.08298","DOIUrl":"https://doi.org/arxiv-2409.08298","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-06"}
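The abstract above does not say which network generation algorithm was used. As a hedged illustration of how a scale-free network can be grown, the sketch below uses preferential attachment in the style of the Barabási-Albert model, which is an assumption on our part. The function names `preferential_attachment_graph` and `degree_histogram` and the parameters `n`, `m`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=0):
    """Grow an undirected graph with n nodes; each new node adds m edges.

    Requires n > m >= 1. New nodes pick neighbours with probability
    proportional to current degree (Barabasi-Albert style, an assumption).
    """
    rng = random.Random(seed)
    # Seed graph: a clique on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_histogram(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter(v for edge in edges for v in edge)
    return Counter(degree.values())
```

Plotting `degree_histogram(preferential_attachment_graph(5000, 2))` on log-log axes is one informal way to check for a heavy-tailed degree distribution at a given point in time.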
Detecting false information on social media is critical in mitigating its negative societal impacts. To reduce the propagation of false information, automated detection provides scalable, unbiased, and cost-effective methods. However, three research gaps have been identified which, once addressed, would improve detection. First, current AI-based solutions often provide a uni-dimensional analysis of a complex, multi-dimensional issue, with solutions differing based on the features used. Furthermore, these methods do not account for the temporal and dynamic changes observed within the document's life cycle. Second, there has been little research on detecting coordinated information campaigns and on understanding the intent of the actors and the campaign. Third, there is a lack of consideration of cross-platform analysis, with existing datasets focusing on a single platform, such as X, and detection models designed for a specific platform. This work aims to develop methods for effective detection of false information and its propagation. To this end, we first propose the creation of an ensemble multi-faceted framework that leverages multiple aspects of false information. Second, we propose a method to identify actors and their intent when working in coordination to manipulate a narrative. Third, we aim to analyse the impact of cross-platform interactions on the propagation of false information via the creation of a new dataset.
{"title":"The Veracity Problem: Detecting False Information and its Propagation on Online Social Media Networks","authors":"Sarah Condran","doi":"arxiv-2409.03948","DOIUrl":"https://doi.org/arxiv-2409.03948","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-06"}
Liang Wang, Shubham Jain, Yingtong Dou, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Prince Aboagye, Yan Zheng, Xin Dai, Zhongfang Zhuang, Uday Singh Saini, Wei Zhang
Numerous algorithms have been developed for online product rating prediction, but the specific influence of user and product information in determining the final prediction score remains largely unexplored. Existing research often relies on narrowly defined data settings, which overlooks real-world challenges such as the cold-start problem, cross-category information utilization, and scalability and deployment issues. To delve deeper into these aspects, and particularly to uncover the roles of individual user taste and collective wisdom, we propose a unique and practical approach that emphasizes historical ratings at both the user and product levels, encapsulated using a continuously updated dynamic tree representation. This representation effectively captures the temporal dynamics of users and products, leverages user information across product categories, and provides a natural solution to the cold-start problem. Furthermore, we have developed an efficient data processing strategy that makes this approach highly scalable and easily deployable. Comprehensive experiments in real industry settings demonstrate the effectiveness of our approach. Notably, our findings reveal that individual taste dominates over collective wisdom in online product rating prediction, a perspective that contrasts with the commonly observed wisdom of the crowd phenomenon in other domains. This dominance of individual user taste is consistent across various model types, including the boosting tree model, recurrent neural network (RNN), and transformer-based architectures. This observation holds true across the overall population, within individual product categories, and in cold-start scenarios. Our findings underscore the significance of individual user tastes in the context of online product rating prediction and the robustness of our approach across different model architectures.
{"title":"Preserving Individuality while Following the Crowd: Understanding the Role of User Taste and Crowd Wisdom in Online Product Rating Prediction","authors":"Liang Wang, Shubham Jain, Yingtong Dou, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Prince Aboagye, Yan Zheng, Xin Dai, Zhongfang Zhuang, Uday Singh Saini, Wei Zhang","doi":"arxiv-2409.04649","DOIUrl":"https://doi.org/arxiv-2409.04649","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-06"}
When discussing difficult topics online, is it common to meaningfully engage with people from diverse perspectives? Why or why not? Could features of the online environment be redesigned to encourage civil conversation across difference? In this paper, we study discussions of gun policy on Reddit, with the overarching goal of developing insights into the potential of the internet to support understanding across difference. We use two methods: a clustering analysis of Reddit posts to contribute insights about what people discuss, and an interview study of twenty Reddit users to help us understand why certain kinds of conversation take place and others don't. We find that the discussion of gun politics falls into three groups: conservative pro-gun, liberal pro-gun, and liberal anti-gun. Each type of group has its own characteristic topics. While our subjects state that they would be willing to engage with others across the ideological divide, in practice they rarely do. Subjects are siloed into like-minded subreddits through a two-pronged effect, where they are simultaneously pushed away from opposing-view communities while actively seeking belonging in like-minded ones. Another contributing factor is Reddit's "karma" mechanism: fear of being downvoted and losing karma points and social approval of peers causes our subjects to hesitate to say anything in conflict with group norms. The pseudonymous nature of discussion on Reddit plays a complex role, with some subjects finding it freeing and others fearing reprisal from others not bound by face-to-face norms of politeness. Our subjects believe that content moderation can help ameliorate these issues; however, our findings suggest that moderators need different tools to do so effectively. We conclude by suggesting platform design changes that might increase discussion across difference.
{"title":"Understanding Online Discussion Across Difference: Insights from Gun Discourse on Reddit","authors":"Rijul Magu, Nivedhitha Mathan Kumar, Yihe Liu, Xander Koo, Diyi Yang, Amy Bruckman","doi":"arxiv-2409.03989","DOIUrl":"https://doi.org/arxiv-2409.03989","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-06"}
Millions of people use online social networks to reinforce their sense of belonging, for example by giving and asking for feedback as a form of social validation and self-recognition. It is common to observe disagreement among people's beliefs and points of view when they express this feedback. Modeling and analyzing such interactions is crucial to understanding the social phenomena that arise when people face different opinions while expressing and discussing their values. In this work, we study a Reddit community in which people participate to judge or be judged with respect to some behavior, as it represents a valuable source for studying how users express judgments online. We model threads of this community as complex networks of user interactions growing in time, and we analyze the evolution of their structural properties. We show that the evolution of Reddit networks differs from that of other real social networks, despite falling in the same category. This happens because their global clustering coefficient is extremely small and the average shortest path length increases over time. Such properties reveal how users discuss in threads, i.e., mostly with one other user and often through a single message. We strengthen this result by analyzing the role that disagreement and reciprocity play in such conversations. We also show that a Reddit thread's evolution over time is governed by two subgraphs growing at different speeds. We discover that, in the studied community, the difference between these speeds is higher than in other communities because of the user guidelines enforcing specific user interactions. Finally, we interpret the obtained results on user behavior by drawing on Social Judgment Theory.
{"title":"Structure and dynamics of growing networks of Reddit threads","authors":"Diletta Goglia, Davide Vega","doi":"arxiv-2409.04085","DOIUrl":"https://doi.org/arxiv-2409.04085","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-06"}
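The abstract above relies on two structural measures, the global clustering coefficient and the average shortest path length. The sketch below shows one common way to compute both on an undirected, unweighted graph, using only the Python standard library. It is not the authors' code: the helper names are hypothetical, and averaging path lengths over reachable ordered pairs is an assumption made here.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected (u, v) pairs into a dict of neighbour sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def global_clustering(adj):
    """Transitivity: closed connected triples divided by all connected triples."""
    closed = triples = 0
    for nbrs in adj.values():
        # Every pair of neighbours of a node forms a triple centred on it.
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A toy thread in which every commenter replies only to the original poster forms a star graph, which has no triangles and therefore a clustering coefficient of zero.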
A signed graph (SG) is a graph whose edges carry sign information. The sign of an edge can be positive, negative, or neutral. Signed networks are ubiquitous among real-world networks such as social networks, citation networks, and various technical networks. Many network embedding models have been proposed and developed for signed networks of both homogeneous and heterogeneous types. SG embedding learns low-dimensional vector representations for the nodes of a network, which helps with many network analysis tasks such as link prediction, node classification, and community detection. In this survey, we perform a comprehensive study of SG embedding methods and applications. We introduce the basic theories and methods of SGs and survey the current state of the art of signed graph embedding methods. In addition, we explore the applications of different types of SG embedding methods in real-world scenarios. As an application, we have explored the citation network to analyze authorship networks. We also provide source code and datasets to support future work. Lastly, we explore the challenges of SG embedding and forecast various future research directions in this field.
{"title":"A Survey on Signed Graph Embedding: Methods and Applications","authors":"Shrabani Ghosh","doi":"arxiv-2409.03916","DOIUrl":"https://doi.org/arxiv-2409.03916","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-05"}
Although currently one of the most popular instant messaging apps worldwide, Telegram has been largely understudied in the past years. In this paper, we aim to address this gap by presenting an analysis of publicly accessible groups covering discussions encompassing different topics, as diverse as Education, Erotic, Politics, and Cryptocurrencies. We engineer and offer an open-source tool to automate the collection of messages from Telegram groups, a non-straightforward problem. We use it to collect more than 50 million messages from 669 groups. Here, we present a first-of-its-kind, per-topic analysis, contrasting the characteristics of the messages sent on the platform from different angles -- the language, the presence of bots, the type and volume of shared media content. Our results confirm some anecdotal evidence, e.g., clues that Telegram is used to share possibly illicit content, and unveil some unexpected findings, e.g., the different sharing patterns of video and stickers in groups of different topics. While preliminary, we hope that our work paves the road for several avenues of future research on the understudied Telegram platform.
{"title":"A Topic-wise Exploration of the Telegram Group-verse","authors":"Alessandro Perlo, Giordano Paoletti, Nikhil Jha, Luca Vassio, Jussara Almeida, Marco Mellia","doi":"arxiv-2409.02525","DOIUrl":"https://doi.org/arxiv-2409.02525","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-04"}
Fake news detection remains a critical challenge in today's rapidly evolving digital landscape, where misinformation can spread faster than ever before. Traditional fake news detection models often rely on static datasets and auxiliary information, such as metadata or social media interactions, which limits their adaptability to real-time scenarios. Recent advancements in Large Language Models (LLMs) have demonstrated significant potential in addressing these challenges due to their extensive pre-trained knowledge and ability to analyze textual content without relying on auxiliary data. However, many of these LLM-based approaches are still rooted in static datasets, with limited exploration into their real-time processing capabilities. This paper presents a systematic evaluation of both traditional offline models and state-of-the-art LLMs for real-time fake news detection. We demonstrate the limitations of existing offline models, including their inability to adapt to dynamic misinformation patterns. Furthermore, we show that newer LLM models with online capabilities, such as GPT-4, Claude, and Gemini, are better suited for detecting emerging fake news in real-time contexts. Our findings emphasize the importance of transitioning from offline to online LLM models for real-time fake news detection. Additionally, the public accessibility of LLMs enhances their scalability and democratizes the tools needed to combat misinformation. By leveraging real-time data, our work marks a significant step toward more adaptive, effective, and scalable fake news detection systems.
{"title":"A Comparative Study of Offline Models and Online LLMs in Fake News Detection","authors":"Ruoyu Xu, Gaoxiang Li","doi":"arxiv-2409.03067","DOIUrl":"https://doi.org/arxiv-2409.03067","url":null,"abstract":"Fake news detection remains a critical challenge in today's rapidly evolving digital landscape, where misinformation can spread faster than ever before. Traditional fake news detection models often rely on static datasets and auxiliary information, such as metadata or social media interactions, which limits their adaptability to real-time scenarios. Recent advancements in Large Language Models (LLMs) have demonstrated significant potential in addressing these challenges due to their extensive pre-trained knowledge and ability to analyze textual content without relying on auxiliary data. However, many of these LLM-based approaches are still rooted in static datasets, with limited exploration into their real-time processing capabilities. This paper presents a systematic evaluation of both traditional offline models and state-of-the-art LLMs for real-time fake news detection. We demonstrate the limitations of existing offline models, including their inability to adapt to dynamic misinformation patterns. Furthermore, we show that newer LLM models with online capabilities, such as GPT-4, Claude, and Gemini, are better suited for detecting emerging fake news in real-time contexts. Our findings emphasize the importance of transitioning from offline to online LLM models for real-time fake news detection. Additionally, the public accessibility of LLMs enhances their scalability and democratizes the tools needed to combat misinformation. By leveraging real-time data, our work marks a significant step toward more adaptive, effective, and scalable fake news detection systems.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142214717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arianna Muti, Federico Ruggeri, Khalid Al-Khatib, Alberto Barrón-Cedeño, Tommaso Caselli
We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to form a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short on reasoning capabilities about misogynistic comments and that they mostly rely on their implicit knowledge derived from internalized common stereotypes about women to generate implied assumptions, rather than on inductive reasoning.
{"title":"Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts","authors":"Arianna Muti, Federico Ruggeri, Khalid Al-Khatib, Alberto Barrón-Cedeño, Tommaso Caselli","doi":"arxiv-2409.02519","DOIUrl":"https://doi.org/arxiv-2409.02519","url":null,"abstract":"We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to form a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short on reasoning capabilities about misogynistic comments and that they mostly rely on their implicit knowledge derived from internalized common stereotypes about women to generate implied assumptions, rather than on inductive reasoning.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142214723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}