Amirhossein Dezhboro, Jose Emmanuel Ramirez-Marquez, Aleksandra Krstikj
This research presents a framework for analyzing the dynamics of online communities on social media platforms, using a temporal fusion of text and network data. By combining text classification and dynamic social network analysis, we uncover mechanisms driving community formation and evolution, revealing the influence of real-world events. We introduce fourteen key elements based on social science theories to evaluate social media dynamics, and validate our framework through a case study of Twitter data during major U.S. events in 2020. Our analysis centers on discrimination discourse, identifying sexism, racism, xenophobia, ableism, homophobia, and religious intolerance as the main fragments. Results demonstrate rapid community emergence and dissolution cycles representative of discourse fragments. We reveal how real-world circumstances impact discourse dominance and how social media contributes to echo chamber formation and societal polarization. Our comprehensive approach provides insights into discourse fragmentation, opinion dynamics, and structural aspects of online communities, offering a methodology for understanding the complex interplay between online interactions and societal trends.
Community Shaping in the Digital Age: A Temporal Fusion Framework for Analyzing Discourse Fragmentation in Online Social Networks. Amirhossein Dezhboro, Jose Emmanuel Ramirez-Marquez, Aleksandra Krstikj. arXiv - CS - Social and Information Networks, 2024-09-18. https://doi.org/arxiv-2409.11665
Qing Xiao, Xianzhe Fan, Felix M. Simon, Bingbing Zhang, Motahhare Eslami
Recently, an increasing number of news organizations have integrated artificial intelligence (AI) into their workflows, leading to a further influx of AI technologists and data workers into the news industry. This has initiated cross-functional collaborations between these professionals and journalists. While prior research has explored the impact of AI-related roles entering the news industry, there is a lack of studies on how cross-functional collaboration unfolds between AI professionals and journalists. Through interviews with 17 journalists, 6 AI technologists, and 3 AI workers with cross-functional experience from leading news organizations, we investigate the current practices, challenges, and opportunities for cross-functional collaboration around AI in today's news industry. We first study how journalists and AI professionals perceive existing cross-collaboration strategies. We further explore the challenges of cross-functional collaboration and provide recommendations for enhancing future cross-functional collaboration around AI in the news industry.
"It Might be Technically Impressive, But It's Practically Useless to Us": Practices, Challenges, and Opportunities for Cross-Functional Collaboration around AI within the News Industry. Qing Xiao, Xianzhe Fan, Felix M. Simon, Bingbing Zhang, Motahhare Eslami. arXiv - CS - Social and Information Networks, 2024-09-18. https://doi.org/arxiv-2409.12000
Knowledge graphs have been shown to play a significant role in current knowledge mining fields, including life sciences, bioinformatics, computational social sciences, and social network analysis. Link prediction has many applications and has been extensively studied. However, most methods are restricted to dimension-reduction, probabilistic-model, or similarity-based approaches and are inherently biased. In this paper, we provide a definition of graph prediction for link prediction and outline related work to support our novel approach, which integrates centrality measures with classical machine learning methods. We examine our experimental results in detail and identify areas for potential further research. Our method shows promise, particularly when using randomly selected nodes and degree centrality.
A novel DFS/BFS approach towards link prediction. Jens Dörpinghaus, Tobias Hübenthal, Denis Stepanov. arXiv - CS - Social and Information Networks, 2024-09-18. https://doi.org/arxiv-2409.11687
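To make the idea of combining centrality measures with a classical learner more concrete, the following is a minimal sketch, not the authors' method. It assumes an undirected graph stored as an adjacency dictionary and builds a small feature vector for a candidate node pair from degree centrality and the number of common neighbors. The toy graph and the names `degree_centrality`, `bfs_order`, and `pair_features` are illustrative assumptions; the resulting vectors could be passed to any standard classifier.

```python
from collections import deque

# Hypothetical toy graph (undirected), stored as node -> set of neighbors.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def bfs_order(adj, start):
    """Nodes in breadth-first order from `start` (neighbors visited alphabetically)."""
    order, visited, queue = [start], {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in visited:
                visited.add(w)
                order.append(w)
                queue.append(w)
    return order

def pair_features(adj, centrality, u, v):
    """Feature vector for a candidate link (u, v): both centralities and common neighbors."""
    return [centrality[u], centrality[v], len(adj[u] & adj[v])]

centrality = degree_centrality(adj)
features_ad = pair_features(adj, centrality, "a", "d")
```

Here `features_ad` describes the unlinked pair (a, d); a labeled set of such vectors for existing and absent edges is what a classical model would be trained on.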
Finding the perfect match between a job proposal and a set of freelancers is not an easy task to perform at scale, especially in multiple languages. In this paper, we propose a novel neural retriever architecture that tackles this problem in a multilingual setting. Our method encodes project descriptions and freelancer profiles by leveraging pre-trained multilingual language models. The latter are used as the backbone for a custom transformer architecture that aims to preserve the structure of the profiles and projects. This model is trained with a contrastive loss on historical data. Through several experiments, we show that this approach effectively captures skill matching similarity and facilitates efficient matching, outperforming traditional methods.
Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval. Warren Jouanneau, Marc Palyart, Emma Jouffroy. arXiv - CS - Social and Information Networks, 2024-09-18. https://doi.org/arxiv-2409.12097
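The abstract states only that the model is trained with a contrastive loss. As an illustration, the sketch below uses an InfoNCE-style loss with in-batch negatives, which is an assumption and may differ from the loss the authors used. Each project embedding at position i is treated as matching the freelancer embedding at position i; the function name, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(project_vecs, freelancer_vecs, temperature=0.1):
    """Mean InfoNCE loss; row i of each list is assumed to be a matching pair."""
    total = 0.0
    for i, p in enumerate(project_vecs):
        logits = [dot(p, f) / temperature for f in freelancer_vecs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(project_vecs)

# Hypothetical 2-d embeddings: aligned pairs versus deliberately swapped pairs.
projects = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(projects, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(projects, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each project is most similar to its own freelancer and large when the pairs are swapped, which is the behavior a retriever trained on historical matches relies on.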
Esa Palosaari, Ted Hsuan Yun Chen, Arttu Malkamäki, Mikko Kivelä
On social media, the boundaries between people's private and public lives often blur. The need to navigate both roles, which are governed by distinct norms, impacts how individuals conduct themselves online and presents methodological challenges for researchers. We conduct a systematic exploration of how an organization's official Twitter accounts and its members' personal accounts differ. Using a climate change Twitter data set as our case, we find substantial differences in activity and connectivity across the organizational levels we examined. The levels differed considerably in their overall retweet network structures, and accounts within each level were more likely to have similar connections than accounts at different levels. We illustrate the implications of these differences for applied research by showing that the levels closer to the core of the organization display more sectoral homophily but less triadic closure, and that each level consists of very different group structures. Our results show that the common practice of solely analyzing accounts from a single organizational level, grouping together all levels, or excluding certain levels can lead to a skewed understanding of how organizations are represented on social media.
My Views Do Not Reflect Those of My Employer: Differences in Behavior of Organizations' Official and Personal Social Media Accounts. Esa Palosaari, Ted Hsuan Yun Chen, Arttu Malkamäki, Mikko Kivelä. arXiv - CS - Social and Information Networks, 2024-09-18. https://doi.org/arxiv-2409.11759
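Sectoral homophily and triadic closure can be illustrated with two simple descriptive statistics. The sketch below is an assumption-laden simplification, not the paper's estimation procedure: it treats the retweet network as undirected, measures homophily as the share of edges joining accounts from the same sector, and measures triadic closure as global transitivity. The accounts, sector labels, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected network and sector labels.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
sector = {"a": "ngo", "b": "ngo", "c": "gov", "d": "gov"}

def edge_homophily(adj, sector):
    """Fraction of edges whose two endpoints share a sector label."""
    same = total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # count each undirected edge once
                total += 1
                same += sector[u] == sector[v]
    return same / total if total else 0.0

def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = triples = 0
    for center, neighbors in adj.items():
        for a, b in combinations(sorted(neighbors), 2):
            triples += 1
            closed += b in adj[a]
    return closed / triples if triples else 0.0

homophily = edge_homophily(adj, sector)
closure = transitivity(adj)
```

Computing both statistics separately for each organizational level would give a rough, purely descriptive view of the contrast the abstract reports.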
We analyze the token transfer network on Ethereum, focusing on accounts associated with Alameda Research, a cryptocurrency trading firm implicated in the misuse of FTX customer funds. Using a multi-token network representation, we examine node centralities and the network backbone to identify critical accounts, tokens, and activity groups. The temporal evolution of Alameda accounts reveals shifts in token accumulation and distribution patterns leading up to its bankruptcy in November 2022. Through network analysis, our work offers insights into the activities and dynamics that shape the DeFi ecosystem.
Inside Alameda Research: A Multi-Token Network Analysis. Célestin Coquidé, Rémy Cazabet, Natkamon Tovanich. arXiv - CS - Social and Information Networks, 2024-09-17. https://doi.org/arxiv-2409.10949
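One simple way to picture a multi-token network representation is as a set of directed edges keyed by sender, receiver, and token. The sketch below assumes that layout and made-up transfer records; the account names, token names, amounts, and the two helper functions are hypothetical and are not taken from the paper's data or code.

```python
from collections import defaultdict

# Hypothetical transfer records: (sender, receiver, token, amount).
transfers = [
    ("acct_A", "acct_B", "TOKEN_X", 100.0),
    ("acct_A", "acct_C", "TOKEN_Y", 50.0),
    ("acct_B", "acct_A", "TOKEN_X", 30.0),
    ("acct_C", "acct_A", "TOKEN_Z", 20.0),
    ("acct_A", "acct_B", "TOKEN_X", 25.0),
]

def multi_token_edges(transfers):
    """Aggregate transferred amounts per (sender, receiver, token) edge."""
    edges = defaultdict(float)
    for sender, receiver, token, amount in transfers:
        edges[(sender, receiver, token)] += amount
    return dict(edges)

def token_diversity(transfers):
    """Number of distinct tokens each account sent or received."""
    tokens = defaultdict(set)
    for sender, receiver, token, _ in transfers:
        tokens[sender].add(token)
        tokens[receiver].add(token)
    return {account: len(ts) for account, ts in tokens.items()}

edges = multi_token_edges(transfers)
diversity = token_diversity(transfers)
```

Keeping the token in the edge key lets per-token centralities be computed on one layer at a time, while counts such as `diversity` summarize an account across layers.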
Graph machine learning, particularly using graph neural networks, fundamentally relies on node features. Nevertheless, numerous real-world systems, such as social and biological networks, often lack node features for various reasons, including privacy concerns, incomplete or missing data, and limitations in data collection. In such scenarios, researchers typically resort to methods like structural and positional encoding to construct node features. However, the length of such features is contingent on the maximum value within the property being encoded, for example, the highest node degree, which can be exceedingly large in applications like scale-free networks. Furthermore, these encoding schemes are limited to categorical data and might not be able to encode metrics returning other types of values. In this paper, we introduce a novel, universally applicable encoder, termed PropEnc, which constructs expressive node embeddings from any given graph metric. PropEnc leverages histogram construction combined with reverse index encoding, offering a flexible method for node feature initialization. It supports flexible encoding in terms of both dimensionality and type of input, demonstrating its effectiveness across diverse applications. PropEnc allows encoding metrics in a low-dimensional space, which effectively avoids the issue of sparsity and enhances the efficiency of the models. We show that PropEnc can construct node features that either exactly replicate one-hot encoding or closely approximate indices under various settings. Our extensive evaluations in a graph classification setting across multiple social networks that lack node features support our hypothesis. The empirical results conclusively demonstrate that PropEnc is both an efficient and effective mechanism for constructing node features from a diverse set of graph metrics.
A Property Encoder for Graph Neural Networks. Anwar Said, Xenofon Koutsoukos. arXiv - CS - Social and Information Networks, 2024-09-17. https://doi.org/arxiv-2409.11554
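The sketch below illustrates the general idea of histogram-based node feature construction under stated assumptions; it is not the PropEnc implementation. It assumes equal-width bins over the observed range of a metric and encodes each node as a one-hot vector over its bin index. The function name `histogram_encode` and the sample degree values are hypothetical, and the way the bin index is looked up here is only one possible reading of "reverse index encoding".

```python
def histogram_encode(values, num_bins):
    """One-hot encode each value by the equal-width histogram bin it falls into."""
    lo, hi = min(values), max(values)
    # Fall back to a unit width when all values are identical.
    width = (hi - lo) / num_bins or 1.0
    features = []
    for v in values:
        # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        row = [0.0] * num_bins
        row[idx] = 1.0
        features.append(row)
    return features

# Hypothetical node degrees with one hub, as in a scale-free network.
degrees = [1, 2, 2, 3, 50]
compact = histogram_encode(degrees, num_bins=4)

# With one bin per distinct integer value, the encoding coincides with plain one-hot.
one_hot_like = histogram_encode([0, 1, 2], num_bins=3)
```

A plain one-hot encoding of `degrees` would need 51 dimensions because of the hub, whereas the binned version uses 4, which is the sparsity problem the abstract describes.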
Sociolinguistic theories have highlighted how narratives are often retold, co-constructed and reconceptualized in collaborative settings. This working paper focuses on the re-interpretation of characters, an integral part of the narrative story-world, and attempts to study how this may be computationally compared between online communities. Using online fandom - a highly communal phenomenon that has been largely studied qualitatively - as data, computational methods were applied to explore shifts in character representations between two communities and the original text. Specifically, text from the Harry Potter novels, the r/HarryPotter subreddit, and fanfiction on Archive of Our Own was analyzed for changes in character mentions, centrality measures from co-occurrence networks, and semantic associations. While fandom elevates secondary characters, as found in past work, the two fan communities prioritize different subsets of characters. Word embedding tests reveal starkly different associations of the same characters between communities on the gendered concepts of femininity/masculinity, cruelty, and beauty. Furthermore, fanfiction descriptions of a male character analyzed between romance pairings scored higher for feminine-coded characteristics in male-male romance, matching past qualitative theorizing. The results highlight the potential for computational methods to assist in capturing the re-conceptualization of narrative elements across communities and in supporting qualitative research on fandom.
Capturing Differences in Character Representations Between Communities: An Initial Study with Fandom. Bianca N. Y. Kang. arXiv - CS - Social and Information Networks, 2024-09-17. https://doi.org/arxiv-2409.11170
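A character co-occurrence network of the kind mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumptions: two characters co-occur when both names appear in the same passage, names are matched by plain substring search, and centrality is approximated by weighted degree. The passages, the placeholder character names, and the function names are hypothetical and are not drawn from the study's corpora.

```python
from collections import Counter
from itertools import combinations

# Hypothetical passages and placeholder character names.
characters = ["Alice", "Bob", "Carol"]
passages = [
    "Alice met Bob in the corridor.",
    "Bob and Carol argued about the map.",
    "Alice, Bob and Carol left together.",
]

def cooccurrence(passages, characters):
    """Count how often each pair of characters is mentioned in the same passage."""
    counts = Counter()
    for text in passages:
        present = sorted(name for name in characters if name in text)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts

def weighted_degree(counts):
    """Sum of co-occurrence weights on the edges incident to each character."""
    degree = Counter()
    for (a, b), weight in counts.items():
        degree[a] += weight
        degree[b] += weight
    return degree

pair_counts = cooccurrence(passages, characters)
centrality = weighted_degree(pair_counts)
```

Running the same procedure on each corpus and comparing the resulting rankings is one way to see which characters a community elevates relative to the source text.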
Hypergraphs tackle the limitations of traditional graphs by introducing hyperedges. While graph edges connect only two nodes, hyperedges can connect an arbitrary number of nodes. Also, the underlying message-passing mechanisms in Hypergraph Neural Networks (HGNNs) take the form vertex-hyperedge-vertex, which lets HGNNs capture and utilize richer and more complex structural information than traditional Graph Neural Networks (GNNs). More recently, the idea of overlapping subgraphs has emerged. These subgraphs can capture more information about subgroups of vertices without restricting a vertex to a single group, allowing vertices to belong to multiple groups or subgraphs. In addition, one of the most important problems in graph clustering is finding the densest overlapping subgraphs (DOS). In this paper, we propose a solution to the DOS problem via the Agglomerative Greedy Enumeration (DOSAGE) algorithm, a novel approach that enhances the process of generating the densest overlapping subgraphs and, hence, yields a robust construction of the hypergraphs. Experiments on standard benchmarks show that the DOSAGE algorithm significantly outperforms the HGNNs and six other methods on the node classification task.
Hyperedge Modeling in Hypergraph Neural Networks by using Densest Overlapping Subgraphs. Mehrad Soltani, Luis Rueda. arXiv - CS - Social and Information Networks, 2024-09-16. https://doi.org/arxiv-2409.10340
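For readers unfamiliar with densest-subgraph search, the sketch below shows the classic greedy peeling heuristic for a single densest subgraph, with density defined as edges divided by vertices. It is a standard stand-in used here for illustration only; it is not the DOSAGE algorithm and does not handle overlapping subgraphs. The toy graph and the function name are hypothetical.

```python
def densest_subgraph(adj):
    """Greedy peeling: repeatedly drop a minimum-degree vertex, keep the densest snapshot."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    edges = sum(len(nb) for nb in adj.values()) // 2
    best_density, best_nodes = 0.0, set(adj)
    while adj:
        density = edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Remove the vertex of smallest degree (ties broken by vertex id).
        v = min(adj, key=lambda x: (len(adj[x]), x))
        for w in adj[v]:
            adj[w].discard(v)
        edges -= len(adj[v])
        del adj[v]
    return best_nodes, best_density

# Hypothetical graph: a 4-clique on {1, 2, 3, 4} with a tail 4 - 5 - 6.
graph = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3, 5},
    5: {4, 6},
    6: {5},
}
nodes, density = densest_subgraph(graph)
```

In a hypergraph construction, each dense vertex set found this way could serve as one hyperedge; allowing the sets to overlap is the part this sketch leaves out.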
Dohee Kim, Unggi Lee, Sookbun Lee, Jiyeong Bae, Taekyung Ahn, Jaekwon Park, Gunho Lee, Hyeoncheol Kim
This paper introduces ES-KT-24, a novel multimodal Knowledge Tracing (KT) dataset for intelligent tutoring systems in educational game contexts. Although KT is crucial in adaptive learning, existing datasets often lack game-based and multimodal elements. ES-KT-24 addresses these limitations by incorporating educational game-playing videos, synthetically generated question text, and detailed game logs. The dataset covers Mathematics, English, Indonesian, and Malaysian subjects, emphasizing diversity and including non-English content. The synthetic text component, generated using a large language model, encompasses 28 distinct knowledge concepts and 182 questions, featuring 15,032 users and 7,782,928 interactions. Our benchmark experiments demonstrate the dataset's utility for KT research by comparing deep learning-based KT (DKT) models with Language Model-based Knowledge Tracing (LKT) approaches. Notably, LKT models showed slightly higher performance than traditional DKT models, highlighting the potential of language model-based approaches in this field. Furthermore, ES-KT-24 has the potential to significantly advance research in multimodal KT models and learning analytics. By integrating game-playing videos and detailed game logs, this dataset offers a unique approach to dissecting student learning patterns through advanced data analysis and machine-learning techniques. It has the potential to unearth new insights into the learning process and inspire further exploration in the field.
ES-KT-24: A Multimodal Knowledge Tracing Benchmark Dataset with Educational Game Playing Video and Synthetic Text Generation. Dohee Kim, Unggi Lee, Sookbun Lee, Jiyeong Bae, Taekyung Ahn, Jaekwon Park, Gunho Lee, Hyeoncheol Kim. arXiv - CS - Social and Information Networks, 2024-09-16. https://doi.org/arxiv-2409.10244