Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have shown that it is feasible to estimate commuting origin-destination (OD) demand within a city using multiple sources of auxiliary data. However, most existing methods do not scale to larger areas, such as a prefecture or an entire nation, owing to the increased number of geographical units that must be maintained. In addition, region representation learning is a universal approach for acquiring urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationships between the selected geographical elements. Furthermore, metropolitan areas naturally exhibit ranked structures, such as cities and the districts they contain, which makes elucidating relations between cross-level urban units necessary. Therefore, we develop a heterogeneous graph-based model that generates meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models under a uniform urban structure. We further provide reasonable explanations of the predicted results to enhance the credibility of the model.
{"title":"Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction","authors":"Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto","doi":"arxiv-2408.14762","DOIUrl":"https://doi.org/arxiv-2408.14762","url":null,"abstract":"Commuting flow prediction is an essential task for municipal operations in\u0000the real world. Previous studies have revealed that it is feasible to estimate\u0000the commuting origin-destination (OD) demand within a city using multiple\u0000auxiliary data. However, most existing methods are not suitable to deal with a\u0000similar task at a large scale, namely within a prefecture or the whole nation,\u0000owing to the increased number of geographical units that need to be maintained.\u0000In addition, region representation learning is a universal approach for gaining\u0000urban knowledge for diverse metropolitan downstream tasks. Although many\u0000researchers have developed comprehensive frameworks to describe urban units\u0000from multi-source data, they have not clarified the relationship between the\u0000selected geographical elements. Furthermore, metropolitan areas naturally\u0000preserve ranked structures, like cities and their inclusive districts, which\u0000makes elucidating relations between cross-level urban units necessary.\u0000Therefore, we develop a heterogeneous graph-based model to generate meaningful\u0000region embeddings at multiple spatial resolutions for predicting different\u0000types of inter-level OD flows. To demonstrate the effectiveness of the proposed\u0000method, extensive experiments were conducted using real-world aggregated mobile\u0000phone datasets collected from Shizuoka Prefecture, Japan. The results indicate\u0000that our proposed model outperforms existing models in terms of a uniform urban\u0000structure. We extend the understanding of predicted results using reasonable\u0000explanations to enhance the credibility of the model.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability across various tasks. Graphs, as versatile data structures that capture relationships between entities, play pivotal roles in fields such as social network analysis, recommender systems, and biological graphs. Despite the success of pre-train and prompt learning paradigms in Natural Language Processing (NLP) and Computer Vision (CV), their application in graph domains remains nascent. In graph-structured data, not only do the node and edge features often have disparate distributions, but the topological structures also differ significantly. This diversity in graph data can lead to incompatible patterns or gaps between pre-training and fine-tuning on downstream graphs. We aim to bridge this gap by summarizing methods for alleviating these disparities. This includes exploring prompt design methodologies, comparing related techniques, assessing application scenarios and datasets, and identifying unresolved problems and challenges. This survey categorizes over 100 relevant works in this field, summarizing general design principles and the latest applications, including text-attributed graphs, molecules, proteins, and recommendation systems. Through this extensive review, we provide a foundational understanding of graph prompt learning, aiming to impact not only the graph mining community but also the broader Artificial General Intelligence (AGI) community.
{"title":"Towards Graph Prompt Learning: A Survey and Beyond","authors":"Qingqing Long, Yuchen Yan, Peiyan Zhang, Chen Fang, Wentao Cui, Zhiyuan Ning, Meng Xiao, Ning Cao, Xiao Luo, Lingjun Xu, Shiyue Jiang, Zheng Fang, Chong Chen, Xian-Sheng Hua, Yuanchun Zhou","doi":"arxiv-2408.14520","DOIUrl":"https://doi.org/arxiv-2408.14520","url":null,"abstract":"Large-scale \"pre-train and prompt learning\" paradigms have demonstrated\u0000remarkable adaptability, enabling broad applications across diverse domains\u0000such as question answering, image recognition, and multimodal retrieval. This\u0000approach fully leverages the potential of large-scale pre-trained models,\u0000reducing downstream data requirements and computational costs while enhancing\u0000model applicability across various tasks. Graphs, as versatile data structures\u0000that capture relationships between entities, play pivotal roles in fields such\u0000as social network analysis, recommender systems, and biological graphs. Despite\u0000the success of pre-train and prompt learning paradigms in Natural Language\u0000Processing (NLP) and Computer Vision (CV), their application in graph domains\u0000remains nascent. In graph-structured data, not only do the node and edge\u0000features often have disparate distributions, but the topological structures\u0000also differ significantly. This diversity in graph data can lead to\u0000incompatible patterns or gaps between pre-training and fine-tuning on\u0000downstream graphs. We aim to bridge this gap by summarizing methods for\u0000alleviating these disparities. This includes exploring prompt design\u0000methodologies, comparing related techniques, assessing application scenarios\u0000and datasets, and identifying unresolved problems and challenges. This survey\u0000categorizes over 100 relevant works in this field, summarizing general design\u0000principles and the latest applications, including text-attributed graphs,\u0000molecules, proteins, and recommendation systems. Through this extensive review,\u0000we provide a foundational understanding of graph prompt learning, aiming to\u0000impact not only the graph mining community but also the broader Artificial\u0000General Intelligence (AGI) community.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142214926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, George Chacko
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. However, we show that SBMs can produce disconnected ground truth clusters, even when given parameters from clusterings where all clusters are connected. Here we describe the REalistic Cluster Connectivity Simulator (RECCS), a technique that modifies an SBM synthetic network to improve the fit to a given clustered real-world network with respect to edge connectivity within clusters, while maintaining the good fit with respect to other network and cluster statistics. Using real-world networks up to 13.9 million nodes in size, we show that RECCS, applied to stochastic block models, results in synthetic networks that have a better fit to cluster edge connectivity than unmodified SBMs, while providing roughly the same quality fit for other network and clustering parameters as unmodified SBMs.
{"title":"Synthetic Networks That Preserve Edge Connectivity","authors":"Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, George Chacko","doi":"arxiv-2408.13647","DOIUrl":"https://doi.org/arxiv-2408.13647","url":null,"abstract":"Since true communities within real-world networks are rarely known, synthetic\u0000networks with planted ground truths are valuable for evaluating the performance\u0000of community detection methods. Of the synthetic network generation tools\u0000available, Stochastic Block Models (SBMs) produce networks with ground truth\u0000clusters that well approximate input parameters from real-world networks and\u0000clusterings. However, we show that SBMs can produce disconnected ground truth\u0000clusters, even when given parameters from clusterings where all clusters are\u0000connected. Here we describe the REalistic Cluster Connectivity Simulator\u0000(RECCS), a technique that modifies an SBM synthetic network to improve the fit\u0000to a given clustered real-world network with respect to edge connectivity\u0000within clusters, while maintaining the good fit with respect to other network\u0000and cluster statistics. Using real-world networks up to 13.9 million nodes in\u0000size, we show that RECCS, applied to stochastic block models, results in\u0000synthetic networks that have a better fit to cluster edge connectivity than\u0000unmodified SBMs, while providing roughly the same quality fit for other network\u0000and clustering parameters as unmodified SBMs.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142214962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tobias Bernecker, Ghalia Rehawi, Francesco Paolo Casale, Janine Knauer-Arloth, Annalisa Marsico
Graph generation addresses the problem of generating new graphs that have a data distribution similar to real-world graphs. While previous diffusion-based graph generation methods have shown promising results, they often struggle to scale to large graphs. In this work, we propose ARROW-Diff (AutoRegressive RandOm Walk Diffusion), a novel random walk-based diffusion approach for efficient large-scale graph generation. Our method encompasses two components in an iterative process of random walk sampling and graph pruning. We demonstrate that ARROW-Diff can scale to large graphs efficiently, surpassing other baseline methods in terms of both generation time and multiple graph statistics, reflecting the high quality of the generated graphs.
{"title":"Random Walk Diffusion for Efficient Large-Scale Graph Generation","authors":"Tobias Bernecker, Ghalia Rehawi, Francesco Paolo Casale, Janine Knauer-Arloth, Annalisa Marsico","doi":"arxiv-2408.04461","DOIUrl":"https://doi.org/arxiv-2408.04461","url":null,"abstract":"Graph generation addresses the problem of generating new graphs that have a\u0000data distribution similar to real-world graphs. While previous diffusion-based\u0000graph generation methods have shown promising results, they often struggle to\u0000scale to large graphs. In this work, we propose ARROW-Diff (AutoRegressive\u0000RandOm Walk Diffusion), a novel random walk-based diffusion approach for\u0000efficient large-scale graph generation. Our method encompasses two components\u0000in an iterative process of random walk sampling and graph pruning. We\u0000demonstrate that ARROW-Diff can scale to large graphs efficiently, surpassing\u0000other baseline methods in terms of both generation time and multiple graph\u0000statistics, reflecting the high quality of the generated graphs.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interdisciplinary collaboration is crucial for addressing complex scientific challenges. Recent advancements in large language models (LLMs) have shown significant potential in benefiting researchers across various fields. To explore the application of LLMs in scientific disciplines and their implications for interdisciplinary collaboration, we collect and analyze 50,391 papers from OpenAlex, an open-source platform for scholarly metadata. We first employ Shannon entropy to assess the diversity of collaboration in terms of authors' institutions and departments. Our results reveal that most fields have exhibited varying degrees of increased entropy following the release of ChatGPT, with Computer Science displaying a consistent increase. Other fields such as Social Science, Decision Science, Psychology, Engineering, Health Professions, and Business, Management & Accounting have shown minor to significant increases in entropy in 2024 compared to 2023. Statistical testing further indicates that the entropy in Computer Science, Decision Science, and Engineering is significantly lower than that in health-related fields like Medicine and Biochemistry, Genetics & Molecular Biology. In addition, our network analysis based on authors' affiliation information highlights the prominence of Computer Science, Medicine, and other Computer Science-related departments in LLM research. Regarding authors' institutions, our analysis reveals that entities such as Stanford University, Harvard University, University College London, and Google are key players, either dominating centrality measures or playing crucial roles in connecting research networks. Overall, this study provides valuable insights into the current landscape and evolving dynamics of collaboration networks in LLM research.
{"title":"Academic collaboration on large language model studies increases overall but varies across disciplines","authors":"Lingyao Li, Ly Dinh, Songhua Hu, Libby Hemphill","doi":"arxiv-2408.04163","DOIUrl":"https://doi.org/arxiv-2408.04163","url":null,"abstract":"Interdisciplinary collaboration is crucial for addressing complex scientific\u0000challenges. Recent advancements in large language models (LLMs) have shown\u0000significant potential in benefiting researchers across various fields. To\u0000explore the application of LLMs in scientific disciplines and their\u0000implications for interdisciplinary collaboration, we collect and analyze 50,391\u0000papers from OpenAlex, an open-source platform for scholarly metadata. We first\u0000employ Shannon entropy to assess the diversity of collaboration in terms of\u0000authors' institutions and departments. Our results reveal that most fields have\u0000exhibited varying degrees of increased entropy following the release of\u0000ChatGPT, with Computer Science displaying a consistent increase. Other fields\u0000such as Social Science, Decision Science, Psychology, Engineering, Health\u0000Professions, and Business, Management & Accounting have shown minor to\u0000significant increases in entropy in 2024 compared to 2023. Statistical testing\u0000further indicates that the entropy in Computer Science, Decision Science, and\u0000Engineering is significantly lower than that in health-related fields like\u0000Medicine and Biochemistry, Genetics & Molecular Biology. In addition, our\u0000network analysis based on authors' affiliation information highlights the\u0000prominence of Computer Science, Medicine, and other Computer Science-related\u0000departments in LLM research. Regarding authors' institutions, our analysis\u0000reveals that entities such as Stanford University, Harvard University,\u0000University College London, and Google are key players, either dominating\u0000centrality measures or playing crucial roles in connecting research networks.\u0000Overall, this study provides valuable insights into the current landscape and\u0000evolving dynamics of collaboration networks in LLM research.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuan Zhang, Laia Castro Herrero, Frank Esser, Alexandre Bovet
Selective exposure, individuals' inclination to seek out information that supports their beliefs while avoiding information that contradicts them, plays an important role in the emergence of polarization. In the political domain, selective exposure is usually measured on a left-right ideology scale, ignoring finer details. Here, we combine survey and Twitter data collected during the 2022 Brazilian Presidential Election and investigate selective exposure patterns between the survey respondents and political influencers. We analyze the followship network between survey respondents and political influencers and find a multilevel community structure that reveals a hierarchical organization more complex than a simple split between left and right. Moreover, depending on the level we consider, we find different associations between network indices of exposure patterns and 189 individual attributes of the survey respondents. For example, at finer levels, the number of influencer communities a survey respondent follows is associated with several factors, such as demographics, news consumption frequency, and incivility perception. In comparison, only their political ideology is a significant factor at coarser levels. Our work demonstrates that measuring selective exposure at a single level, such as left and right, misses important information necessary to capture this phenomenon correctly.
{"title":"More than 'Left and Right': Revealing Multilevel Online Political Selective Exposure","authors":"Yuan Zhang, Laia Castro Herrero, Frank Esser, Alexandre Bovet","doi":"arxiv-2408.03828","DOIUrl":"https://doi.org/arxiv-2408.03828","url":null,"abstract":"Selective exposure, individuals' inclination to seek out information that\u0000supports their beliefs while avoiding information that contradicts them, plays\u0000an important role in the emergence of polarization. In the political domain,\u0000selective exposure is usually measured on a left-right ideology scale, ignoring\u0000finer details. Here, we combine survey and Twitter data collected during the\u00002022 Brazilian Presidential Election and investigate selective exposure\u0000patterns between the survey respondents and political influencers. We analyze\u0000the followship network between survey respondents and political influencers and\u0000find a multilevel community structure that reveals a hierarchical organization\u0000more complex than a simple split between left and right. Moreover, depending on\u0000the level we consider, we find different associations between network indices\u0000of exposure patterns and 189 individual attributes of the survey respondents.\u0000For example, at finer levels, the number of influencer communities a survey\u0000respondent follows is associated with several factors, such as demographics,\u0000news consumption frequency, and incivility perception. In comparison, only\u0000their political ideology is a significant factor at coarser levels. Our work\u0000demonstrates that measuring selective exposure at a single level, such as left\u0000and right, misses important information necessary to capture this phenomenon\u0000correctly.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past few years, many efforts have been dedicated to studying cyberbullying on social edge computing devices, most of them focusing on three roles: victims, perpetrators, and bystanders. To gain deeper insight into the formation, evolution, and intervention of cyberbullying on devices at the edge of the Internet, it is necessary to explore more fine-grained roles. This paper presents a multi-level method for role feature modeling and proposes a differential evolution-assisted K-means (DEK) method to identify diverse roles. Our work aims to provide a role identification scheme for cyberbullying scenarios in social edge computing environments, alleviating the safety issues that cyberbullying brings. Experiments on ten real-world datasets obtained from Weibo and five public datasets show that the proposed DEK outperforms existing approaches at the method level. After clustering, we obtained nine roles and analyzed the characteristics of each role and their evolution trends under different cyberbullying scenarios. The proposed approach can be deployed on devices at the edge of the Internet, yielding better real-time identification performance and adapting to the broad geographic distribution and high mobility of mobile devices.
{"title":"Role Identification based Method for Cyberbullying Analysis in Social Edge Computing","authors":"Runyu Wang, Tun Lu, Peng Zhang, Ning Gu","doi":"arxiv-2408.03502","DOIUrl":"https://doi.org/arxiv-2408.03502","url":null,"abstract":"Over the past few years, many efforts have been dedicated to studying\u0000cyberbullying in social edge computing devices, and most of them focus on three\u0000roles: victims, perpetrators, and bystanders. If we want to obtain a deep\u0000insight into the formation, evolution, and intervention of cyberbullying in\u0000devices at the edge of the Internet, it is necessary to explore more\u0000fine-grained roles. This paper presents a multi-level method for role feature\u0000modeling and proposes a differential evolution-assisted K-means (DEK) method to\u0000identify diverse roles. Our work aims to provide a role identification scheme\u0000for cyberbullying scenarios for social edge computing environments to alleviate\u0000the general safety issues that cyberbullying brings. The experiments on ten\u0000real-world datasets obtained from Weibo and five public datasets show that the\u0000proposed DEK outperforms the existing approaches on the method level. After\u0000clustering, we obtained nine roles and analyzed the characteristics of each\u0000role and their evolution trends under different cyberbullying scenarios. Our\u0000work in this paper can be placed in devices at the edge of the Internet,\u0000leading to better real-time identification performance and adapting to the\u0000broad geographic location and high mobility of mobile devices.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the emergence of social networks, online platforms dedicated to different use cases, and sensor networks, large-scale graph community detection has become a steady field of research with real-world applications. Community detection algorithms have numerous practical applications, particularly due to their scalability with data size. Nonetheless, a notable drawback of community detection algorithms is their computational intensity [Apostol2014], resulting in decreasing performance as data size increases. For this purpose, new frameworks must be developed that employ distributed systems, such as Apache Hadoop and Apache Spark, which can seamlessly handle large-scale graphs. In this paper, we propose a novel framework for community detection algorithms, i.e., K-Cliques, Louvain, and Fast Greedy, developed using Apache Spark GraphFrames. We test their performance and scalability on two real-world datasets. The experimental results prove the feasibility of developing graph mining algorithms using Apache Spark GraphFrames.
{"title":"Large-Scale Graphs Community Detection using Spark GraphFrames","authors":"Elena-Simona Apostol, Adrian-Cosmin Cojocaru, Ciprian-Octavian Truică","doi":"arxiv-2408.03966","DOIUrl":"https://doi.org/arxiv-2408.03966","url":null,"abstract":"With the emergence of social networks, online platforms dedicated to\u0000different use cases, and sensor networks, the emergence of large-scale graph\u0000community detection has become a steady field of research with real-world\u0000applications. Community detection algorithms have numerous practical\u0000applications, particularly due to their scalability with data size.\u0000Nonetheless, a notable drawback of community detection algorithms is their\u0000computational intensity~cite{Apostol2014}, resulting in decreasing performance\u0000as data size increases. For this purpose, new frameworks that employ\u0000distributed systems such as Apache Hadoop and Apache Spark which can seamlessly\u0000handle large-scale graphs must be developed. In this paper, we propose a novel\u0000framework for community detection algorithms, i.e., K-Cliques, Louvain, and\u0000Fast Greedy, developed using Apache Spark GraphFrames. We test their\u0000performance and scalability on two real-world datasets. The experimental\u0000results prove the feasibility of developing graph mining algorithms using\u0000Apache Spark GraphFrames.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erfan Samieyan Sahneh, Gianluca Nogara, Matthew R. DeVerna, Nick Liu, Luca Luceri, Filippo Menczer, Francesco Pierri, Silvia Giordano
Bluesky is a Twitter-like decentralized social media platform that has recently grown in popularity. After an invite-only period, it opened to the public worldwide on February 6th, 2024. In this paper, we provide a longitudinal analysis of user activity in the two months around the opening, studying changes in the general characteristics of the platform due to the rapid growth of the user base. We observe a broad distribution of activity similar to more established platforms, but a higher volume of original than reshared content, and very low toxicity. After opening to the public, Bluesky experienced a large surge in new users and activity, especially in posts in English and Japanese. In particular, several accounts entered the discussion with suspicious behavior, like following many accounts and sharing content from low-credibility news outlets. Some of these have already been classified as spam or suspended, suggesting effective moderation.
{"title":"The Dawn of Decentralized Social Media: An Exploration of Bluesky's Public Opening","authors":"Erfan Samieyan Sahneh, Gianluca Nogara, Matthew R. DeVerna, Nick Liu, Luca Luceri, Filippo Menczer, Francesco Pierri, Silvia Giordano","doi":"arxiv-2408.03146","DOIUrl":"https://doi.org/arxiv-2408.03146","url":null,"abstract":"Bluesky is a Twitter-like decentralized social media platform that has\u0000recently grown in popularity. After an invite-only period, it opened to the\u0000public worldwide on February 6th, 2024. In this paper, we provide a\u0000longitudinal analysis of user activity in the two months around the opening,\u0000studying changes in the general characteristics of the platform due to the\u0000rapid growth of the user base. We observe a broad distribution of activity\u0000similar to more established platforms, but a higher volume of original than\u0000reshared content, and very low toxicity. After opening to the public, Bluesky\u0000experienced a large surge in new users and activity, especially posting English\u0000and Japanese content. In particular, several accounts entered the discussion\u0000with suspicious behavior, like following many accounts and sharing content from\u0000low-credibility news outlets. Some of these have already been classified as\u0000spam or suspended, suggesting effective moderation.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jibing Gong, Jiquan Peng, Jin Qu, ShuYing Du, Kaiyu Wang
Detecting Twitter Bots is crucial for maintaining the integrity of online discourse, safeguarding democratic processes, and preventing the spread of malicious propaganda. However, advanced Twitter Bots today often employ sophisticated feature manipulation and account farming techniques to blend seamlessly with genuine user interactions, posing significant challenges to existing detection models. In response to these challenges, this paper proposes a novel Twitter Bot Detection framework called BotSAI. This framework enhances the consistency of multimodal user features, accurately characterizing various modalities to distinguish between real users and bots. Specifically, the architecture integrates information from users, textual content, and heterogeneous network topologies, leveraging customized encoders to obtain comprehensive user feature representations. The heterogeneous network encoder efficiently aggregates information from neighboring nodes through oversampling techniques and local relationship transformers. Subsequently, a multi-channel representation mechanism maps user representations into invariant and specific subspaces, enhancing the feature vectors. Finally, a self-attention mechanism is introduced to integrate and refine the enhanced user representations, enabling efficient information interaction. Extensive experiments demonstrate that BotSAI outperforms existing state-of-the-art methods on two major Twitter Bot Detection benchmarks, exhibiting superior performance. Additionally, systematic experiments reveal the impact of different social relationships on detection accuracy, providing novel insights for the identification of social bots.
{"title":"Enhancing Twitter Bot Detection via Multimodal Invariant Representations","authors":"Jibing Gong, Jiquan Peng, Jin Qu, ShuYing Du, Kaiyu Wang","doi":"arxiv-2408.03096","DOIUrl":"https://doi.org/arxiv-2408.03096","url":null,"abstract":"Detecting Twitter Bots is crucial for maintaining the integrity of online\u0000discourse, safeguarding democratic processes, and preventing the spread of\u0000malicious propaganda. However, advanced Twitter Bots today often employ\u0000sophisticated feature manipulation and account farming techniques to blend\u0000seamlessly with genuine user interactions, posing significant challenges to\u0000existing detection models. In response to these challenges, this paper proposes\u0000a novel Twitter Bot Detection framework called BotSAI. This framework enhances\u0000the consistency of multimodal user features, accurately characterizing various\u0000modalities to distinguish between real users and bots. Specifically, the\u0000architecture integrates information from users, textual content, and\u0000heterogeneous network topologies, leveraging customized encoders to obtain\u0000comprehensive user feature representations. The heterogeneous network encoder\u0000efficiently aggregates information from neighboring nodes through oversampling\u0000techniques and local relationship transformers. Subsequently, a multi-channel\u0000representation mechanism maps user representations into invariant and specific\u0000subspaces, enhancing the feature vectors. Finally, a self-attention mechanism\u0000is introduced to integrate and refine the enhanced user representations,\u0000enabling efficient information interaction. Extensive experiments demonstrate\u0000that BotSAI outperforms existing state-of-the-art methods on two major Twitter\u0000Bot Detection benchmarks, exhibiting superior performance. Additionally,\u0000systematic experiments reveal the impact of different social relationships on\u0000detection accuracy, providing novel insights for the identification of social\u0000bots.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}