arXiv - CS - Social and Information Networks: Latest Papers (Page 5)

Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-12 DOI: arxiv-2409.07712

Hang Cui, Tarek Abdelzaher

In the broader machine learning literature, data-generation methods demonstrate promising results by generating additional informative training examples that augment sparse labels. Such methods are less studied in graphs due to the intricate dependencies among nodes in complex topology structures. This paper presents a novel node generation method that infuses a small set of high-quality synthesized nodes into the graph as additional labeled nodes to optimally expand the propagation of labeled information. Because it simply infuses additional nodes, the framework is orthogonal to the graph learning and downstream classification techniques, and thus is compatible with most popular graph pre-training (self-supervised learning), semi-supervised learning, and meta-learning methods. The contribution lies in designing the generated node set by solving a novel optimization problem. The optimization places the generated nodes in a manner that: (1) minimizes the classification loss to guarantee training accuracy and (2) maximizes label propagation to low-confidence nodes in the downstream task to ensure high-quality propagation. Theoretically, we show that the above dual optimization maximizes the global confidence of node classification. Our experiments demonstrate statistically significant performance improvements over 14 baselines on 10 publicly available datasets.
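To make "expanding the propagation of labeled information" concrete, here is a minimal sketch using classic closed-form label spreading, F = (1 - alpha)(I - alpha S)^{-1} Y with S = D^{-1/2} A D^{-1/2}, as a stand-in propagation rule. It is not the paper's method: the optimization that decides where generated nodes go is not reproduced, and the toy path graph, the virtual node 5, and the choice alpha = 0.9 are all illustrative assumptions. It only shows that attaching one labeled virtual node to a low-confidence node raises that node's confidence.

```python
import numpy as np

def label_propagation(adj, seeds, n_classes, alpha=0.9):
    """Closed-form label spreading: F = (1 - alpha) (I - alpha S)^{-1} Y,
    where S = D^{-1/2} A D^{-1/2} is the symmetrically normalized adjacency."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    s = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    y = np.zeros((n, n_classes))
    for node, label in seeds.items():  # one-hot rows for labeled nodes
        y[node, label] = 1.0
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * s, y)

def confidence(f, node):
    """Normalized score of the winning class at `node`."""
    row = f[node]
    return row.max() / row.sum()

# Path graph 0-1-2-3-4 with one labeled node per class at the two ends;
# node 2 sits exactly in the middle and is maximally uncertain.
path = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    path[u, v] = path[v, u] = 1.0
before = confidence(label_propagation(path, {0: 0, 4: 1}, 2), 2)

# Hypothetical virtual node 5, labeled class 0, attached to low-confidence node 2.
aug = np.zeros((6, 6))
aug[:5, :5] = path
aug[5, 2] = aug[2, 5] = 1.0
after = confidence(label_propagation(aug, {0: 0, 4: 1, 5: 0}, 2), 2)
print(before, after)
```

By the mirror symmetry of the path, node 2 starts at a confidence of 0.5; the virtual node adds a strictly positive class-0 contribution, which tips it above 0.5.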

Citations: 0

Learning Personalized Scoping for Graph Neural Networks under Heterophily

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-11 DOI: arxiv-2409.06998

Gangda Deng, Hongkuan Zhou, Rajgopal Kannan, Viktor Prasanna

Heterophilous graphs, where dissimilar nodes tend to connect, pose a challenge for graph neural networks (GNNs), as their superior performance typically comes from aggregating homophilous information. Increasing the GNN depth can expand the scope (i.e., receptive field), potentially finding homophily in higher-order neighborhoods. However, uniformly expanding the scope results in subpar performance, since real-world graphs often exhibit homophily disparity between nodes. An ideal solution is personalized scoping, which allows nodes to have varying scope sizes. Existing methods typically add node-adaptive weights for each hop. Although expressive, they inevitably suffer from severe overfitting. To address this issue, we formalize personalized scoping as a separate scope classification problem that overcomes GNN overfitting in node classification. Specifically, we predict the optimal GNN depth for each node. Our theoretical and empirical analysis suggests that accurately predicting the depth can significantly enhance generalization. We further propose Adaptive Scope (AS), a lightweight MLP-based approach that only participates in GNN inference. AS encodes structural patterns and predicts the depth to select the best model for each node's prediction. Experimental results show that AS works flexibly with various GNN architectures across a wide range of datasets while significantly improving accuracy.
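The final step described above, using a predicted depth to pick which model answers for each node, reduces to a gather over stacked per-depth outputs. The sketch below assumes the per-depth GNN logits and the per-node depth predictions are already available; the MLP that AS uses to predict depth from structural encodings is not shown, and the function name and toy arrays are hypothetical.

```python
import numpy as np

def select_by_depth(logits_per_depth, depth_pred):
    """Pick each node's prediction from the model whose depth was predicted for it.

    logits_per_depth: array-like of shape (num_depths, num_nodes, num_classes);
        slice d holds the outputs of a GNN with depth index d.
    depth_pred: int array of shape (num_nodes,), the predicted depth index per node
        (assumed to come from a separately trained depth classifier).
    """
    stacked = np.asarray(logits_per_depth)
    nodes = np.arange(stacked.shape[1])
    # Advanced indexing pairs depth_pred[i] with node i -> shape (num_nodes, num_classes).
    chosen = stacked[np.asarray(depth_pred), nodes]
    return chosen.argmax(axis=1)

# Toy example: the shallow model predicts class 0 everywhere, the deeper one class 1.
shallow = [[2.0, 0.1], [2.0, 0.1], [2.0, 0.1]]
deep = [[0.1, 2.0], [0.1, 2.0], [0.1, 2.0]]
print(select_by_depth([shallow, deep], [0, 1, 0]))  # [0 1 0]
```

Because only one model's output is read per node, the selection itself adds negligible inference cost; the overfitting question the abstract raises lives entirely in how well the depth classifier generalizes.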

Citations: 0

DisasterNeedFinder: Understanding the Information Needs in the 2024 Noto Earthquake (Comprehensive Explanation)

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-11 DOI: arxiv-2409.07102

Kota Tsubouchi, Shuji Yamaguchi, Keijirou Saitou, Akihisa Soemori, Masato Morita, Shigeki Asou

We propose and demonstrate the DisasterNeedFinder (DNF) framework in order to provide appropriate information support for the Noto Peninsula Earthquake. In the event of a large-scale disaster, it is essential to accurately capture the ever-changing information needs. However, it is difficult to obtain appropriate information from the chaotic situation on the ground. Therefore, as a data-driven approach, we aim to pick up precise on-site information needs by jointly analyzing the location information of disaster victims and their search information. It is difficult to clearly estimate information needs by analyzing search history in disaster areas alone, due to the large amount of noise and the small number of users. We therefore assume that the magnitude of an information need is reflected not by the volume of searches but by the degree of abnormality in searches; this assumption enables an appropriate understanding of the information needs of disaster victims. DNF has been continuously clarifying the information needs of disaster areas since the disaster struck, and has been recognized as a new approach to supporting disaster areas, having been featured in major Japanese media on several occasions.
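The key modeling choice above, ranking needs by abnormality rather than by raw search volume, can be illustrated with a simple standardized score per query. This is a minimal sketch under assumptions: the abstract does not specify DNF's actual abnormality measure, and the z-score form, the `smoothing` term, the input format, and the query names and counts are all hypothetical.

```python
from statistics import mean, pstdev

def anomaly_scores(history, current, smoothing=1.0):
    """Score each query by how far today's count deviates from its own history.

    history: query -> list of past daily counts (hypothetical input format).
    current: query -> today's count.
    smoothing: added to the standard deviation so that queries with flat,
        tiny histories do not divide by zero or produce exploding scores.
    """
    scores = {}
    for query, counts in history.items():
        mu = mean(counts)
        sigma = pstdev(counts)
        scores[query] = (current.get(query, 0) - mu) / (sigma + smoothing)
    return scores

history = {
    "weather": [100, 110, 90, 105, 95],  # high, steady volume
    "water supply": [1, 0, 2, 1, 1],     # low volume before the event
}
current = {"weather": 120, "water supply": 15}
scores = anomaly_scores(history, current)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['water supply', 'weather']
```

Although "weather" still has far more searches today, "water supply" deviates much more from its own baseline, so an abnormality-based ranking surfaces it first; a volume-based ranking would bury it under noise from high-traffic queries.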

Citations: 0

A Novel Voting System for Medical Catalogues in National Health Insurance

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-11 DOI: arxiv-2409.07057

Xingyuan Liang, Haibao Wen

This study explores the conceptual development of a medical insurance catalogue voting system. The methodology is centred on creating a model where doctors would vote on treatment inclusions, aiming to demonstrate transparency and integrity. The results from Monte Carlo simulations suggest a robust consensus on the selection of medicines and treatments. Further theoretical investigations propose incorporating a patient outcome-based incentive mechanism. This conceptual approach could enhance decision-making in healthcare by aligning stakeholder interests with patient outcomes, aiming for an optimised, equitable insurance catalogue with potential blockchain-based smart contracts to ensure transparency and integrity.
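As an illustration of how a Monte Carlo simulation can probe the robustness of a doctors' vote, the sketch below estimates how often a treatment is included under simple majority voting. The voting rule, the independence of votes, the panel size, and the approval probabilities are assumptions made for illustration; the abstract does not describe the study's simulation design.

```python
import random

def inclusion_rate(n_doctors, p_approve, quorum=0.5, trials=2000, seed=0):
    """Estimate how often a treatment is included under simple majority voting.

    Each simulated doctor independently votes "include" with probability
    p_approve; the treatment is included when the share of "include" votes
    strictly exceeds `quorum`. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    included = 0
    for _ in range(trials):
        yes_votes = sum(rng.random() < p_approve for _ in range(n_doctors))
        if yes_votes > quorum * n_doctors:
            included += 1
    return included / trials

# With 101 doctors, a treatment most of them favour is almost always included,
# and one most of them reject is almost never included.
print(inclusion_rate(101, 0.7), inclusion_rate(101, 0.3))
```

Sweeping `p_approve` toward 0.5 or shrinking `n_doctors` shows where consensus stops being robust, which is the kind of sensitivity such a simulation is meant to expose.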

Citations: 0

Mapping the Russian Internet Troll Network on Twitter using a Predictive Model

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-11 DOI: arxiv-2409.08305

Sachith Dassanayaka, Ori Swed, Dimitri Volchenkov

Russian Internet trolls use fake personas to spread disinformation through multiple social media streams. Given the increased frequency of this threat across social media platforms, understanding these operations is paramount in combating their influence. Using Twitter content identified as part of the Russian influence network, we created a predictive model to map the network operations. For a sub-sample of accounts, we classify account types based on their authenticity function by introducing logical categories, and we train a predictive model to identify similar behavior patterns across the network. Our model attains 88% prediction accuracy on the test set. Validation is done by comparing the similarities with the 3 million Russian troll tweets dataset. The result indicates a 90.7% similarity between the two datasets. Furthermore, we compare our model predictions on a Russian tweets dataset, and the results show 90.5% correspondence between the predictions and the actual categories. The prediction and validation results suggest that our predictive model can assist with mapping the actors in such networks.

Citations: 0

Market Reaction to News Flows in Supply Chain Networks

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-10 DOI: arxiv-2409.06255

Hiroyasu Inoue, Yasuyuki Todo

This study examines whether positive news about firms increases their stock prices and, moreover, whether it increases the stock prices of the firms' suppliers and customers, using a large sample of publicly listed firms across the world and another of Japanese listed firms. The level of positiveness of each news article is determined by FinBERT, a natural language processing model fine-tuned specifically for financial information. Supply chains of firms across the world are identified mostly from financial statements, while those of Japanese firms are taken from large-scale firm-level surveys. We find that positive news increases the rate of change of the stock prices of firms mentioned in the news before its disclosure, most likely because of the diffusion of information through informal channels. Positive news also raises the stock prices of the firms' suppliers and customers before its disclosure, confirming the propagation of market values through supply chains. In addition, we generally find a larger post-news effect than pre-news effect on the stock prices of the mentioned firms and their suppliers and customers. The positive difference between the post- and pre-news effects can be considered the net effect of the disclosure of positive news, controlling for informal information diffusion. However, the post-news effect on suppliers and customers in Japan is smaller than the pre-news effect, a result opposite to that for firms across the world. This notable result is possibly because the supply chain links of Japanese firms are stronger than global supply chains, while knowledge of these links is restricted to selected investors.
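The pre-news / post-news comparison above can be made concrete with a simple event-window calculation. This is a minimal sketch under stated assumptions: the abstract does not specify the estimator, so a plain market-adjusted return (stock return minus market return) averaged over symmetric windows stands in for it, and the function names, window length `k`, and toy return series are hypothetical.

```python
def mean_excess_return(stock, market, start, end):
    """Average stock return minus market return over days [start, end)."""
    window = range(start, end)
    return sum(stock[t] - market[t] for t in window) / len(window)

def pre_post_effects(stock, market, event_day, k):
    """Hypothetical pre-news and post-news effects around a news day.

    pre:  the k days strictly before the news (informal diffusion)
    post: the news day and the k - 1 days after it
    net:  post - pre, the difference interpreted as the net effect of disclosure
    """
    pre = mean_excess_return(stock, market, event_day - k, event_day)
    post = mean_excess_return(stock, market, event_day, event_day + k)
    return pre, post, post - pre

# Toy daily returns (assumed): the stock drifts up before the news and jumps on it.
stock = [0.01, 0.01, 0.02, 0.03, 0.02, 0.01]
market = [0.01] * 6
pre, post, net = pre_post_effects(stock, market, event_day=3, k=3)
print(pre, post, net)
```

A positive `net` corresponds to the "larger post-news effect" pattern reported for firms worldwide; the Japanese supplier/customer result would show up as a negative `net`.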

Citations: 0

Fast Computation for the Forest Matrix of an Evolving Graph

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-09 DOI: arxiv-2409.05503

Haoxin Sun, Xiaotian Zhou, Zhongzhi Zhang

The forest matrix plays a crucial role in network science, opinion dynamics, and machine learning, offering deep insights into the structure of and dynamics on networks. In this paper, we study the problem of querying entries of the forest matrix in evolving graphs, which more accurately represent the dynamic nature of real-world networks than static graphs do. To address the unique challenges posed by evolving graphs, we first introduce two approximation algorithms, SFQ and SFQPlus, for static graphs. SFQ employs a probabilistic interpretation of the forest matrix, while SFQPlus incorporates a novel variance reduction technique and is theoretically proven to offer enhanced accuracy. Based on these two algorithms, we further devise two dynamic algorithms centered around efficiently maintaining a list of spanning converging forests. This approach ensures $O(1)$ runtime complexity for updates, including edge additions and deletions, as well as for querying matrix elements, and provides an unbiased estimation of forest matrix entries. Finally, through extensive experiments on various real-world networks, we demonstrate the efficiency and effectiveness of our algorithms. In particular, our algorithms scale to massive graphs with more than forty million nodes.
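For reference, the forest matrix of an undirected graph with Laplacian L = D - A is Omega = (I + L)^{-1}. One probabilistic reading: run a random walk that, at node u, stops with probability 1/(1 + deg(u)) and otherwise steps to a uniform neighbour. Its transition matrix is P = (I + D)^{-1} A, so I - P = (I + D)^{-1}(I + L), and the probability that a walk from i stops at j is [(I - P)^{-1}]_{ij} / (1 + d_j) = Omega_{ij}. The sketch below computes Omega exactly and estimates one row with this single-walk estimator. It is a simpler relative of, not a reproduction of, SFQ/SFQPlus, which work with spanning converging forests; the function names and the toy path graph are assumptions.

```python
import random
import numpy as np

def forest_matrix(adj):
    """Exact forest matrix Omega = (I + L)^{-1}, with L = D - A the graph Laplacian."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(len(adj)) + lap)

def estimate_row(neighbors, i, samples=20000, seed=0):
    """Monte Carlo estimate of row i of Omega via a killed random walk.

    At node u the walk stops with probability 1 / (1 + deg(u)); otherwise it
    moves to a uniformly random neighbour. The fraction of walks from i that
    stop at j estimates Omega[i, j].
    """
    rng = random.Random(seed)
    counts = [0] * len(neighbors)
    for _ in range(samples):
        u = i
        while rng.random() >= 1.0 / (1.0 + len(neighbors[u])):
            u = rng.choice(neighbors[u])
        counts[u] += 1
    return [c / samples for c in counts]

neighbors = [[1], [0, 2], [1, 3], [2]]  # path graph 0-1-2-3
adj = np.zeros((4, 4))
for u, nbrs in enumerate(neighbors):
    for v in nbrs:
        adj[u, v] = 1.0
omega = forest_matrix(adj)
approx = estimate_row(neighbors, 0)
print(np.round(omega[0], 3))
print(np.round(approx, 3))
```

Each row of Omega sums to 1 because (I + L) maps the all-ones vector to itself, which matches the fact that every killed walk stops somewhere. The exact inverse costs O(n^3), which is what sampling-based estimators avoid on large graphs.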

Citations: 0

Fast Computation of Kemeny's Constant for Directed Graphs

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-09 DOI: arxiv-2409.05471

Haisong Xia, Zhongzhi Zhang

Kemeny's constant for random walks on a graph is defined as the mean hitting time from one node to another selected randomly according to the stationary distribution. It has found numerous applications and attracted considerable research interest. However, exact computation of Kemeny's constant requires matrix inversion, which scales poorly for large networks with millions of nodes. Existing approximation algorithms either leverage properties exclusive to undirected graphs or involve inefficient simulation, leaving room for further optimization. To address these limitations for directed graphs, we propose two novel approximation algorithms for estimating Kemeny's constant on directed graphs with theoretical error guarantees. Extensive numerical experiments on real-world networks validate the superiority of our algorithms over baseline methods in terms of efficiency and accuracy.
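The exact baseline mentioned above can be written in a few lines, which also makes its cost visible. The sketch computes Kemeny's constant for a strongly connected digraph two ways: from the definition (the stationary-weighted mean hitting time, which is the same for every start node) and via the fundamental matrix Z = (I - P + 1 pi^T)^{-1}, where K = trace(Z) - 1. It uses the convention that the hitting time from a node to itself is 0; some authors count a return time instead, which adds 1. The paper's approximation algorithms are not reproduced, and the helper names and example graphs are assumptions.

```python
import numpy as np

def transition_and_stationary(adj):
    """Row-normalize a strongly connected digraph's adjacency matrix and find pi."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    p = adj / adj.sum(axis=1, keepdims=True)
    # Solve pi^T (I - P) = 0 with the last equation replaced by sum(pi) = 1.
    a = (np.eye(n) - p).T
    a[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return p, np.linalg.solve(a, b)

def kemeny_exact(adj):
    """K = trace(Z) - 1 with Z = (I - P + 1 pi^T)^{-1}; the O(n^3) inverse is the bottleneck."""
    p, pi = transition_and_stationary(adj)
    n = len(pi)
    z = np.linalg.inv(np.eye(n) - p + np.outer(np.ones(n), pi))
    return float(np.trace(z)) - 1.0

def kemeny_from_hitting_times(adj):
    """From the definition: sum_j pi_j * H[i, j] for every start node i."""
    p, pi = transition_and_stationary(adj)
    n = len(pi)
    h = np.zeros((n, n))
    for j in range(n):
        # For i != j: H[i, j] = 1 + sum_{k != j} P[i, k] H[k, j]; H[j, j] = 0.
        others = [i for i in range(n) if i != j]
        sub = p[np.ix_(others, others)]
        h[others, j] = np.linalg.solve(np.eye(n - 1) - sub, np.ones(n - 1))
    return h @ pi  # one value per start node; all equal K

cycle5 = np.roll(np.eye(5), 1, axis=1)  # directed cycle 0 -> 1 -> ... -> 4 -> 0
g = np.array([[0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0]])
print(kemeny_exact(cycle5))  # approximately 2.0, i.e. (n - 1) / 2 for a directed n-cycle
print(kemeny_exact(g), kemeny_from_hitting_times(g))
```

Both routes need dense linear algebra on an n-by-n system, which is why exact computation does not scale to networks with millions of nodes.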

Citations: 0

Extracting the U.S. building types from OpenStreetMap data

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-09 DOI: arxiv-2409.05692

Henrique F. de Arruda, Sandro M. Reia, Shiyang Ruan, Kuldip S. Atwal, Hamdi Kavak, Taylor Anderson, Dieter Pfoser

Building type information is crucial for population estimation, traffic planning, urban planning, and emergency response applications. Although essential, such data is often not readily available. To alleviate this problem, this work creates a comprehensive dataset by providing residential/non-residential building classification covering the entire United States. We propose and utilize an unsupervised machine learning method to classify building types based on building footprints and available OpenStreetMap information. The classification result is validated using authoritative ground truth data for select counties in the U.S. The validation shows a high precision for non-residential building classification and a high recall for residential buildings. We identified various approaches to improving the quality of the classification, such as removing sheds and garages from the dataset. Furthermore, analyzing the misclassifications revealed that they are mainly due to missing and scarce metadata in OpenStreetMap (OSM). A major result of this work is the resulting dataset, which classifies 67,705,475 buildings. We hope that this data is of value to the scientific community, including urban and transportation planners.

Citations: 0

Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

arXiv - CS - Social and Information Networks

Pub Date : 2024-09-09 DOI: arxiv-2409.05292

Nirmalya Thakur

The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by the WHO. No prior work related to social media mining has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper aims to address this research gap and makes two scientific contributions to this field. First, it presents a multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. The dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram posts about mpox in 52 languages. For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were performed. This process included classifying each post into (i) one of the sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no anxiety/stress detected. These results are presented as separate attributes in the dataset. Second, this paper presents the results of performing sentiment analysis, hate speech analysis, and anxiety or stress analysis. The proportions of posts in the sentiment classes fear, surprise, joy, sadness, anger, disgust, and neutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and 50.64%, respectively. In terms of hate speech detection, 95.75% of the posts did not contain hate and the remaining 4.25% did. Finally, 72.05% of the posts did not indicate any anxiety/stress, and the remaining 27.95% indicated some form of anxiety/stress.
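The abstract describes each post as a record carrying its metadata plus three classification labels, and reports the share of posts per label. As a minimal sketch, assuming hypothetical field names (the published dataset's actual column names may differ), per-class percentages like those reported could be tabulated as follows. The four toy posts are invented for illustration and are not drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass

# The seven sentiment classes named in the abstract.
SENTIMENT_CLASSES = ("fear", "surprise", "joy", "sadness", "anger", "disgust", "neutral")


@dataclass
class MpoxPost:
    """One labeled post. Field names are hypothetical stand-ins for the
    attributes listed in the abstract; the real dataset's columns may differ."""
    post_id: str
    description: str
    date: str            # date of publication (ISO format assumed)
    language: str
    translation_en: str  # English translation of the post
    sentiment: str       # one of SENTIMENT_CLASSES
    hate: bool           # True if classified as hate
    anxiety: bool        # True if anxiety/stress was detected


def label_distribution(labels, classes):
    """Percentage of labels in each class, rounded to two decimals.

    Classes that never occur are reported as 0.0 so every class appears.
    """
    labels = list(labels)  # accept any iterable, including generators
    if not labels:
        raise ValueError("no labels to summarize")
    counts = Counter(labels)
    return {c: round(100 * counts[c] / len(labels), 2) for c in classes}


# Four invented toy posts (not taken from the dataset), for illustration only.
toy_posts = [
    MpoxPost("p1", "toy text", "2024-08-15", "en", "toy text", "fear", False, True),
    MpoxPost("p2", "toy text", "2024-08-16", "es", "toy text", "neutral", False, False),
    MpoxPost("p3", "toy text", "2024-08-17", "pt", "toy text", "neutral", False, False),
    MpoxPost("p4", "toy text", "2024-08-18", "en", "toy text", "joy", True, False),
]

print(label_distribution((p.sentiment for p in toy_posts), SENTIMENT_CLASSES))
print(label_distribution((p.hate for p in toy_posts), (True, False)))
```

Reporting zero-count classes explicitly keeps the output aligned with the full list of classes. Because each share is rounded to two decimals, totals need not be exactly 100%: the seven sentiment shares reported in the abstract sum to 100.01%.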

Citations: 0