
Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining: Latest Articles

My friends also prefer diverse music: homophily and link prediction with user preferences for mainstream, novelty, and diversity in music
Tomislav Duricic, Dominik Kowald, M. Schedl, E. Lex
Homophily describes the phenomenon that similarity breeds connection, i.e., individuals tend to form ties with other people who are similar to themselves in some aspect(s). Similarity in music taste can influence who we make friends with and shape our social circles. In this paper, we study homophily on the online music platform Last.fm regarding user preferences towards listening to mainstream (M), novel (N), or diverse (D) content. Furthermore, we draw comparisons with homophily based on listening profiles derived from the artists users have listened to in the past, i.e., artist profiles. Finally, we explore the utility of users' artist profiles, as well as features describing M, N, and D, for the task of link prediction. Our study reveals that: (i) users with a friendship connection share similar music taste based on their artist profiles; (ii) on average, a measure of the diversity of the music two users listen to is a stronger predictor of friendship than measures of their preferences towards mainstream or novel content, i.e., homophily is stronger for D than for M and N; (iii) some user groups, such as high-novelty-seekers (explorers), exhibit strong homophily but lower-than-average artist profile similarity; (iv) using M, N, and D achieves link prediction accuracy comparable to using artist profiles, but the combination of features yields the best accuracy; and (v) using combined features does not add value if graph-based features such as common neighbors are available, making M, N, and D features primarily useful in a cold-start user recommendation setting for users with few friendship connections. The insights from this study will inform future work on social context-aware music recommendation, user modeling, and link prediction.
DOI: 10.1145/3487351.3492706 (published 2021-10-31)
Citations: 4
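The Duricic et al. abstract above contrasts graph-based features (such as common neighbors) with preference-based features (such as how close two users are in diversity, D). A minimal sketch of computing both kinds of feature for unconnected user pairs might look as follows. The toy graph, the `diversity` scores, and the helper names are all hypothetical and are not taken from the paper, which does not describe its feature definitions in the abstract.

```python
from itertools import combinations

# Hypothetical toy friendship graph (undirected), stored as adjacency sets.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

# Hypothetical per-user diversity scores (D) in [0, 1]; illustrative only.
diversity = {"a": 0.80, "b": 0.75, "c": 0.30, "d": 0.78}


def common_neighbors(u, v):
    """Graph-based feature: number of friends shared by u and v."""
    return len(friends[u] & friends[v])


def diversity_gap(u, v):
    """Preference-based feature: a smaller gap means more similar users."""
    return abs(diversity[u] - diversity[v])


# Candidate links are all pairs that are not yet connected.
candidates = [
    (u, v) for u, v in combinations(sorted(friends), 2) if v not in friends[u]
]

# One feature tuple per candidate pair: (common neighbors, diversity gap).
features = {
    (u, v): (common_neighbors(u, v), round(diversity_gap(u, v), 2))
    for u, v in candidates
}
print(features)
```

In this toy example both candidate pairs share exactly one neighbor, so the graph-based feature cannot separate them, while the diversity gap can. For a user with no friends at all, only the preference-based feature is available, which is the cold-start situation the abstract mentions.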
Meta-reinforcement learning via buffering graph signatures for live video streaming events
Stefanos Antaris, Dimitrios Rafailidis, Sarunas Girdzijauskas
In this study, we present a meta-learning model to adapt the predictions of the network's capacity between viewers who participate in a live video streaming event. We propose the MELANIE model, where an event is formulated as a Markov Decision Process, performing meta-learning on reinforcement learning tasks. By considering a new event as a task, we design an actor-critic learning scheme to compute the optimal policy for estimating the viewers' high-bandwidth connections. To ensure fast adaptation to new connections or changes among viewers during an event, we implement a prioritized replay memory buffer based on the Kullback-Leibler divergence of the reward/throughput of the viewers' connections. Moreover, we adopt a model-agnostic meta-learning framework to generate a global model from past events. As viewers rarely participate in several events, the challenge lies in how to account for the low structural similarity of different events. To address this issue, we design a graph signature buffer to calculate the structural similarities of several streaming events and adjust the training of the global model accordingly. We evaluate the proposed model on the link weight prediction task on three real-world datasets of live video streaming events. Our experiments demonstrate the effectiveness of the proposed model, with an average relative gain of 25% against state-of-the-art strategies. For reproducibility, our evaluation datasets and implementation are publicly available at https://github.com/stefanosantaris/melanie
DOI: 10.1145/3487351.3490973 (published 2021-10-03)
Citations: 0
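The MELANIE abstract above mentions a replay buffer prioritized by the Kullback-Leibler divergence of the viewers' reward/throughput. The sketch below only illustrates the general idea of ranking stored experiences by KL divergence from a reference distribution. The histogram bins, the experience names, and the rule "larger divergence means higher priority" are assumptions made for illustration and are not the authors' implementation, which is available at the repository linked in the abstract.

```python
import math


def normalize(counts):
    """Turn raw counts into a discrete probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against division by zero when q has empty bins.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


# Hypothetical throughput histograms (low / medium / high bandwidth).
reference = normalize([10, 30, 60])
replay_buffer = {
    "exp_1": normalize([12, 28, 60]),  # close to the reference
    "exp_2": normalize([60, 30, 10]),  # very different from the reference
}

# Assumed rule: experiences that diverge most are replayed first.
priorities = {
    name: kl_divergence(dist, reference) for name, dist in replay_buffer.items()
}
ranked = sorted(priorities, key=priorities.get, reverse=True)
print(ranked)
```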
Pruning digital contact networks for meso-scale epidemic surveillance using Foursquare data
S. Hurtado, R. Marculescu, J. Drake, R. Srinivasan
With the recent advances in human sensing, the push to integrate human mobility tracking with epidemic modeling highlights the lack of groundwork at the mesoscale (e.g., city level) for both contact tracing and transmission dynamics. Although GPS data has been used to study city-level outbreaks in the past, existing approaches fail to capture the path of infection at the individual level. Consequently, in this paper, we extend epidemic prediction from estimating the size of an outbreak at the population level to estimating which individuals are likely to get infected within a finite period of time. To this end, we propose a network-science-based method to first build and then prune the dynamic contact networks for recurring interactions; these networks can serve as the backbone topology for mechanistic epidemic modeling. We test our method using Foursquare's Points of Interest (POI) smartphone geolocation data from over 1.3 million devices to better approximate the COVID-19 infection curves for two major (yet very different) US cities (Austin and New York City), while maintaining the granularity of individual transmissions and reducing model uncertainty. Our method provides a foundation for building a disease prediction framework at the mesoscale that can help both policy makers and individuals better understand their estimated state of health and support pandemic mitigation efforts.
DOI: 10.1101/2021.09.29.21264175 (published 2021-10-01)
Citations: 0
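The Hurtado et al. abstract above describes building a dynamic contact network and then pruning it for recurring interactions. The abstract does not state the pruning criterion, so the sketch below assumes one simple possibility: keep a contact edge only if the same pair was co-located on at least `min_days` distinct days. The record layout, the identifiers, and the threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical co-location records: (day, person_a, person_b).
contacts = [
    (1, "p1", "p2"),
    (2, "p1", "p2"),
    (3, "p2", "p1"),  # same pair, reversed order
    (1, "p2", "p3"),  # seen only once
    (2, "p3", "p4"),
    (2, "p4", "p3"),  # repeated within the same day, still one day
]


def prune_recurring(records, min_days=2):
    """Keep undirected pairs observed on at least min_days distinct days."""
    days_seen = defaultdict(set)
    for day, a, b in records:
        days_seen[frozenset((a, b))].add(day)
    return {
        tuple(sorted(pair))
        for pair, days in days_seen.items()
        if len(days) >= min_days
    }


backbone = prune_recurring(contacts, min_days=2)
print(backbone)
```

Counting distinct days rather than raw records keeps a burst of pings at one venue from being mistaken for a recurring relationship.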
A spatial agent-based model for preemptive evacuation decisions during typhoon
Rey C. Rodrigueza, Maria Regina Justina E. Estuar
Natural disasters continue to cause tremendous damage to human lives and property. The Philippines, due to its geographic location, is considered a natural disaster-prone country, experiencing an average of 20 tropical cyclones annually. Understanding which factors significantly affect decision making during crucial evacuation stages could help in deciding how to prepare for disasters and how to act appropriately and respond strategically during and after a calamity. In this work, an agent-based model for preemptive evacuation decisions during a typhoon is presented. In the model, civilians are represented by households, and their evacuation decisions are based on calculated perceived risk. Rescuer and shelter manager agents are also included as facilitators during the preemptive evacuation process. National and municipal census data were employed in the model, particularly for the demographics of household agents. Further, geospatial data of a village in a typhoon-susceptible municipality were used to represent the environment. The decision to evacuate or not depends on the agent's perceived risk, which in turn depends on three decision factors: characteristics of the decision maker (CDM), capacity-related factors (CRF), and hazard-related factors (HRF). Finally, the number of households that decided to evacuate or opted to stay, as influenced by the model's decision factors, was determined during simulations. Sensitivity analysis using linear regression shows that all parameters used in the model are significant in the evacuation decision of household agents.
DOI: 10.1145/3487351.3488338 (published 2021-09-29)
Citations: 6
Interpretable business survival prediction
Anish K. Vallapuram, Nikhil Nanda, Young D. Kwon, Pan Hui
The survival of a business is undeniably pertinent to its success. A key factor contributing to its continuity is its customers. The surge of location-based social networks such as Yelp, Dianping, and Foursquare has paved the way for leveraging user-generated content on these platforms to predict business survival. Prior works in this area have developed several quantitative features to capture geography and user mobility among businesses. However, the development of qualitative features has been minimal. In this work, we thus perform extensive feature engineering across four feature sets, namely geography, user mobility, business attributes, and linguistic modelling, to develop classifiers for business survival prediction. We additionally employ an interpretability framework to generate explanations and qualitatively assess the classifiers' predictions. Experimentation among the feature sets reveals that qualitative features, including business attributes and linguistic features, have the highest predictive power, achieving AUC scores of 0.72 and 0.67, respectively. Furthermore, the explanations generated by the interpretability framework demonstrate that these models can potentially identify, from review texts, the reasons for the survival of a business.
DOI: 10.1145/3487351.3488353 (published 2021-09-25)
Citations: 1
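The Vallapuram et al. abstract above reports classifier quality as AUC scores (0.72 and 0.67). For readers unfamiliar with the metric, the following is a small self-contained sketch of AUC computed from its pairwise definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. business survived), 0 for negative.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Each (positive, negative) pair contributes 1 if ranked correctly,
    # 0.5 if tied, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: three surviving businesses, two closed ones.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Five of the six positive/negative pairs are ordered correctly here, so the toy AUC is 5/6. A value of 0.5 corresponds to random ranking, which puts the reported 0.72 and 0.67 in context.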
Bayesian inference of a social graph with trace feasibility guarantees
Effrosyni Papanastasiou, A. Giovanidis
Network inference is the process of determining the true, unknown graph underlying a set of interactions between nodes. There is a vast literature on the subject, but most known methods have an important drawback: the inferred graph is not guaranteed to explain every interaction from the input trace. We consider this an important issue since such an inferred graph cannot be used as input for applications that require a reliable estimate of the true graph. On the other hand, a graph with trace feasibility guarantees can help us better understand the true (hidden) interactions that may have taken place between nodes of interest. The inference of such a graph is the goal of this paper. First, given an activity log from a social network, we introduce a set of constraints that take into consideration all the hidden paths that are possible between the nodes of the trace, given their timestamps of interaction. Then, we develop a nontrivial modification of the Expectation-Maximization algorithm by Newman [1], which we call Constrained-EM, that incorporates the constraints and a set of auxiliary variables into the inference process to guide it towards the feasibility of the trace. Experimental results on real-world data from Twitter confirm that Constrained-EM generates a posterior distribution of graphs that explains all the events observed in the trace while presenting the desired properties of a scale-free, small-world graph. Our method also outperforms established methods in terms of feasibility and quality of the inferred graph.
DOI: 10.1145/3487351.3488279 (published 2021-09-23)
Citations: 1
Community formation and detection on GitHub collaboration networks
Behnaz Moradi-Jamei, Brandon L. Kramer, J. Bayoán Santiago Calderón, Gizem Korkmaz
This paper studies community formation in OSS collaboration networks. While most current work examines the emergence of small-scale OSS projects, our approach draws on a large-scale historical dataset of 1.8 million GitHub users and their repository contributions. OSS collaborations are characterized by small groups of users that work closely together, leading to the presence of communities defined by short cycles in the underlying network structure. To understand the impact of this phenomenon, we apply a pre-processing step that accounts for the cyclic network structure by using Renewal-Nonbacktracking Random Walks (RNBRW) and the strength of pairwise collaborations before implementing the Louvain method to identify communities within the network. Equipping Louvain with RNBRW and the contribution strength provides a more assertive approach for detecting small-scale teams and reveals nontrivial differences in community detection such as users' tendencies toward preferential attachment to more established collaboration communities. Using this method, we also identify key factors that affect community formation, including the effect of users' location and primary programming language, which was determined using a comparative method of contribution activities. Overall, this paper offers several promising methodological insights for both open-source software experts and network scholars interested in studying team formation.
DOI: 10.1145/3487351.3488278 (published 2021-09-23)
Citations: 2
The banking transactions dataset and its comparative analysis with scale-free networks
A. Saxena, Yulong Pei, Jan Veldsink, Werner van Ipenburg, G. Fletcher, Mykola Pechenizkiy
We construct a network of 1.6 million nodes from the banking transactions of Rabobank users. We assign two weights to each edge: the aggregate amount transferred and the total number of transactions between the two users from 2010 to 2020. We present a detailed analysis of the unweighted network and both weighted networks by examining their degree, strength, and weight distributions; topological and weighted assortativity; clustering and weighted clustering; and the correlations between these quantities. We further study the meso-scale properties of the networks and compare them to a randomized reference system. This will be the first publicly shared dataset of intra-bank transactions, and this work highlights the unique characteristics of banking transaction networks in comparison with other scale-free networks.
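The two edge weights and the node-level quantities named in the abstract (degree and strength) can be illustrated on a toy network. The sketch below is not the authors' code: the edge list is invented, and it assumes an undirected network, which the abstract does not state. Strength here means the sum of a node's incident edge weights, computed once per weight type.

```python
from collections import defaultdict

# Hypothetical toy edge list: (user_a, user_b, total_amount, n_transactions).
# These values are invented for illustration and are not from the dataset.
EDGES = [
    ("a", "b", 120.0, 3),
    ("a", "c", 40.0, 1),
    ("b", "c", 15.5, 2),
    ("c", "d", 300.0, 5),
]


def degree_and_strength(edges):
    """Return per-node degree and the two strengths (by amount, by count).

    Assumes an undirected network: each edge contributes to both endpoints.
    """
    degree = defaultdict(int)
    amount_strength = defaultdict(float)
    count_strength = defaultdict(int)
    for u, v, amount, count in edges:
        for node in (u, v):
            degree[node] += 1                 # number of distinct partners
            amount_strength[node] += amount   # sum of transferred amounts
            count_strength[node] += count     # sum of transaction counts
    return dict(degree), dict(amount_strength), dict(count_strength)


degree, amount_strength, count_strength = degree_and_strength(EDGES)
```

On a real transaction graph the same three dictionaries would feed the degree, strength, and weight distributions the abstract mentions.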
DOI: https://doi.org/10.1145/3487351.3488339 (published 2021-09-22)
Citations: 1
MMCoVaR: multimodal COVID-19 vaccine focused data repository for fake news detection and a baseline architecture for classification
Mingxuan Chen, Xinqiao Chu, K. P. Subbalakshmi
The outbreak of COVID-19 has resulted in an "infodemic" that has encouraged the propagation of misinformation about COVID-19 and cure methods, which in turn could negatively affect the adoption of recommended public health measures in the larger population. In this paper, we provide a new multimodal (images, text, and temporal information) labeled dataset containing news articles and tweets on the COVID-19 vaccine. We collected 2,593 news articles from 80 publishers over one year, between Feb 16th 2020 and May 8th 2021, and 24,184 Twitter posts (collected between April 17th 2021 and May 8th 2021). We combine ratings from two news media ranking sites, Media Bias Chart and Media Bias/Fact Check (MBFC), to classify the news dataset into two levels of credibility: reliable and unreliable. Combining the two filters allows for higher labeling precision. We also propose a stance detection mechanism to annotate tweets with three levels of credibility: reliable, unreliable, and inconclusive. We provide several statistics and other analytics, such as publisher distribution, publication date distribution, and topic analysis. We also provide a novel architecture that classifies the news data as misinformation or truth to establish a baseline performance for this dataset. We find that the proposed architecture achieves an F-score of 0.919 and an accuracy of 0.882 for fake news detection. Furthermore, we provide benchmark performance for misinformation detection on the tweet dataset. This new multimodal dataset can be used in research on the COVID-19 vaccine, including misinformation detection and the influence of fake COVID-19 vaccine information.
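One way to read "combining two filters" is to keep a publisher's label only when both rating sources agree. The abstract does not spell out the rule, so the sketch below is an assumption rather than the authors' procedure; the function names and label strings are hypothetical. The second function shows how an F-score and accuracy follow from confusion-matrix counts.

```python
def combine_labels(chart_label, mbfc_label):
    """Keep a label only when both (hypothetical) rating sources agree.

    Assumed rule: agreement on "reliable" or "unreliable" yields that label;
    any disagreement or unknown rating leaves the publisher unlabeled (None).
    """
    if chart_label == mbfc_label and chart_label in ("reliable", "unreliable"):
        return chart_label
    return None


def f1_and_accuracy(tp, fp, fn, tn):
    """Compute F1 and accuracy from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 so that precision and recall exist.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy
```

Requiring agreement trades coverage for precision: fewer publishers receive a label, but each label is backed by two independent ratings.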
DOI: https://doi.org/10.1145/3487351.3488346 (published 2021-09-14)
Citations: 14
Assessing the quality of the datasets by identifying mislabeled samples
Vaibhav Pulastya, Gaurav Nuti, Yash Kumar Atri, Tanmoy Chakraborty
Because of the overemphasis on the quantity of data, data quality has often been overlooked. However, not all training data points contribute equally to learning. In particular, a mislabeled data point might actively damage the model's performance and its ability to generalize out of distribution, as the model might end up learning spurious artifacts present in the dataset. This problem is compounded by the prevalence of heavily parameterized, complex deep neural networks, which, with their high capacity, can end up memorizing the noise present in the dataset. This paper proposes a novel statistic, the noise score, as a measure of the quality of each data point, to identify such mislabeled samples based on variations in the latent space representation. In our work, we use the representations derived by the inference network of the data quality supervised variational autoencoder (AQUAVS). Our method leverages the fact that samples belonging to the same class will have similar latent representations. Therefore, by identifying the outliers in the latent space, we can find the mislabeled samples. We validate our proposed statistic experimentally by corrupting the MNIST, FashionMNIST, and CIFAR10/100 datasets under different noise settings for the task of identifying mislabeled samples. We further show significant improvements in accuracy on the classification task for each dataset.
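The underlying idea, that a sample whose latent representation lies far from the rest of its labeled class is a candidate mislabel, can be sketched with a simple centroid-distance heuristic. This is not the paper's noise score and does not use AQUAVS; the function names, the threshold factor `k`, and the toy latent vectors are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def centroid_distances(latents, labels):
    """Distance of each latent vector to the mean vector of its labeled class."""
    groups = defaultdict(list)
    for z, y in zip(latents, labels):
        groups[y].append(z)
    centroids = {
        y: [sum(col) / len(zs) for col in zip(*zs)] for y, zs in groups.items()
    }
    return [math.dist(z, centroids[y]) for z, y in zip(latents, labels)]


def flag_suspects(latents, labels, k=1.5):
    """Flag indices whose distance exceeds k times their class's mean distance.

    The factor k is a hypothetical tuning knob, not a value from the paper.
    """
    dists = centroid_distances(latents, labels)
    per_class = defaultdict(list)
    for d, y in zip(dists, labels):
        per_class[y].append(d)
    mean_dist = {y: sum(ds) / len(ds) for y, ds in per_class.items()}
    return [i for i, (d, y) in enumerate(zip(dists, labels)) if d > k * mean_dist[y]]


# Toy 2-D latent vectors: index 3 sits in class 1's region but is labeled 0.
LATENTS = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0),
           (5.0, 5.2), (5.2, 5.0), (4.8, 5.0)]
LABELS = [0, 0, 0, 0, 1, 1, 1]
suspects = flag_suspects(LATENTS, LABELS)
```

A centroid is itself pulled toward mislabeled points, so this heuristic degrades as the noise rate grows; a learned latent space and a more robust statistic would be needed in practice.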
DOI: https://doi.org/10.1145/3487351.3488361 (published 2021-09-10)
Citations: 4