
2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM): Latest Papers

Predicting Depression and Anxiety on Reddit: a Multi-task Learning Approach
Shailik Sarkar, Abdulaziz Alhamadani, Lulwah Alkulaib, Chang-Tien Lu
One of the strongest indicators of a mental health crisis is how people interact with each other or express themselves. Hence, social media is an ideal source for extracting user-level information about the language used to express personal feelings. In the wake of the ever-increasing mental health crisis in the United States, it is imperative to analyze the general well-being of a population and investigate how public social media posts can be used to detect different underlying mental health conditions. For that purpose, we propose a study that collects posts from “reddits” related to different mental health topics to detect the type of each post and the nature of the mental health issues that correlate with it. The task of detecting mental health related issues indicates the mental health conditions connected to the posts. To achieve this, we develop a multi-task learning model that leverages, for each post, the latent embedding spaces of both words and topics for prediction, with a message passing mechanism that enables the sharing of information across related tasks. We train the model through an active learning approach in order to tackle the lack of standardized fine-grained labeled data for this specific task.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068655 | Published: 2022-11-10
Citations: 3
Faster Greedy Optimization of Resistance-based Graph Robustness
Maria Predari, R. Kooij, Henning Meyerhenke
The total effective resistance, also called the Kirchhoff index, provides a robustness measure for a graph $G$. We consider the optimization problem of adding $k$ new edges to $G$ such that the resulting graph has minimal total effective resistance (i.e., is most robust). The total effective resistance and the effective resistances between nodes can be computed using the pseudoinverse of the graph Laplacian. The pseudoinverse may be computed explicitly via pseudoinversion; yet, this takes cubic time in practice and quadratic space. We instead exploit combinatorial and algebraic connections to speed up gain computations in established generic greedy heuristics. Moreover, we leverage existing randomized techniques to boost the performance of our approaches by introducing a sub-sampling step. Our different graph- and matrix-based approaches are indeed significantly faster than the state-of-the-art greedy algorithm, while their quality remains reasonably high and is often quite close. Our experiments show that we can now process large graphs for which the application of the state-of-the-art greedy approach was previously infeasible. As far as we know, we are the first to be able to process graphs with $100K+$ nodes on the order of minutes.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068613 | Published: 2022-11-10
Citations: 2
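To make the quantities in the resistance-based robustness abstract concrete, the following is a minimal sketch of the naive baseline it describes, not the authors' faster method. It assumes a connected, unweighted, undirected graph and uses the standard identities $R_{tot} = n \cdot \mathrm{tr}(L^{+})$ and $L^{+} = (L + J/n)^{-1} - J/n$, where $J$ is the all-ones matrix. The function names are illustrative, and the exact rational Gauss-Jordan inversion stands in for the cubic-time pseudoinversion the abstract mentions.

```python
from fractions import Fraction
from itertools import combinations


def laplacian(n, edges):
    """Graph Laplacian L = D - A as a list of lists of Fractions."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def invert(M):
    """Gauss-Jordan inversion in exact rational arithmetic (cubic time)."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [row[:] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        # Raises StopIteration if M is singular (e.g. a disconnected graph).
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def total_effective_resistance(n, edges):
    """Kirchhoff index n * trace(L^+) of a connected graph."""
    L = laplacian(n, edges)
    shifted = [[L[i][j] + Fraction(1, n) for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    # trace(L^+) = trace((L + J/n)^-1) - trace(J/n)
    trace_pinv = sum(inv[i][i] - Fraction(1, n) for i in range(n))
    return n * trace_pinv


def greedy_add_edges(n, edges, k):
    """Add k edges, each time picking the one that minimizes the Kirchhoff index."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(k):
        candidates = [e for e in combinations(range(n), 2)
                      if frozenset(e) not in present]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda e: total_effective_resistance(n, edges + [e]))
        edges.append(best)
        present.add(frozenset(best))
    return edges
```

On the path 0-1-2-3, the total effective resistance is 10, and the greedy step closes the path into a cycle by adding the edge (0, 3), which lowers it to 5. Each greedy step here recomputes a full inverse per candidate edge, which is exactly the cost the abstract sets out to avoid.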
Classes versus Communities: Outlier Detection and Removal in Tabular Datasets via Social Network Analysis (ClaCO)
Serkan Üçer, Tansel Özyer, R. Alhajj
In this research, we introduce a model to detect inconsistent and anomalous samples in labeled tabular datasets, which are frequently used in machine learning classification tasks. Our model, abbreviated as ClaCO (Classes vs. Communities: SNA for Outlier Detection), first converts tabular data with labels into an attributed and labeled undirected network graph. Following the enrichment of the graph, it analyses the edge structure of the individual egonets, in terms of class and community membership, by introducing a new SNA metric named ‘the Consistency Score of a Node - CSoN’. Through an exhaustive analysis of the ego network of a node, CSoN tries to capture the consistency of a node by examining the similarity of its immediate neighbors in terms of shared class and/or shared community membership. To demonstrate the effectiveness of the proposed ClaCO, we employed it as a subsidiary method for detecting anomalous samples in the training part of a traditional ML classification task. With the help of this new consistency score, the set of nodes with the lowest CSoN scores is flagged as outliers and removed from the training dataset, and the remaining part is fed into the ML model to compare classification performance against the ‘whole’ dataset and against competing outlier detection methods. We show that this outlier detection model is effective: it improves classification performance relative to both the whole dataset and datasets reduced by competing outlier detection methods, across several well-known real-life and synthetic datasets.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068694 | Published: 2022-11-10
Citations: 0
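The ClaCO abstract does not give the formula for CSoN, so the following is only a hypothetical stand-in that illustrates the general idea of scoring a node by how well its immediate neighbors agree with it. The score used here, the average of the share of neighbors in the same class and the share in the same community, is an assumption made for illustration, as are all function and variable names.

```python
def neighbor_agreement(adjacency, labels, communities):
    """Hypothetical consistency score per node (NOT the paper's CSoN definition).

    For each node, average two shares over its immediate neighbors:
    the share with the same class label and the share in the same community.
    """
    scores = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            scores[node] = 0.0  # isolated nodes have no supporting neighbors
            continue
        same_class = sum(labels[nb] == labels[node] for nb in neighbors)
        same_comm = sum(communities[nb] == communities[node] for nb in neighbors)
        scores[node] = (same_class + same_comm) / (2 * len(neighbors))
    return scores


def flag_least_consistent(scores, count):
    """Return the `count` lowest-scoring nodes, ties broken by node id."""
    return sorted(scores, key=lambda n: (scores[n], n))[:count]
```

In a workflow like the one the abstract describes, the flagged nodes would be dropped from the training split before fitting a classifier. How the graph and the communities are built from tabular rows is left open here, since the abstract does not specify it.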
Is Twitter Enough? Investigating Situational Awareness in Social and Print Media during the Second COVID-19 Wave in India
Ishita Vohra, Meher Shashwat Nigam, Aryan Sakaria, Amey Kudari, N. Rangaswamy
The COVID-19 pandemic required efficient allocation of public resources and the transformation of existing societal functions. To manage any crisis, governments and public health researchers exploit the information available to them in order to make informed decisions, a practice also defined as situational awareness. Gathering situational awareness from social media has been useful for managing epidemics. Previous research focused on using discussions during periods of epidemic crises on social media platforms like Twitter, Reddit, or Facebook and on developing NLP techniques to filter important or relevant discussions out of a huge corpus of messages and posts. Social media usage varies with internet penetration and other socio-economic factors, which might induce disparity when analyzing discussions across different geographies. However, print media is a ubiquitous information source, irrespective of geography. Further, topics discussed in news articles are already ‘newsworthy’, while on social media ‘newsworthiness’ is a product of techno-social processes. Building on this fundamental difference, we study Twitter data from the second wave in India, focusing on six high-population cities with varied macro-economic factors. Through a mixture of qualitative and quantitative methods, we further analyze two Indian newspapers during the same period and compare topics from both Twitter and the newspapers to evaluate situational awareness around the second phase of COVID on each of these platforms. We conclude that factors like internet penetration and GDP in a specific city influence the discourse surrounding situational updates on social media. Thus, augmenting information extracted from social media with information from newspapers would provide a more comprehensive perspective in resource-deficit cities.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068667 | Published: 2022-11-10
Citations: 2
#WashTheHate: Understanding the Prevalence of Anti-Asian Prejudice on Twitter During the COVID-19 Pandemic
Brittany Wheeler, Seong Jung, M. Barioni, Monika Purohit, Deborah L. Hall, Yasin N. Silva
Prejudice and hate directed toward Asian individuals have increased in prevalence and salience during the COVID-19 pandemic, with notable rises in physical violence. Concurrently, as many governments enacted stay-at-home mandates, the spread of anti-Asian content increased in online spaces, including social media. In the present study, we investigated temporal and geographical patterns in social media content relevant to anti-Asian prejudice during the COVID-19 pandemic. Using the Twitter Data Collection API, we queried over 13 million tweets posted between January 30, 2020, and April 30, 2021, for both negative (e.g., #kungflu) and positive (e.g., #stopAAPIhate) hashtags and keywords related to anti-Asian prejudice. In a series of descriptive analyses, we found differences in the frequency of negative and positive keywords based on geographic location. Using burst detection, we also identified distinct increases in negative and positive content in relation to key political tweets and events. These largely exploratory analyses shed light on the role of social media in the expression and proliferation of prejudice as well as positive responses online.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068578 | Published: 2022-11-10
Citations: 2
MSNDS 2022: Organizing Committee
DOI: https://doi.org/10.1109/asonam55673.2022.10068603 | Published: 2022-11-10
Citations: 0
Social Network Analysis on Interpretable Compressed Sparse Networks
Connor C. J. Hryhoruk, C. Leung
Big data are everywhere. The World Wide Web is one example. It has become a vast data production and consumption platform, on which threads of data evolve from multiple devices, through different human interactions, across worldwide locations, and under divergent distributed settings. Embedded in these big web data is implicit, previously unknown and potentially useful information and knowledge that awaits discovery. This calls for web intelligence solutions, which make good use of data science and data mining (especially web mining or social network mining) to discover useful knowledge and important information from the web. As a web mining task, web structure mining aims to examine incoming and outgoing links on web pages and recommend frequently referenced web pages to web surfers. As another web mining task, web usage mining aims to examine web surfer patterns and recommend frequently visited pages to web surfers. While the size of the web is huge, the connections among web pages may be sparse. In other words, the number of vertex nodes (i.e., web pages) on the web is huge, but the number of directed edges (i.e., incoming and outgoing hyperlinks between web pages) may be small. This leads to a sparse web. In this paper, we present a solution for interpretable mining of frequent patterns from the sparse web. In particular, we represent web structure and usage information by bitmaps to capture connections to web pages. Due to the sparsity of the web, we compress the bitmaps and use them in mining influential patterns (e.g., popular web pages). For explainability of the mining process, we ensure the compressed bitmaps are interpretable. Evaluation on real-life web data demonstrates the effectiveness, interpretability and practicality of our solution for interpretable mining of influential patterns from the sparse web.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068716 | Published: 2022-11-10
Citations: 4
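The compressed-bitmap idea in the sparse-network abstract can be illustrated with a small sketch. The abstract does not say which compression scheme the authors use, so run-length encoding is an assumption chosen here because the runs stay human-readable. The function names and the toy link structure are hypothetical.

```python
def row_bitmap(targets):
    """Encode a page's outgoing links as an int bitmap (bit j set = link to page j)."""
    bits = 0
    for j in targets:
        bits |= 1 << j
    return bits


def run_lengths(bits, width):
    """Run-length encode a bitmap as [(bit_value, run_length), ...], lowest bit first."""
    runs = []
    for j in range(width):
        b = (bits >> j) & 1
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((b, 1))  # start a new run
    return runs


def in_degrees(rows, width):
    """Count incoming links per page by scanning each bitmap column."""
    return [sum((r >> j) & 1 for r in rows) for j in range(width)]
```

With four pages where pages 0, 1 and 3 all link to page 2, the column count for page 2 is 3, which marks it as the most referenced page. A sparse row such as `0b0100` compresses to the runs `[(0, 2), (1, 1), (0, 1)]`, which can still be read back as "two absent links, one link, one absent link".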
Customer Lifetime Value Prediction with K-means Clustering and XGBoost
Marius Myburg, S. Berman
Customer lifetime value (CLV) is the revenue expected from a customer over a given time period. CLV customer segmentation is used in marketing, resource management and business strategy. Practically, it is customer segmentation rather than revenue, and a specific timeframe rather than entire lifetimes, that is of interest. A long-standing method of CLV segmentation involves using a variant of the RFM model - an approach based on Recency, Frequency and Monetary value of past purchases. RFM is popular due to its simplicity and understandability, but it is not without its pitfalls. In this work, XGBoost and K-means clustering were used to address problems with the RFM approach: determining relative weightings of the three variables, choice of CLV segmentation method, and ability to predict future CLV segments based on current data. The system was able to predict CLV, loyalty and marketability segments with 77-78% accuracy for the immediate future, and 74-75% accuracy for the longer term. Experimentation also showed that using RFM alone is sufficient, as augmenting the features with additional purchase data did not improve results.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068602 | Published: 2022-11-10
Citations: 0
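As a rough illustration of the RFM-plus-clustering pipeline in the customer lifetime value abstract, the sketch below derives Recency, Frequency and Monetary features from a transaction log and groups customers with a plain Lloyd's k-means. It is an assumption-laden toy, not the authors' system: the transaction format, the function names, and the caller-supplied initial centroids are all hypothetical, the features are left unscaled, and the XGBoost prediction step is omitted.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Map customer id -> (recency in days, purchase count, total spend).

    `transactions` is an iterable of (customer_id, purchase_date, amount).
    """
    last, freq, spend = {}, defaultdict(int), defaultdict(float)
    for customer, day, amount in transactions:
        last[customer] = max(last.get(customer, day), day)
        freq[customer] += 1
        spend[customer] += amount
    return {c: ((as_of - last[c]).days, freq[c], spend[c]) for c in last}


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; initial centroids are passed in for determinism."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep empty ones as-is.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return [nearest(p, centroids) for p in points], centroids
```

Because the features are unscaled, monetary value dominates the distance, which is one face of the weighting problem the abstract raises for RFM. In practice each feature would likely be normalized or weighted before clustering.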
Attention Mechanism indicating Item Novelty for Sequential Recommendation
Li-Chia Wang, Hao-Shang Ma, Jen-Wei Huang
Most sequential recommendation systems, including those that employ a variety of features and state-of-the-art network models, tend to favor items that are the most popular or of greatest relevance to the historic behavior of the user. Recommendations made under these conditions tend to be repetitive; i.e., many options that might be of interest to users are entirely disregarded. This paper presents a novel algorithm that assigns a novelty score to potential recommendation items. We also present an architecture by which to incorporate this functionality in existing recommendation systems. In experiments, the proposed NASM system outperformed state-of-the-art sequential recommender systems, thereby verifying that the inclusion of novelty score can indeed improve recommendation performance.
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068599 · Published: 2022-11-10 · Citations: 0
FOSINT-SI 2022 Symposium Organizing Committee
R. Alhajj
DOI: https://doi.org/10.1109/asonam.2014.6921537 · Published: 2022-11-10 · Citations: 0