
Latest publications: 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

Classes versus Communities: Outlier Detection and Removal in Tabular Datasets via Social Network Analysis (ClaCO)
Serkan Üçer, Tansel Özyer, R. Alhajj
In this research, we introduce a model to detect inconsistent and anomalous samples in the tabular labeled datasets frequently used in machine learning classification tasks. Our model, abbreviated as ClaCO (Classes vs. Communities: SNA for Outlier Detection), first converts labeled tabular data into an attributed, labeled undirected network graph. After enriching the graph, it analyses the edge structure of individual egonets, in terms of class and community membership, by introducing a new SNA metric named ‘the Consistency Score of a Node (CSoN)’. Through an exhaustive analysis of a node's ego network, CSoN captures the consistency of a node by examining how many of its immediate neighbors share its class and/or its community. To demonstrate the effectiveness of the proposed ClaCO, we employed it as a subsidiary method for detecting anomalous samples in the training split of a traditional ML classification task: the set of nodes with the lowest CSoN scores is flagged as outliers and removed from the training dataset, and the remaining part is fed into the ML model to measure the effect on classification performance relative to the ‘whole’ dataset and to competing outlier detection methods. We show that this outlier detection model is effective, since it improves classification performance over both the whole dataset and datasets reduced by competing outlier detection methods, on several known real-life and synthetic datasets.
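The neighbor-consistency idea behind CSoN can be sketched in a few lines. The snippet below is a toy illustration, not the paper's exact CSoN formula (which the abstract does not give): a node's score is simply the fraction of its immediate neighbors that share its class label and/or its community.

```python
def consistency_score(node, adj, labels, communities):
    """Toy neighbor-consistency score: the fraction of a node's immediate
    neighbors that share its class label and/or its community.
    (Illustrative only; the paper's CSoN is defined over the full egonet.)"""
    neighbors = adj[node]
    if not neighbors:
        return 0.0
    same = sum(
        1 for n in neighbors
        if labels[n] == labels[node] or communities[n] == communities[node]
    )
    return same / len(neighbors)

# Tiny star graph: "b" shares a's class, "c" shares a's community, "d" neither.
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
communities = {"a": 1, "b": 1, "c": 1, "d": 2}
print(consistency_score("a", adj, labels, communities))  # 2 of 3 neighbors consistent
```

Nodes whose score falls at the bottom of the ranking would be the candidates flagged as outliers before training.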
DOI: 10.1109/ASONAM55673.2022.10068694 (published 2022-11-10)
Citations: 0
Faster Greedy Optimization of Resistance-based Graph Robustness
Maria Predari, R. Kooij, Henning Meyerhenke
The total effective resistance, also called the Kirchhoff index, provides a robustness measure for a graph $G$. We consider the optimization problem of adding $k$ new edges to $G$ such that the resulting graph has minimal total effective resistance (i.e., is most robust). The total effective resistance and the effective resistances between nodes can be computed using the pseudoinverse of the graph Laplacian. The pseudoinverse may be computed explicitly via pseudoinversion; yet, this takes cubic time in practice and quadratic space. We instead exploit combinatorial and algebraic connections to speed up gain computations in established generic greedy heuristics. Moreover, we leverage existing randomized techniques to boost the performance of our approaches by introducing a sub-sampling step. Our different graph- and matrix-based approaches are indeed significantly faster than the state-of-the-art greedy algorithm, while their quality remains reasonably high and is often quite close. Our experiments show that we can now process large graphs for which applying the state-of-the-art greedy approach was previously infeasible. As far as we know, we are the first to be able to process graphs with $100K+$ nodes on the order of minutes.
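The quantity being optimized can be computed directly from the Laplacian pseudoinverse, the (cubic-time) baseline route the abstract mentions. A minimal sketch using the standard identity $R_{\mathrm{tot}} = n \cdot \operatorname{tr}(L^{+})$ for a connected graph:

```python
import numpy as np

def total_effective_resistance(A):
    """Total effective resistance (Kirchhoff index) of a connected graph
    with adjacency matrix A, via the Laplacian pseudoinverse:
    R_tot = n * trace(L^+). This is the explicit cubic-time baseline."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    return n * np.trace(np.linalg.pinv(L))  # pseudoinversion step

# Path graph 1-2-3: pairwise effective resistances are 1, 1, and 2, so R_tot = 4.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(total_effective_resistance(A))  # ≈ 4.0
```

The paper's contribution is precisely to avoid recomputing this pseudoinverse for every candidate edge inside the greedy loop.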
DOI: 10.1109/ASONAM55673.2022.10068613 (published 2022-11-10)
Citations: 2
A Machine Learning Approach to Identify Toxic Language in the Online Space
Lisa Kaati, A. Shrestha, N. Akrami
In this study, we trained three machine learning models to detect toxic language on social media. These models were trained using data from diverse sources to ensure that they have a broad understanding of toxic language. Next, we evaluated the performance of our models on a dataset with samples of data from a large number of diverse online forums. The test dataset was annotated by three independent annotators. We also compared the performance of our models with Perspective API - a toxic language detection model created by Jigsaw and Google's Counter Abuse Technology team. The results showed that our classification models performed well on data from the domains they were trained on (F1 = 0.91, 0.91, & 0.84 for RoBERTa, BERT, & SVM, respectively), but performance decreased when they were tested on annotated data from new domains (F1 = 0.80, 0.61, 0.49, & 0.77 for RoBERTa, BERT, SVM, & Google Perspective, respectively). Finally, we used the best-performing model on the test data (RoBERTa, ROC = 0.86) to examine the frequency (proportion) of toxic language in 21 diverse forums. The results of these analyses showed that forums for general discussions with moderation (e.g., Alternate history) had much lower proportions of toxic language than those with minimal moderation (e.g., 8Kun). While highlighting the complexity of detecting toxic language, our results show that model performance can be improved by using a diverse dataset when building new models. We conclude by discussing the implications of our findings and some directions for future research.
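Since the comparison above hinges on per-model F1 scores across domains, a minimal reference computation of binary F1 from predictions (standard definition, not tied to the paper's code) may be useful:

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 from labels and predictions: the harmonic mean of
    precision and recall on the positive (toxic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# One true positive, one false positive, one false negative -> P = R = F1 = 0.5.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A drop in this score on out-of-domain test sets is exactly the degradation the study reports.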
DOI: 10.1109/ASONAM55673.2022.10068619 (published 2022-11-10)
Citations: 1
IKEA: Unsupervised domain-specific keyword-expansion
Joobin Gharibshah, Jakapun Tachaiya, Arman Irani, E. Papalexakis, M. Faloutsos
How can we expand an initial set of keywords with a target domain in mind? A possible application is to use the expanded set of words to search for specific information within the domain of interest. Here, we focus on online forums and specifically security forums. We propose IKEA, an iterative embedding-based approach to expand a set of keywords with a domain in mind. The novelty of our approach is three-fold: (a) we use two similarity expansions in the word-word and post-post spaces, (b) we use an iterative approach in each of these expansions, and (c) we provide a flexible ranking of the identified words to meet the user needs. We evaluate our method with data from three security forums that span five years of activity and the widely-used Fire benchmark. IKEA outperforms previous solutions by identifying more relevant keywords: it exhibits more than 0.82 MAP and 0.85 NDCG in a wide range of initial keyword sets. We see our approach as an essential building block in developing methods for harnessing the wealth of information available in online forums.
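The word-word expansion step can be sketched as an iterative nearest-neighbor search around the current keyword set. The snippet below is illustrative only (toy hand-made vectors, not learned embeddings, and it omits IKEA's post-post expansion and flexible final ranking): each round adds the candidates most cosine-similar to the centroid of the current set.

```python
import numpy as np

def expand_keywords(seed, embeddings, k=1, iterations=2):
    """Iteratively grow a keyword set in the word-word similarity space:
    each round adds the k candidate words most cosine-similar to the
    centroid of the currently selected words."""
    selected = list(seed)
    for _ in range(iterations):
        centroid = np.mean([embeddings[w] for w in selected], axis=0)
        def cos(w):
            v = embeddings[w]
            return float(v @ centroid / (np.linalg.norm(v) * np.linalg.norm(centroid)))
        candidates = [w for w in embeddings if w not in selected]
        selected += sorted(candidates, key=cos, reverse=True)[:k]
    return selected

# Hypothetical 2-d "embeddings": security terms cluster away from fruit terms.
emb = {w: np.array(v, dtype=float) for w, v in {
    "hack": (1.0, 0.0), "exploit": (0.9, 0.1), "malware": (0.8, 0.2),
    "banana": (0.0, 1.0), "apple": (0.1, 0.9)}.items()}
print(expand_keywords(["hack"], emb))  # ['hack', 'exploit', 'malware']
```

Starting from "hack", the expansion stays inside the security cluster and never drifts to the unrelated words.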
DOI: 10.1109/ASONAM55673.2022.10068656 (published 2022-11-10)
Citations: 0
Predicting Depression and Anxiety on Reddit: a Multi-task Learning Approach
Shailik Sarkar, Abdulaziz Alhamadani, Lulwah Alkulaib, Chang-Tien Lu
One of the strongest indicators of a mental health crisis is how people interact with each other or express themselves. Hence, social media is an ideal source for extracting user-level information about the language used to express personal feelings. In the wake of the ever-increasing mental health crisis in the United States, it is imperative to analyze the general well-being of a population and investigate how public social media posts can be used to detect different underlying mental health conditions. For that purpose, we propose a study that collects posts from subreddits related to different mental health topics to detect the type of each post and the nature of the mental health issues that correlate with it. The task of detecting mental-health-related issues thus indicates the mental health conditions connected to each post. To achieve this, we develop a multi-task learning model that leverages, for each post, both the latent embedding space of words and topics for prediction, with a message passing mechanism enabling the sharing of information across related tasks. We train the model through an active learning approach in order to tackle the lack of standardized fine-grained label data for this specific task.
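The shared-representation core of a multi-task setup can be sketched in plain NumPy. This is a toy illustration under stated assumptions (random features, a single shared linear encoder, two sigmoid heads, a summed binary cross-entropy loss); the paper's actual model uses learned word/topic embeddings and a message passing mechanism between tasks, none of which appear here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-task setup: one shared encoder feeds two task heads
# (e.g., post type vs. mental health condition), trained on a summed loss.
X = rng.normal(size=(8, 4))          # 8 posts, 4 input features
W_shared = rng.normal(size=(4, 3))   # shared encoder weights
W_task1 = rng.normal(size=(3, 1))    # head for task 1
W_task2 = rng.normal(size=(3, 1))    # head for task 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.tanh(X @ W_shared)            # shared hidden representation
p1 = sigmoid(h @ W_task1).ravel()    # task-1 probabilities
p2 = sigmoid(h @ W_task2).ravel()    # task-2 probabilities

y1 = rng.integers(0, 2, size=8)      # toy labels for each task
y2 = rng.integers(0, 2, size=8)
bce = lambda y, p: -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
loss = bce(y1, p1) + bce(y2, p2)     # joint multi-task objective
print(loss > 0)
```

Because both heads backpropagate through `W_shared`, improving one task's loss also shapes the representation the other task uses, which is the basic motivation for multi-task learning here.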
DOI: 10.1109/ASONAM55673.2022.10068655 (published 2022-11-10)
Citations: 3
Understanding the Impact of Culture in Assessing Helpfulness of Online Reviews
Khaled Alanezi, Nuha Albadi, Omar Hammad, Maram Kurdi, Shivakant Mishra
Online reviews have become essential for users to make informed decisions in everyday tasks, from planning summer vacations to purchasing groceries and making financial investments. A key problem in using online reviews is the overabundance of reviews, which overwhelms users. As a result, recommendation systems that assess the helpfulness of reviews are being developed. This paper argues that cultural background is an important feature that shapes the nature of a review written by a user, and must be considered as a feature in assessing the helpfulness of online reviews. The paper provides an in-depth study of differences in online reviews written by users from different cultural backgrounds and of how incorporating culture as a feature can lead to better review-helpfulness recommendations. In particular, we analyze online reviews originating from two distinct cultural spheres, namely Arabic and Western cultures, for two different products: hotels and books. Our analysis demonstrates that the nature of reviews written by users differs based on their cultural backgrounds and that this difference varies with the specific product being reviewed. Finally, we have developed six different review helpfulness recommendation models that demonstrate that taking culture into account leads to better recommendations.
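Treating culture as a model feature typically means encoding it alongside the numeric review features. The sketch below is hypothetical (the abstract does not list the paper's actual feature set); it only shows the mechanical step of appending a one-hot cultural-background indicator to a feature vector:

```python
def review_features(length, rating, culture, cultures=("arabic", "western")):
    """Hypothetical feature vector for a review-helpfulness model:
    numeric review features plus a one-hot culture indicator."""
    one_hot = [1.0 if culture == c else 0.0 for c in cultures]
    return [float(length), float(rating)] + one_hot

# A 120-word, 4-star review from the Arabic cultural sphere.
print(review_features(120, 4, "arabic"))  # [120.0, 4.0, 1.0, 0.0]
```

Any downstream helpfulness model can then weight the culture columns, which is what lets it condition its recommendations on cultural background.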
DOI: 10.1109/ASONAM55673.2022.10068664 (published 2022-11-10)
Citations: 0
Social Network Analysis on Interpretable Compressed Sparse Networks
Connor C. J. Hryhoruk, C. Leung
Big data are everywhere, and the World Wide Web is a prime example. It has become a vast data production and consumption platform, in which threads of data evolve from multiple devices, through different human interactions, across worldwide locations, under divergent distributed settings. Embedded in these big web data are implicit, previously unknown, and potentially useful information and knowledge awaiting discovery. This calls for web intelligence solutions, which make good use of data science and data mining (especially web mining or social network mining) to discover useful knowledge and important information from the web. As one web mining task, web structure mining examines incoming and outgoing links on web pages and recommends frequently referenced web pages to web surfers. As another, web usage mining examines web surfer patterns and recommends frequently visited pages. While the size of the web is huge, the connections among web pages may be sparse: the number of vertex nodes (i.e., web pages) is huge, but the number of directed edges (i.e., incoming and outgoing hyperlinks between web pages) may be small. This leads to a sparse web. In this paper, we present a solution for interpretable mining of frequent patterns from the sparse web. In particular, we represent web structure and usage information by bitmaps that capture connections to web pages. Due to the sparsity of the web, we compress the bitmaps and use them to mine influential patterns (e.g., popular web pages). For explainability of the mining process, we ensure the compressed bitmaps are interpretable. Evaluation on real-life web data demonstrates the effectiveness, interpretability, and practicality of our solution for interpretable mining of influential patterns from the sparse web.
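The bitmap idea can be sketched concretely. The snippet below is a toy illustration (the paper's compression scheme is not specified in the abstract; run-length encoding is used here as one interpretable stand-in): a page's outgoing links become a bitmap, and long runs of zero bits, which sparsity makes common, compress into short (value, run-length) pairs that remain readable.

```python
def to_bitmap(neighbors, n):
    """Bitmap of a page's outgoing links over n pages: bit i is set
    iff this page links to page i."""
    bits = 0
    for i in neighbors:
        bits |= 1 << i
    return bits

def run_length_encode(bits, n):
    """Interpretable compression of a sparse bitmap as (bit, run) pairs."""
    runs, current, count = [], bits & 1, 0
    for i in range(n):
        b = (bits >> i) & 1
        if b == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = b, 1
    runs.append((current, count))
    return runs

bm = to_bitmap({1, 2, 3}, 8)     # a page linking to pages 1-3 of 8
print(run_length_encode(bm, 8))  # [(0, 1), (1, 3), (0, 4)]
```

Each run can be read back directly as "how many consecutive pages are (not) linked", which is the interpretability property the paper requires of its compressed representation.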
DOI: 10.1109/ASONAM55673.2022.10068716 (published 2022-11-10)
Citations: 4
Customer Lifetime Value Prediction with K-means Clustering and XGBoost
Marius Myburg, S. Berman
Customer lifetime value (CLV) is the revenue expected from a customer over a given time period. CLV customer segmentation is used in marketing, resource management and business strategy. Practically, it is customer segmentation rather than revenue, and a specific timeframe rather than entire lifetimes, that is of interest. A long-standing method of CLV segmentation involves using a variant of the RFM model - an approach based on Recency, Frequency and Monetary value of past purchases. RFM is popular due to its simplicity and understandability, but it is not without its pitfalls. In this work, XGBoost and K-means clustering were used to address problems with the RFM approach: determining relative weightings of the three variables, choice of CLV segmentation method, and ability to predict future CLV segments based on current data. The system was able to predict CLV, loyalty and marketability segments with 77-78% accuracy for the immediate future, and 74-75% accuracy for the longer term. Experimentation also showed that using RFM alone is sufficient, as augmenting the features with additional purchase data did not improve results.
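The segmentation half of this pipeline can be sketched with a minimal K-means over RFM vectors. This is an illustrative stand-in under stated assumptions (hand-written K-means, four toy customers, no XGBoost prediction stage and no feature scaling, which a real RFM pipeline would normally include):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means over customer vectors; returns (centers, clusters)."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each customer to the nearest center (squared distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute each center as its cluster mean (keep old center if empty).
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

# RFM vectors: (recency_days, frequency, monetary).
customers = [(5, 20, 500), (7, 18, 450), (90, 2, 40), (100, 1, 30)]
centers, clusters = kmeans(customers, 2)
```

With k=2 the toy data separates into a high-value segment (recent, frequent, high spend) and a low-value one; in the paper's pipeline, segments like these become the targets an XGBoost classifier learns to predict from current customer data.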
DOI: 10.1109/ASONAM55673.2022.10068602 | Published: 2022-11-10
Citations: 0
Attention Mechanism indicating Item Novelty for Sequential Recommendation
Li-Chia Wang, Hao-Shang Ma, Jen-Wei Huang
Most sequential recommendation systems, including those that employ a variety of features and state-of-the-art network models, tend to favor items that are the most popular or of greatest relevance to the historical behavior of the user. Recommendations made under these conditions tend to be repetitive; i.e., many options that might be of interest to users are entirely disregarded. This paper presents a novel algorithm that assigns a novelty score to potential recommendation items. We also present an architecture by which to incorporate this functionality in existing recommendation systems. In experiments, the proposed NASM system outperformed state-of-the-art sequential recommender systems, thereby verifying that the inclusion of novelty score can indeed improve recommendation performance.
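The abstract does not spell out how NASM defines its novelty score, so the sketch below uses a common proxy — the self-information of an item's popularity — purely to illustrate how a novelty term can rerank candidates away from the most popular items. All item names, counts, and the blending scheme here are hypothetical, not the paper's model.

```python
import math

# Toy data: interaction counts and base-recommender relevance scores (assumed).
item_popularity = {"a": 1000, "b": 50, "c": 5}
relevance = {"a": 0.9, "b": 0.7, "c": 0.6}

def novelty(item, popularity, total_users=1000):
    # Self-information of consuming the item: -log2 p(item).
    # Rare items carry more "surprise" and thus score higher.
    p = popularity[item] / total_users
    return -math.log2(p)

def rerank(items, alpha=0.8):
    """Blend relevance and novelty; alpha weights relevance (alpha=1 ignores novelty)."""
    scored = {i: alpha * relevance[i] + (1 - alpha) * novelty(i, item_popularity)
              for i in items}
    return sorted(items, key=scored.get, reverse=True)

ranking = rerank(["a", "b", "c"])  # novelty-aware order: ['c', 'b', 'a']
```

With `alpha=1.0` the ranking degenerates to pure relevance (`['a', 'b', 'c']`), which is exactly the popularity-biased, repetitive behavior the paper argues against.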
DOI: 10.1109/ASONAM55673.2022.10068599 | Published: 2022-11-10
Citations: 0
FOSINT-SI 2022 Symposium Organizing Committee
R. Alhajj
DOI: 10.1109/asonam.2014.6921537 | Published: 2022-11-10
Citations: 0