Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00083
Nitin Ramrakhiyani, Sangameshwar Patil, Manideep Jella, Alok Kumar, G. Palshikar
Cyber-physical systems are an important part of many industries, such as the chemical process industry, the manufacturing industry, automobiles, and even sophisticated weaponry. Given the economic importance and influence of these systems, they have increasingly faced cybersecurity attacks. In this paper, we provide a dataset of real-life security incident reports on cyber-physical systems, annotated with the entities and events that are important for analysing such security incidents. We analyze and identify the limitations of the 'Domain Objects' in the Structured Threat Information Expression (STIX) standard, as well as of the entity type classification schemes in recent research literature on the cybersecurity domain. We propose an updated classification scheme for entity types in the cybersecurity domain. The enhanced coverage provided by this entity scheme is important for automated information extraction and natural language understanding of textual reports detailing cybersecurity incidents. We use deep-learning-based sequence labelling techniques and cybersecurity-domain-specific word embeddings to set up a benchmark for entity and event extraction for cyber-physical security incident report analysis. The annotated dataset of real-life industrial security incidents will be made available for research purposes.
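To illustrate the sequence-labelling setup the abstract describes, a minimal sketch of the standard post-processing step: decoding BIO tags emitted by a labeller into entity spans. The tag names (MALWARE, DEVICE) are illustrative placeholders, not the paper's actual entity scheme.

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token/BIO-tag lists into (type, text) entity spans."""
    spans, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                      # close the previous entity
                spans.append((etype, " ".join(current)))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(token)            # continue the open entity
        else:                                # "O" tag or inconsistent "I-"
            if current:
                spans.append((etype, " ".join(current)))
            current, etype = [], None
    if current:
        spans.append((etype, " ".join(current)))
    return spans

tokens = ["Stuxnet", "hit", "Siemens", "PLCs"]
tags = ["B-MALWARE", "O", "B-DEVICE", "I-DEVICE"]
print(bio_to_spans(tokens, tags))
# [('MALWARE', 'Stuxnet'), ('DEVICE', 'Siemens PLCs')]
```

A trained sequence labeller would produce the `tags` list; the decoding shown here is independent of the model used.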
{"title":"Extracting Entities and Events from Cyber-Physical Security Incident Reports","authors":"Nitin Ramrakhiyani, Sangameshwar Patil, Manideep Jella, Alok Kumar, G. Palshikar","doi":"10.1109/ICDMW58026.2022.00083","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00083","url":null,"abstract":"Cyber-physical systems are an important part of many industries, such as the chemical process industry, the manufacturing industry, automobiles, and even sophisticated weaponry. Given the economic importance and influence of these systems, they have increasingly faced cybersecurity attacks. In this paper, we provide a dataset of real-life security incident reports on cyber-physical systems, annotated with the entities and events that are important for analysing such security incidents. We analyze and identify the limitations of the 'Domain Objects' in the Structured Threat Information Expression (STIX) standard, as well as of the entity type classification schemes in recent research literature on the cybersecurity domain. We propose an updated classification scheme for entity types in the cybersecurity domain. The enhanced coverage provided by this entity scheme is important for automated information extraction and natural language understanding of textual reports detailing cybersecurity incidents. We use deep-learning-based sequence labelling techniques and cybersecurity-domain-specific word embeddings to set up a benchmark for entity and event extraction for cyber-physical security incident report analysis. The annotated dataset of real-life industrial security incidents will be made available for research purposes.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121620742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00113
Anne Marthe Sophie Ngo Bibinbe, A. J. Mahamadou, Michael Franklin Mbouopda, E. Nguifo
Anomaly detection in data streams comes with distinct technical challenges due to the nature of the data. The main challenges include storage limitations, the speed of data arrival, and concept drift. In the literature, methods for mining data streams to detect anomalies have been proposed. While some methods focus on tackling a specific issue, others handle diverse problems but may have high time and memory complexity. In the present work, we propose DragStream, a novel subsequence-anomaly and concept-drift detection algorithm for univariate data streams. DragStream extends Drag, a subsequence anomaly detection method for time series, to streaming data. The new method is further inspired by the well-known Matrix Profile, Drag, and MILOF, which are point and subsequence anomaly detection methods for time series and data streams, respectively. We conducted extensive experiments and statistical analysis to evaluate the performance of the proposed approach against existing methods. The results show that our method is competitive in performance while being linear in time and memory complexity. Finally, we provide an open-source implementation of the new method.
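For context, a brute-force sketch of the offline notion that Drag-style methods accelerate: the subsequence discord, i.e. the window whose nearest non-overlapping neighbour is farthest away. This is not DragStream itself, just the underlying idea on a toy series.

```python
import math

def discord(series, m):
    """Return the start index of the length-m subsequence whose nearest
    non-overlapping neighbour is farthest away (the top discord)."""
    n = len(series) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:       # skip overlapping (trivial) matches
                continue
            d = math.dist(series[i:i + m], series[j:j + m])
            nearest = min(nearest, d)
        if nearest > best_dist:      # farthest nearest-neighbour wins
            best_idx, best_dist = i, nearest
    return best_idx

ts = [0, 1, 0, 1, 0, 1, 5, 5, 0, 1, 0, 1, 0, 1]
print(discord(ts, 3))  # points into the [5, 5] anomaly
```

The quadratic scan shown here is exactly what streaming methods must avoid; DragStream's contribution is doing this kind of detection in linear time and memory over a stream.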
{"title":"DragStream: An Anomaly And Concept Drift Detector In Univariate Data Streams","authors":"Anne Marthe Sophie Ngo Bibinbe, A. J. Mahamadou, Michael Franklin Mbouopda, E. Nguifo","doi":"10.1109/ICDMW58026.2022.00113","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00113","url":null,"abstract":"Anomaly detection in data streams comes with distinct technical challenges due to the nature of the data. The main challenges include storage limitations, the speed of data arrival, and concept drift. In the literature, methods for mining data streams to detect anomalies have been proposed. While some methods focus on tackling a specific issue, others handle diverse problems but may have high time and memory complexity. In the present work, we propose DragStream, a novel subsequence-anomaly and concept-drift detection algorithm for univariate data streams. DragStream extends Drag, a subsequence anomaly detection method for time series, to streaming data. The new method is further inspired by the well-known Matrix Profile, Drag, and MILOF, which are point and subsequence anomaly detection methods for time series and data streams, respectively. We conducted extensive experiments and statistical analysis to evaluate the performance of the proposed approach against existing methods. The results show that our method is competitive in performance while being linear in time and memory complexity. Finally, we provide an open-source implementation of the new method.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123658573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00090
Guojing Cong, Talia Ben-Naim, Victor Fung, Anshul Gupta, R. Neumann, Mathias Steiner
We present our research in which attention mechanisms are applied extensively across various aspects of graph neural networks for predicting materials properties. As a result, surrogate models can not only replace costly simulations for materials screening but also formulate hypotheses and insights to guide further design exploration. We predict formation energies for the Materials Project and gas adsorption for crystalline adsorbents, and demonstrate the superior performance of our graph neural networks. Moreover, attention reveals the substructures that the machine learning models deem important for a material to achieve desired target properties. Our model is based solely on standard structural input files containing atomistic descriptions of the adsorbent material candidates. We construct novel methodological extensions to match the prediction accuracy of state-of-the-art models, some of which were built with hundreds of features at much higher computational cost. We show that sophisticated neural networks can obviate the need for elaborate feature engineering. Our approach can be applied more broadly to optimize gas capture processes at industrial scale.
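A minimal sketch of the building block involved: attention-weighted neighbour aggregation for one node of a graph, with softmax weights from dot-product scores. The features are toy numbers, not a trained materials model.

```python
import math

def attend(node_feat, neighbour_feats):
    """Aggregate neighbour features with softmax attention weights
    given by dot-product scores against the centre node."""
    scores = [sum(a * b for a, b in zip(node_feat, nf))
              for nf in neighbour_feats]
    mx = max(scores)                          # subtract max for stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_feat)
    agg = [sum(w * nf[d] for w, nf in zip(weights, neighbour_feats))
           for d in range(dim)]
    return agg, weights

# a neighbour aligned with the centre node receives the larger weight
agg, w = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(w)
```

Inspecting `weights` per node is what makes attention useful for revealing which substructures the model deems important.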
{"title":"Extensive Attention Mechanisms in Graph Neural Networks for Materials Discovery","authors":"Guojing Cong, Talia Ben-Naim, Victor Fung, Anshul Gupta, R. Neumann, Mathias Steiner","doi":"10.1109/ICDMW58026.2022.00090","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00090","url":null,"abstract":"We present our research in which attention mechanisms are applied extensively across various aspects of graph neural networks for predicting materials properties. As a result, surrogate models can not only replace costly simulations for materials screening but also formulate hypotheses and insights to guide further design exploration. We predict formation energies for the Materials Project and gas adsorption for crystalline adsorbents, and demonstrate the superior performance of our graph neural networks. Moreover, attention reveals the substructures that the machine learning models deem important for a material to achieve desired target properties. Our model is based solely on standard structural input files containing atomistic descriptions of the adsorbent material candidates. We construct novel methodological extensions to match the prediction accuracy of state-of-the-art models, some of which were built with hundreds of features at much higher computational cost. We show that sophisticated neural networks can obviate the need for elaborate feature engineering. Our approach can be applied more broadly to optimize gas capture processes at industrial scale.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121676321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00041
R. Basiri, M. Popovic, Shehroz S. Khan
Diabetic Foot Ulcer (DFU) is a condition requiring constant monitoring and evaluation for treatment. The DFU patient population is on the rise and will soon outpace the available health resources. Autonomous monitoring and evaluation of DFU wounds is a much-needed area in health care. In this paper, we evaluate and identify the most accurate feature extractor to serve as the core of a deep-learning wound detection network. For the evaluation, we used mAP and F1-score on the publicly available DFU2020 dataset. A combination of UNet and an EfficientNetB3 feature extractor achieved the best evaluation results among the 14 networks compared. UNet and EfficientNetB3 can be used as the classifier in the development of a comprehensive, DFU-domain-specific autonomous wound detection pipeline.
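A sketch of the computation underlying the mAP metric mentioned above: intersection-over-union (IoU) between a predicted and a ground-truth box. The `(x1, y1, x2, y2)` corner format is an assumption for illustration.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~ 0.1429
```

mAP then thresholds IoU (commonly at 0.5) to decide whether a detection counts as a true positive before averaging precision over recall levels.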
{"title":"Domain-Specific Deep Learning Feature Extractor for Diabetic Foot Ulcer Detection","authors":"R. Basiri, M. Popovic, Shehroz S. Khan","doi":"10.1109/ICDMW58026.2022.00041","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00041","url":null,"abstract":"Diabetic Foot Ulcer (DFU) is a condition requiring constant monitoring and evaluation for treatment. The DFU patient population is on the rise and will soon outpace the available health resources. Autonomous monitoring and evaluation of DFU wounds is a much-needed area in health care. In this paper, we evaluate and identify the most accurate feature extractor to serve as the core of a deep-learning wound detection network. For the evaluation, we used mAP and F1-score on the publicly available DFU2020 dataset. A combination of UNet and an EfficientNetB3 feature extractor achieved the best evaluation results among the 14 networks compared. UNet and EfficientNetB3 can be used as the classifier in the development of a comprehensive, DFU-domain-specific autonomous wound detection pipeline.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"52 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120883848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00013
Reda Khoufache, M. Dilmi, Hanene Azzag, Etienne Gofinnet, M. Lebbah
Artificial Intelligence (AI) in supermarkets is moving fast with the recent advances in deep learning. One important project in the retail sector is the development of AI solutions for smart stores, mainly to improve product recognition. In this paper, we present a new framework that addresses multi-view image classification using multiple clustering. The proposed framework combines a pre-trained Vision Transformer with Bayesian non-parametric multiple clustering. In this work, we propose an MCMC-based inference approach to learn the column-partition and the row-partitions. This method infers multiple clustering solutions and automatically finds the number of clusters. Our method provides interesting results on a multi-view image dataset and emphasizes, on the one hand, the power of pre-trained Vision Transformers combined with the multiple clustering algorithm and, on the other hand, the usefulness of Bayesian non-parametric modeling, which automatically performs model selection.
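To illustrate the Bayesian non-parametric idea of not fixing the number of clusters in advance, a toy sample from a Chinese restaurant process prior, a standard construction in this family of models. This is only the prior, not the paper's MCMC sampler.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster assignments for n items under a CRP(alpha) prior."""
    rng = random.Random(seed)
    counts = []      # current size of each cluster ("table")
    labels = []
    for i in range(n):
        # existing cluster k is chosen with prob. counts[k] / (i + alpha);
        # a new cluster opens with prob. alpha / (i + alpha)
        weights = counts + [alpha]
        r = rng.uniform(0, i + alpha)
        acc, choice = 0.0, len(counts)
        for k, wgt in enumerate(weights):
            acc += wgt
            if r < acc:
                choice = k
                break
        if choice == len(counts):
            counts.append(1)         # open a new cluster
        else:
            counts[choice] += 1
        labels.append(choice)
    return labels

labels = crp_assignments(20, alpha=1.0)
print(max(labels) + 1, "clusters found automatically")
```

In the full model, the data likelihood reshapes these prior probabilities during MCMC, so the inferred number of clusters adapts to the images rather than to the seed.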
{"title":"Emerging properties from Bayesian Non-Parametric for multiple clustering: Application for multi-view image dataset","authors":"Reda Khoufache, M. Dilmi, Hanene Azzag, Etienne Gofinnet, M. Lebbah","doi":"10.1109/ICDMW58026.2022.00013","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00013","url":null,"abstract":"Artificial Intelligence (AI) in supermarkets is moving fast with the recent advances in deep learning. One important project in the retail sector is the development of AI solutions for smart stores, mainly to improve product recognition. In this paper, we present a new framework that addresses multi-view image classification using multiple clustering. The proposed framework combines a pre-trained Vision Transformer with Bayesian non-parametric multiple clustering. In this work, we propose an MCMC-based inference approach to learn the column-partition and the row-partitions. This method infers multiple clustering solutions and automatically finds the number of clusters. Our method provides interesting results on a multi-view image dataset and emphasizes, on the one hand, the power of pre-trained Vision Transformers combined with the multiple clustering algorithm and, on the other hand, the usefulness of Bayesian non-parametric modeling, which automatically performs model selection.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123678262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00045
Toru Sasaki, Tomonari Masada
Automated Essay Scoring (AES) refers to processes that automatically assign grades to student-written essays with machine learning models. Existing AES models are mostly trained prompt-specifically with supervised learning, which requires the essay prompt to be accessible to the system vendor at model training time. However, essay prompts for high-stakes testing must usually be kept confidential before the test date, which demands that the model be trainable cross-prompt, on pre-scored essay data already in hand. Document embeddings obtained from pretrained language models such as Sentence-BERT (SBERT) are primarily expected to represent the semantic content of the text. We hypothesize that SBERT embeddings also contain assessment-relevant elements that can be extracted by decomposing the embeddings through Principal Component Analysis (PCA) enhanced with a Normalized Discounted Cumulative Gain (nDCG) measurement. The evaluative elements identified in the embedding space of the source essays are then transferred cross-prompt to target essays written on different prompts, for a binary clustering task that divides them into high- and low-scored groups. The result implies that non-finetuned SBERT already contains evaluative elements that distinguish good and bad essays.
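A sketch of the nDCG measurement referenced above, computed on toy relevance scores; how the paper pairs nDCG with individual PCA components is simplified away here.

```python
import math

def dcg(rels):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1]))  # < 1.0: the ranking is imperfect
print(ndcg([3, 3, 2, 1, 0]))  # 1.0: perfectly sorted
```

Here a component would score high nDCG when sorting essays by their projection onto it roughly reproduces the human score ordering.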
{"title":"Sentence-BERT Distinguishes Good and Bad Essays in Cross-prompt Automated Essay Scoring","authors":"Toru Sasaki, Tomonari Masada","doi":"10.1109/ICDMW58026.2022.00045","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00045","url":null,"abstract":"Automated Essay Scoring (AES) refers to processes that automatically assign grades to student-written essays with machine learning models. Existing AES models are mostly trained prompt-specifically with supervised learning, which requires the essay prompt to be accessible to the system vendor at model training time. However, essay prompts for high-stakes testing must usually be kept confidential before the test date, which demands that the model be trainable cross-prompt, on pre-scored essay data already in hand. Document embeddings obtained from pretrained language models such as Sentence-BERT (SBERT) are primarily expected to represent the semantic content of the text. We hypothesize that SBERT embeddings also contain assessment-relevant elements that can be extracted by decomposing the embeddings through Principal Component Analysis (PCA) enhanced with a Normalized Discounted Cumulative Gain (nDCG) measurement. The evaluative elements identified in the embedding space of the source essays are then transferred cross-prompt to target essays written on different prompts, for a binary clustering task that divides them into high- and low-scored groups. The result implies that non-finetuned SBERT already contains evaluative elements that distinguish good and bad essays.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128597868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00021
A. Ramkissoon, Vijayanandh Rajamanickam, W. Goodridge
The existence of fake videos is a problem challenging today's social-media-enabled world. Fake videos are classified in many ways, with one of the most popular classes being DeepFakes. Detecting such fake videos is a challenging issue. This research attempts to comprehend the characteristics of DeepFake videos by investigating the properties that make them unique. It uses scene and texture detection to develop a unique feature set of 19 data features capable of detecting whether or not a video is a DeepFake. This study validates the feature set using a standard dataset of features relating to the characteristics of the videos. These features are analysed using a classification machine learning model, and the results of these experiments are examined using four evaluation methodologies. The analysis reveals positive performance for the ML method with the proposed feature set. From these results, it can be ascertained that with the proposed feature set a video can be predicted to be a DeepFake or not, supporting the hypothesis that there is a correlation between the characteristics of a video and its genuineness, i.e., whether or not it is a DeepFake.
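A sketch of the kind of evaluation methodologies typically applied to such a binary detector (accuracy, precision, recall, F1 from a confusion matrix); the four methodologies actually used in the paper may differ.

```python
def evaluate(y_true, y_pred):
    """Binary-classification metrics from 0/1 label lists."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 1 = DeepFake, 0 = genuine (toy labels)
print(evaluate([1, 1, 0, 0], [1, 0, 0, 1]))
```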
{"title":"Scene and Texture Based Feature Set for DeepFake Video Detection","authors":"A. Ramkissoon, Vijayanandh Rajamanickam, W. Goodridge","doi":"10.1109/ICDMW58026.2022.00021","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00021","url":null,"abstract":"The existence of fake videos is a problem challenging today's social-media-enabled world. Fake videos are classified in many ways, with one of the most popular classes being DeepFakes. Detecting such fake videos is a challenging issue. This research attempts to comprehend the characteristics of DeepFake videos by investigating the properties that make them unique. It uses scene and texture detection to develop a unique feature set of 19 data features capable of detecting whether or not a video is a DeepFake. This study validates the feature set using a standard dataset of features relating to the characteristics of the videos. These features are analysed using a classification machine learning model, and the results of these experiments are examined using four evaluation methodologies. The analysis reveals positive performance for the ML method with the proposed feature set. From these results, it can be ascertained that with the proposed feature set a video can be predicted to be a DeepFake or not, supporting the hypothesis that there is a correlation between the characteristics of a video and its genuineness, i.e., whether or not it is a DeepFake.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128776656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00089
Peng Zhou, Yunyun Zhang, Yuan-Ting Yan, Shu Zhao
Feature selection aims to select an optimal minimal feature subset from the original dataset and has become an indispensable preprocessing component before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measurements to calculate the correlation between features. In practical applications, however, features may be generated dynamically and arrive one by one over time; these are called streaming features. Most existing streaming feature selection methods assume that all dynamically generated features are of the same type, or that the type of each newly arriving feature can be known on the fly, which is unreasonable and unrealistic. Therefore, this paper is the first to study the practical issue of unknown-type streaming feature selection, and proposes a new method to handle it, named UT-SFS. Extensive experimental results indicate the effectiveness of the new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.
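A hedged sketch of the streaming-selection loop: features arrive one at a time and are kept only if sufficiently relevant to the label. A plain absolute Pearson correlation stands in here for the maximal information coefficient that UT-SFS actually uses, and the threshold is an arbitrary illustrative choice.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_streaming(feature_stream, label, threshold=0.5):
    """Keep each arriving feature whose relevance to the label passes
    the threshold; features are seen once, in arrival order."""
    selected = []
    for name, values in feature_stream:   # features arrive one by one
        if abs(pearson(values, label)) >= threshold:
            selected.append(name)
    return selected

label = [0, 0, 1, 1]
stream = [("f1", [0.1, 0.2, 0.9, 0.8]),   # informative
          ("f2", [0.5, 0.4, 0.5, 0.4])]   # noise
print(select_streaming(stream, label))  # ['f1']
```

The point of UT-SFS is that its relevance measure works without knowing whether `values` is categorical or numerical; Pearson, used above for brevity, does not have that property.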
{"title":"Unknown Type Streaming Feature Selection via Maximal Information Coefficient","authors":"Peng Zhou, Yunyun Zhang, Yuan-Ting Yan, Shu Zhao","doi":"10.1109/ICDMW58026.2022.00089","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00089","url":null,"abstract":"Feature selection aims to select an optimal minimal feature subset from the original dataset and has become an indispensable preprocessing component before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measurements to calculate the correlation between features. In practical applications, however, features may be generated dynamically and arrive one by one over time; these are called streaming features. Most existing streaming feature selection methods assume that all dynamically generated features are of the same type, or that the type of each newly arriving feature can be known on the fly, which is unreasonable and unrealistic. Therefore, this paper is the first to study the practical issue of unknown-type streaming feature selection, and proposes a new method to handle it, named UT-SFS. Extensive experimental results indicate the effectiveness of the new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125894631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00070
Minqiang Yang, Xinqi Liu, Chengsheng Mao, Bin Hu
Sentiment analysis has become increasingly important in natural language processing (NLP). Recent efforts have been devoted to the graph convolutional network (GCN) due to its advantages in handling complex information. However, the improvement brought by GCNs in NLP is limited because pretrained word vectors do not fit well in various contexts and traditional edge-building methods are ill-suited to long and complex contexts. To address these problems, we propose the LSTM-GCN model, which contextualizes pretrained word vectors and extracts sentiment representations from complex texts. In particular, LSTM-GCN captures sentiment feature representations from multiple perspectives, including context and syntax. In addition to extracting contextual representations from pretrained word vectors, we utilize a dependency parser to analyse the dependency correlations between words and extract syntax representations. For each text, we build a graph with each word as a node. Besides the edges between neighboring words, we also connect nodes with dependency correlations to capture syntax representations. Moreover, we introduce a message passing mechanism (MPM), which allows nodes to update their representations by extracting information from their neighbors. To improve message passing performance, we make the edges trainable and initialize the edge weights with the pointwise mutual information (PMI) method. Experimental results show that our LSTM-GCN model outperforms several state-of-the-art models, and extensive experiments validate the rationality and effectiveness of our model.
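A sketch of the PMI edge-weight initialisation described above: PMI of word pairs estimated from co-occurrence counts over sliding windows, keeping only positive-PMI edges. The corpus and window contents are toy choices, not the paper's setup.

```python
import math
from itertools import combinations

def pmi_weights(windows):
    """Return PMI(a, b) for word pairs co-occurring in the windows,
    keeping only pairs with positive PMI (a common edge filter)."""
    total = len(windows)
    word_df, pair_df = {}, {}
    for w in windows:
        for word in set(w):
            word_df[word] = word_df.get(word, 0) + 1
        for a, b in combinations(sorted(set(w)), 2):
            pair_df[(a, b)] = pair_df.get((a, b), 0) + 1
    weights = {}
    for (a, b), c in pair_df.items():
        p_ab = c / total
        p_a, p_b = word_df[a] / total, word_df[b] / total
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > 0:                  # negative PMI pairs get no edge
            weights[(a, b)] = pmi
    return weights

wins = [["good", "movie"], ["good", "plot"], ["bad", "plot"]]
print(pmi_weights(wins))
```

In the model these values would only initialize the edge weights; training then adjusts them since the edges are trainable.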
{"title":"Graph Convolutional Networks with Dependency Parser towards Multiview Representation Learning for Sentiment Analysis","authors":"Minqiang Yang, Xinqi Liu, Chengsheng Mao, Bin Hu","doi":"10.1109/ICDMW58026.2022.00070","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00070","url":null,"abstract":"Sentiment analysis has become increasingly important in natural language processing (NLP). Recent efforts have been devoted to the graph convolutional network (GCN) due to its advantages in handling complex information. However, the improvement brought by GCNs in NLP is limited because pretrained word vectors do not fit well in various contexts and traditional edge-building methods are ill-suited to long and complex contexts. To address these problems, we propose the LSTM-GCN model, which contextualizes pretrained word vectors and extracts sentiment representations from complex texts. In particular, LSTM-GCN captures sentiment feature representations from multiple perspectives, including context and syntax. In addition to extracting contextual representations from pretrained word vectors, we utilize a dependency parser to analyse the dependency correlations between words and extract syntax representations. For each text, we build a graph with each word as a node. Besides the edges between neighboring words, we also connect nodes with dependency correlations to capture syntax representations. Moreover, we introduce a message passing mechanism (MPM), which allows nodes to update their representations by extracting information from their neighbors. To improve message passing performance, we make the edges trainable and initialize the edge weights with the pointwise mutual information (PMI) method. Experimental results show that our LSTM-GCN model outperforms several state-of-the-art models, and extensive experiments validate the rationality and effectiveness of our model.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121063083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00075
Yanlin Qi, Fuyin Lai, Guoting Chen, Wensheng Gan
This paper proposes an effective algorithm to discover valuable patterns by applying the fuzzy method to the RFM model. RFM analysis is a common method in customer relationship management, through which valuable customer groups can be identified. By combining RFM analysis with frequent pattern mining, valuable RFM-patterns can be found from an RFM-pattern-tree, as in the RFMP-growth algorithm. Aiming to mine patterns that capture quantitative relationships among items, we introduce the fuzzy method into the RFM model and present a fuzzy-Rfu-tree algorithm in which a new pruning strategy prunes candidate patterns. Experiments show the effectiveness of the new algorithm: it guarantees a high degree of overlap with the RFM-patterns generated by RFMP-growth, while the mined patterns carry more valuable information (an additional fuzzy level).
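To illustrate the fuzzy step the abstract refers to, a minimal sketch: mapping a crisp monetary value to membership degrees in linguistic terms via triangular membership functions. The term names and boundaries are illustrative assumptions, not the paper's parameters.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set peaking at b over [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

def fuzzify_monetary(value):
    """Fuzzify a monetary amount into illustrative linguistic terms."""
    return {
        "low": triangular(value, -1, 0, 50),
        "medium": triangular(value, 0, 50, 100),
        "high": triangular(value, 50, 100, 201),
    }

print(fuzzify_monetary(75))  # partly 'medium', partly 'high'
```

Patterns mined over such fuzzy levels carry a quantitative shading ("medium-to-high spender") that crisp RFM-patterns lack, which is the extra information the abstract refers to.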
{"title":"Mining Valuable Fuzzy Patterns via the RFM Model","authors":"Yanlin Qi, Fuyin Lai, Guoting Chen, Wensheng Gan","doi":"10.1109/ICDMW58026.2022.00075","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00075","url":null,"abstract":"This paper proposes an effective algorithm to discover valuable patterns by applying the fuzzy method to the RFM model. RFM analysis is a common method in customer relationship management, through which valuable customer groups can be identified. By combining RFM analysis with frequent pattern mining, valuable RFM-patterns can be found from an RFM-pattern-tree, as in the RFMP-growth algorithm. Aiming to mine patterns that capture quantitative relationships among items, we introduce the fuzzy method into the RFM model and present a fuzzy-Rfu-tree algorithm in which a new pruning strategy prunes candidate patterns. Experiments show the effectiveness of the new algorithm: it guarantees a high degree of overlap with the RFM-patterns generated by RFMP-growth, while the mined patterns carry more valuable information (an additional fuzzy level).","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126541686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}