
2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Publications

An Efficient and Reliable Tolerance-Based Algorithm for Principal Component Analysis
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00088
Michael Yeh, Ming Gu
Principal component analysis (PCA) is an important method for dimensionality reduction in data science and machine learning. However, it is expensive for large matrices when only a few components are needed. Existing fast PCA algorithms typically assume the user will supply the number of components needed, but in practice this number may not be known beforehand. Thus, it is important to have fast PCA algorithms driven by a tolerance instead. We develop one such algorithm that runs quickly for matrices with rapidly decaying singular values, provide approximation error bounds that are within a constant factor of optimal, and demonstrate its utility with data from a variety of applications.
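The tolerance-driven idea (grow the approximation until the residual drops below a user-supplied tolerance, rather than asking for a rank up front) can be sketched with a generic randomized block method. This is an illustrative sketch, not the authors' algorithm; `tol` is taken relative to the Frobenius norm of the input.

```python
import numpy as np

def pca_to_tolerance(A, tol, block=2, max_rank=None):
    """Grow a randomized subspace until the relative Frobenius residual
    ||A - Q Q^T A|| / ||A|| falls below tol, then read the principal
    components off the small projected matrix."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    max_rank = max_rank or min(m, n)
    norm_A = np.linalg.norm(A)
    Q = np.empty((m, 0))
    while Q.shape[1] < max_rank:
        # Draw a fresh random block and orthogonalize it against
        # the subspace captured so far.
        Y = A @ rng.standard_normal((n, block))
        Y -= Q @ (Q.T @ Y)
        Qb, _ = np.linalg.qr(Y)
        Q = np.hstack([Q, Qb])
        if np.linalg.norm(A - Q @ (Q.T @ A)) <= tol * norm_A:
            break
    # SVD of the small sketch B = Q^T A gives the leading components of A.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_small, s, Vt
```

For matrices with rapidly decaying singular values, the loop exits after a few blocks, which is the regime where such a method is cheap.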
Citations: 1
Cut the peaches: image segmentation for utility pattern mining in food processing
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00072
Diletta Chiaro, E. Prezioso, Stefano Izzo, F. Giampaolo, S. Cuomo, F. Piccialli
The progress achieved in information and communication technologies, particularly in computer science, and the growing capacity of new types of computational systems (cloud/edge computing) have significantly contributed to cyber-physical systems: networks in which cooperating computational entities are intensively linked to the surrounding physical environment and its ongoing operations. All of this has increased the possibility of automating tasks hitherto considered an exclusively human concern: hence the gradual yet progressive tendency of many companies to adopt artificial intelligence (AI) and machine learning (ML) technologies to automate human activities. This paper falls within the context of deep learning (DL) for utility pattern mining applied to Industry 4.0. Starting from images supplied by a multinational company operating in the food processing industry, we provide a DL framework for real-time pattern recognition applied to the automation of peach pitters. To this aim, we perform transfer learning (TL) for image segmentation by embedding seven pre-trained encoders into multiple segmentation architectures, and we evaluate and compare segmentation performance in terms of metrics and inference speed on our data. Furthermore, we propose an attention mechanism to improve multiscale feature learning in the FPN through attention-guided feature aggregation.
Citations: 0
Improving net ecosystem CO2 flux prediction using memory-based interpretable machine learning
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00145
Siyan Liu, Dawei Lu, D. Ricciuto, A. Walker
Terrestrial ecosystems play a central role in the global carbon cycle and affect climate change. However, our predictive understanding of these systems is still limited by their complexity and by uncertainty about how key drivers and their legacy effects influence carbon fluxes. Here, we propose an interpretable Long Short-Term Memory (iLSTM) network for predicting net ecosystem CO2 exchange (NEE) and for interpreting the influence of environmental drivers and their memory effects on the prediction. We consider five drivers and apply the method to three forest sites in the United States. Besides making predictions at each site, we also conduct transfer learning by using the iLSTM model trained at one site to predict at the other sites. Results show that the iLSTM model produces good NEE predictions for all three sites and, more importantly, provides reasonable interpretations of each input driver's importance, as well as its temporal importance, for the NEE prediction. Additionally, the iLSTM model demonstrates good across-site transferability in terms of both prediction accuracy and interpretability. The transferability can improve NEE prediction at unobserved forest sites, and the interpretability advances our predictive understanding and guides process-based model development.
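As a toy illustration of the kind of driver-level interpretation such a network can expose, one can score each environmental driver against a hidden state and softmax the scores into importances. This readout is hypothetical and is not the paper's architecture:

```python
import numpy as np

def driver_importance(hidden, driver_embeddings):
    """Score each driver against the current hidden state and
    normalize the scores into a probability-like importance vector."""
    scores = driver_embeddings @ hidden        # one score per driver
    exp = np.exp(scores - scores.max())        # numerically stable softmax
    return exp / exp.sum()
```

An interpretable model would report such importances per driver and per time lag, which is what makes legacy (memory) effects inspectable.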
Citations: 0
What Do Audio Transformers Hear? Probing Their Representations For Language Delivery & Structure
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00120
Yaman Kumar Singla, Jui Shah, Changyou Chen, R. Shah
Transformer models across multiple domains such as natural language processing and speech form an unavoidable part of the tech stack of practitioners and researchers alike. Audio transformers that exploit representational learning to train on unlabeled speech have recently been used for tasks from speaker verification to discourse coherence with much success. However, little is known about what these models learn and represent in the high-dimensional latent space. In this paper, we interpret two such recent state-of-the-art models, wav2vec2.0 and Mockingjay, on linguistic and acoustic features. We probe each of their layers to understand what it is learning and, at the same time, draw a distinction between the two models. By comparing their performance across a wide variety of settings, including native, non-native, read, and spontaneous speech, we also show how well these models learn transferable features. Our results show that the models significantly capture a wide range of characteristics such as audio, fluency, suprasegmental pronunciation, and even syntactic and semantic text-based characteristics. For each category of characteristics, we identify a learning pattern for each framework and conclude which model, and which layer of that model, is better suited for feature extraction for downstream tasks.
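Layer-wise probing of frozen representations can be sketched as fitting a simple linear probe per layer and comparing scores; the layer whose probe scores highest is the one that most linearly encodes the property. The least-squares probe below is an assumption for illustration only; the paper's probing setup is richer.

```python
import numpy as np

def probe_layers(layer_reps, labels):
    """Fit a least-squares linear probe on each layer's frozen
    representations and return per-layer training accuracy for a
    binary property encoded in `labels`."""
    y = np.asarray(labels, dtype=float)
    scores = []
    for X in layer_reps:                               # (n_samples, dim) per layer
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add a bias column
        w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        pred = (Xb @ w > 0.5).astype(float)
        scores.append(float((pred == y).mean()))
    return scores
```

In practice one would use held-out data and a regularized classifier, but the per-layer comparison is the same.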
Citations: 4
Mining High Utility Itemset with Multiple Minimum Utility Thresholds Based on Utility Deviation
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00071
Naji Alhusaini, Jing Li, Philippe Fournier-Viger, Ammar Hawbani, Guilin Chen
High Utility Itemset Mining (HUIM) is the task of extracting actionable patterns considering the utility of items, such as profits and quantities. An important issue with traditional HUIM methods is that they evaluate all items using a single threshold, which is inconsistent with reality given the differences in the nature and importance of items. Recently, algorithms have been proposed to address this problem by assigning a minimum item utility threshold to each item. However, since the minimum item utility (MIU) is expressed as a percentage of the external utility, these methods still face two problems, called "itemset missing" and "itemset explosion". To solve these problems, this paper introduces a novel notion of Utility Deviation (UD), which is calculated based on the standard deviation. The UD and the actual utility are jointly used to calculate the MIU of items. By doing so, the problems of "itemset missing" and "itemset explosion" are alleviated. To implement and evaluate the UD notion, a novel algorithm is proposed, called HUI-MMU-UD. Experimental results demonstrate the effectiveness of the proposed notion for solving the problems of "itemset missing" and "itemset explosion". Results also show that the proposed algorithm outperforms the previous HUI-MMU algorithm in many cases in terms of runtime and memory usage.
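The deviation-based idea (derive each item's MIU from its observed utilities and their spread, rather than from a flat percentage) can be sketched as below. The exact UD formula and the `beta`/`floor` parameters here are assumptions for illustration, not the paper's definition:

```python
import statistics

def min_item_utilities(transactions, beta=0.5, floor=1.0):
    """Assign each item a minimum item utility (MIU) as its mean
    observed utility minus beta standard deviations, floored so no
    item's threshold collapses to zero."""
    per_item = {}
    for tx in transactions:                    # tx maps item -> utility
        for item, utility in tx.items():
            per_item.setdefault(item, []).append(utility)
    miu = {}
    for item, utils in per_item.items():
        mean = statistics.fmean(utils)
        deviation = statistics.pstdev(utils)   # population std. deviation
        miu[item] = max(floor, mean - beta * deviation)
    return miu
```

Tying the threshold to each item's own utility distribution is what lets rare-but-valuable items keep a workable threshold while frequent cheap items do not flood the candidate space.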
Citations: 0
Towards Fair Representation Learning in Knowledge Graph with Stable Adversarial Debiasing
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00119
Yihe Wang, Mohammad Mahdi Khalili, X. Zhang
With their tremendous graph-structured information, Knowledge Graphs (KGs) have aroused increasing interest in academic research and industrial applications. Recent studies have shown that demographic bias, in terms of sensitive attributes (e.g., gender and race), exists in the learned representations of KG entities. Such bias negatively affects specific populations, especially minorities and underrepresented groups, and exacerbates machine-learning-based human inequality. Adversarial learning is regarded as an effective way to alleviate bias in a representation learning model by simultaneously training a task-specific predictor and a sensitive-attribute-specific discriminator. However, due to the unique challenges posed by topological structure and the comprehensive relationships between knowledge entities, adversarial-learning-based debiasing has rarely been studied for representation learning in knowledge graphs. In this paper, we propose a framework to learn unbiased representations for nodes and edges in knowledge graph mining. Specifically, we integrate a simple-but-effective normalization technique with Graph Neural Networks (GNNs) to constrain the weight-updating process. Moreover, as a work-in-progress paper, we also find that the introduced weight-normalization technique can mitigate the pitfalls of instability in adversarial debiasing, towards fair and stable machine learning. We evaluate the proposed framework on a benchmark graph with multiple edge types and node types. The experimental results show that our model achieves comparable or better gender fairness than three competitive baselines on Equality of Odds. Importantly, the fairness of our model does not sacrifice performance on the knowledge graph task (i.e., multi-class edge classification).
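The normalization constraint can be illustrated by its simplest variant, row-wise L2 weight normalization; the scheme actually used inside the paper's GNN may differ, so treat this as a sketch of the general idea:

```python
import numpy as np

def normalize_rows(W, eps=1e-8):
    """Rescale each row (one output unit's incoming weights) to unit
    L2 norm, bounding how far any single adversarial update can move
    the layer and thus stabilizing training."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.maximum(norms, eps)
```

Applying such a projection after each optimizer step keeps the predictor/discriminator game from being destabilized by runaway weight magnitudes.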
Citations: 1
ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00117
Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen
Knowledge base completion (KBC) aims to predict the missing links in knowledge graphs. Previous KBC tasks and approaches mainly focus on the setting where all test entities and relations have appeared in the training set. However, there has been limited research on zero-shot KBC settings, where we need to deal with unseen entities and relations that emerge in a constantly growing knowledge base. In this work, we systematically examine different possible scenarios of zero-shot KBC and develop a comprehensive benchmark, ZeroKBC, that covers these scenarios with diverse types of knowledge sources. Our systematic analysis reveals several missing yet important zero-shot KBC settings. Experimental results show that canonical and state-of-the-art KBC systems cannot achieve satisfactory performance on this challenging benchmark. By analyzing the strengths and weaknesses of these systems on solving ZeroKBC, we further present several important observations and promising future directions. (Work was done during an internship at Tencent AI Lab.) The data and code are available at: https://github.com/brickee/ZeroKBC
Citations: 0
Distributed LSTM-Learning from Differentially Private Label Proportions
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00139
Timon Sachweh, Daniel Boiar, T. Liebig
Data privacy and decentralised data collection have become more and more popular in recent years. To address privacy, communication bandwidth, and learning from spatio-temporal data, we propose two efficient models that use Differential Privacy and decentralized LSTM learning: one in which a Long Short-Term Memory (LSTM) model is learned to extract local temporal node constraints and feed them into a dense layer (LabelProportionToLocal), and another that extends the first by fetching histogram data from the neighbors and joining this information with the LSTM output (LabelProportionToDense). For evaluation, two popular datasets are used: Pems-Bay and METR-LA. Additionally, we provide our own dataset, which is based on LuST. The evaluation shows the tradeoff between performance and data privacy.
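A standard way to release label proportions under differential privacy is the Laplace mechanism on the label histogram; the sketch below shows that generic building block, not the authors' exact mechanism or parameters:

```python
import numpy as np

def dp_label_histogram(labels, num_bins, epsilon, rng=None):
    """Release label proportions under epsilon-differential privacy:
    add Laplace noise to the raw counts, clip to non-negative, and
    renormalize into a proportion vector."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(labels, minlength=num_bins).astype(float)
    # Changing one record moves two counts by 1 each: L1 sensitivity 2.
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=num_bins)
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    return noisy / total if total > 0 else np.full(num_bins, 1.0 / num_bins)
```

Smaller `epsilon` means stronger privacy and noisier proportions, which is exactly the performance/privacy tradeoff the evaluation measures.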
Citations: 0
Large-Scale Sequential Utility Pattern Mining in Uncertain Environments
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00077
J. Wu, Shuo Liu, Jerry Chun‐wei Lin
High utility sequential pattern mining (HUSPM) considers timestamps, internal quantities, and external utility factors to mine high utility sequential patterns (HUSP), and has become an essential task in data mining. In real life, collected data may be uncertain due to environmental factors, equipment limitations, privacy issues, and so on. As the volume of uncertain data grows rapidly, the efficiency of traditional mining algorithms degrades severely: on large data, a conventional stand-alone algorithm generates many candidate sequences, occupies a large amount of memory, and significantly slows execution. This paper designs a high utility probabilistic sequential pattern mining algorithm based on MapReduce. The algorithm uses the MapReduce framework to overcome the bottleneck of single-machine operation when the data volume is too large, and adopts an effective pruning strategy that reduces the number of candidate itemsets generated, greatly improving the performance of the designed model. The performance of the proposed algorithm is verified experimentally, and its correctness and completeness are demonstrated and discussed.
Citations: 0
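The map/shuffle/reduce skeleton the abstract relies on can be illustrated on a toy single-item case. Real HUSPM grows multi-item sequential patterns and prunes candidates with utility upper bounds; the function names below (`map_phase`, `reduce_phase`, `high_utility_items`) are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from itertools import chain

def map_phase(sequence):
    # Mapper: emit one (item, utility) pair per event in a sequence.
    for item, utility in sequence:
        yield item, utility

def reduce_phase(pairs):
    # Reducer: sum utilities per key, as MapReduce does after the shuffle.
    totals = defaultdict(float)
    for item, utility in pairs:
        totals[item] += utility
    return totals

def high_utility_items(sequences, min_util):
    """Toy MapReduce-style pass keeping items with total utility >= min_util.

    This shows only the distributed skeleton; a real HUSPM miner would
    recursively extend surviving items into longer sequential patterns.
    """
    pairs = chain.from_iterable(map_phase(s) for s in sequences)
    totals = reduce_phase(pairs)
    return {item: u for item, u in totals.items() if u >= min_util}
```

The min-utility filter at the end is the simplest form of the pruning the paper describes: any extension of an item whose utility upper bound is already below the threshold can be skipped.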
Diagonally Colorized iVAT Images for Labeled Data
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00043
Elizabeth D. Hathaway, R. Hathaway
The iVAT (improved Visual Assessment of cluster Tendency) image is a useful tool for assessing possible cluster structure in an unlabeled, numerical data set. If labeled data are available then it is sometimes helpful to determine how closely the (unlabeled) data clusters agree with the data partitioning based on the labels. In this note the DCiVAT (Diagonally Colorized iVAT) image is introduced for the case of labeled data. It incorporates all available data and label information into a single colorized iVAT image so that it is possible to visually assess the degree to which data clusters are aligned with label categories. The new approach is illustrated with several examples.
Citations: 0
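DCiVAT builds on the VAT/iVAT reordering step, which orders points along a Prim-style minimum-spanning-tree traversal of the dissimilarity matrix so that clusters show up as dark blocks on the reordered matrix's diagonal. A minimal sketch of that classic VAT ordering follows (the name `vat_order` is an assumption; iVAT additionally replaces each dissimilarity with a path-based minimax distance before display).

```python
def vat_order(D):
    """Return the VAT reordering of indices for dissimilarity matrix D.

    D is a symmetric n-by-n list of lists with D[i][i] == 0. Points are
    visited Prim-style: start at an endpoint of a largest dissimilarity,
    then repeatedly take the unvisited point closest to the visited set.
    """
    n = len(D)
    # Start from a row containing a maximal dissimilarity entry.
    i = max(range(n), key=lambda r: max(D[r]))
    order, remaining = [i], set(range(n)) - {i}
    while remaining:
        # Next point: the unvisited one nearest to any visited point.
        j = min(remaining, key=lambda c: min(D[r][c] for r in order))
        order.append(j)
        remaining.remove(j)
    return order
```

Displaying `D` with rows and columns permuted by this order yields the VAT image; DCiVAT then colorizes the diagonal blocks according to the class labels.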