
2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

Discovering Unknown Labels for Multi-Label Image Classification
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00108
Jun Huang, Yu Yan, Xiao Zheng, Xiwen Qu, Xudong Hong
A multi-label learning (MLL) method can simultaneously process instances with multiple labels, and many well-known methods have been proposed to solve various MLL-related problems. Existing MLL methods are mainly applied under the assumption of a fixed label set, i.e., all class labels are observed in the training data. However, in many real-world applications, there may be unknown labels outside of this set, especially for large-scale and complex datasets. In this paper, a multi-label classification model based on deep learning is proposed to discover the unknown labels for multi-label image classification. It can simultaneously predict known and unknown labels for unseen images. In addition, an attention mechanism is introduced into the model, where the attention maps of unknown labels can be used to observe the corresponding objects in an image and to obtain the semantic information of these unknown labels.
Citations: 0
Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00064
A. A. Neloy, M. Turgeon
Deep learning (DL) based natural language processing (NLP) has recently become one of the fastest-growing research domains and has achieved remarkable improvements in many applications. Due to the significant amount of data, the adaptation of feature learning and symmetric data efficiency is a critical underlying task in such applications. However, the ability of these methods to extract features is limited by a lack of proper model formation. Moreover, the use of these methods on smaller datasets is unexplored and underdeveloped compared to more popular research areas. This work introduces a two-stage modeling approach that combines classical statistical analysis with NLP problems in a real-world dataset. We lay out a combination of a classical statistical model incorporating a stacked ensemble classifier and a DL framework of a convolutional neural network (CNN) and bidirectional recurrent neural networks (Bi-RNN) to structure a more decomposed architecture with lower computational complexity. Additionally, experimental results showing 96.69% training and 70.56% testing accuracy, together with hypothesis testing on our DL models and an ablation study, empirically validate the proposed combined modeling technique.
Citations: 0
A machine learning-based approach for mercury detection in marine waters
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00074
F. Piccialli, F. Giampaolo, Vincenzo Schiano Di Cola, Federico Gatta, Diletta Chiaro, E. Prezioso, Stefano Izzo, S. Cuomo
Thanks to the widespread use of mobile devices, analyses that in the past had to be carried out in specifically designated and equipped laboratories, and that required long processing times, may now take place outdoors and in real time. In marine science, for example, the development of a mobile and compact system for the on-site detection of heavy-metal contamination in seawater would help scientists and society in at least two ways: i) a reduction of the time and costs associated with these experiments; ii) the implementation of a strategy for outdoor analysis, eventually embeddable in a lab-on-hardware system. This paper falls within the context of machine learning (ML) for utility pattern mining applied to interdisciplinary domains: starting from well-plate images, we provide a novel proof-of-concept (PoC) machine learning-based framework to assist scientists in their daily research on seawater samples. The proposed system first automatically recognises the wells in a multiwell plate and then predicts the degree of fluorescence in each of them, thus indicating the possible presence of heavy metals.
Citations: 0
HeteroGuard: Defending Heterogeneous Graph Neural Networks against Adversarial Attacks
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00096
Udesh Kumarasinghe, Mohamed Nabeel, K. de Zoysa, K. Gunawardana, Charitha Elvitigala
Graph neural networks (GNNs) have achieved remarkable success in many application domains including drug discovery, program analysis, social networks, and cyber security. However, it has been shown that they are not robust against adversarial attacks. In the recent past, many adversarial attacks against homogeneous GNNs, and defenses against them, have been proposed. However, most of these attacks and defenses are ineffective on heterogeneous graphs, as these algorithms optimize under the assumption that all edge and node types are the same and, further, they introduce semantically incorrect edges into perturbed graphs. Here, we first develop HetePR-BCD, a training-time (i.e., poisoning) adversarial attack on heterogeneous graphs that outperforms the state-of-the-art attacks proposed in the literature. Our experimental results on three benchmark heterogeneous graphs show that our attack, with a small perturbation budget of 15%, degrades performance by up to 32% (F1 score) compared to existing ones. Worryingly, existing defenses are not robust against our attack. These defenses primarily modify the GNN's neural message passing operators, assuming that adversarial attacks tend to connect nodes with dissimilar features, but this assumption does not hold in heterogeneous graphs. We construct HeteroGuard, an effective defense against training-time attacks, including HetePR-BCD, on heterogeneous models. HeteroGuard outperforms the existing defenses by 3–8% in F1 score, depending on the benchmark dataset.
Citations: 0
Degree-Related Bias in Link Prediction
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00103
Yu Wang, Tyler Derr
Link prediction is a fundamental problem for network-structured data and has achieved unprecedented success in many real-world applications. Despite the significant progress made towards improving its performance by characterizing underlying topological patterns or leveraging representation learning, few works have focused on the imbalanced performance among nodes of different degrees. In this paper, we propose a novel problem, degree-related bias and evaluation bias, in link prediction, with an emphasis on recommender system applications. We first empirically demonstrate the performance difference among nodes with different degrees and then theoretically prove that Recall is an unbiased evaluation metric compared with F1, NDCG and Precision. Furthermore, we show that under the unbiased evaluation metric Recall, low-degree nodes tend to have higher performance than high-degree nodes in link prediction.
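One intuition for why the choice of metric interacts with node degree can be sketched in a few lines. Precision@K for a node can never exceed (number of held-out links) / K, so a node with a single held-out link is capped at 1/K even under a perfect ranking, whereas Recall@K is normalised by the node's own number of held-out links. The sketch below is only an illustration of that arithmetic, not the authors' proof or experimental setup; the node names, rankings, and the helper `precision_recall_at_k` are made up for this example.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Return (Precision@K, Recall@K) for one node's ranked candidate list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (hypothetical): held-out links per node, each ranked perfectly.
held_out = {
    "low_degree_node": {"a"},
    "high_degree_node": {"a", "b", "c", "d", "e"},
}
rankings = {
    "low_degree_node": ["a", "x", "y", "z", "w"],
    "high_degree_node": ["a", "b", "c", "d", "e"],
}

K = 5
scores = {
    node: precision_recall_at_k(rankings[node], held_out[node], K)
    for node in held_out
}
for node, (p, r) in scores.items():
    print(f"{node}: Precision@{K}={p:.2f}, Recall@{K}={r:.2f}")
```

Both nodes are ranked perfectly, yet only Recall@K assigns them the same score; Precision@K depends on how many held-out links the node has.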
Citations: 2
Post-pandemic Economic Transformations in the United States of America
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00153
Avi Chawla, Nidhi Mulay, M. Bahrami, Vikas Bishnoi, Yatin Katyal, Esteban Moro Egido, Ankur Saraswat, A. Pentland
The COVID-19 pandemic has impacted economic activity not only in the United States, but across the globe. Lockdown and travel restrictions imposed by local authorities have led to changes in customer preferences and thus to a shift of economic activity from traditional areas to new regions. While most changes have been temporary and short-term, some have been observed to be permanent. Using large-scale aggregated and anonymized transaction data across various socio-economic groups, we analyse and discuss such temporary relocation of citizens' economic activities in metropolitan areas of 15 states in the US. The results of this study have extensive implications for urban planners and business owners, and can provide insights into the temporary relocation of economic activities resulting from an extreme exogenous shock like the COVID-19 pandemic.
Citations: 0
Adversarial Removal of Population Bias in Genomics Phenotype Prediction
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00052
Honggang Zhao, Wenlu Wang
Many factors impact trait prediction from genotype data. One of the major confounding factors is the presence of population structure among sampled individuals, namely population stratification. When it exists, it leads to biased quantitative phenotype prediction, hampering unambiguous conclusions about prediction and limiting downstream uses such as disease evaluation or epidemiological surveys. Population stratification is an implicit bias that cannot be easily removed by data preprocessing. With the purpose of training a phenotype prediction model, we propose an adversarial training framework that ensures the genomics encoder is agnostic to sample populations. For better generalization, our adversarial training framework is orthogonal to the genomics encoder and phenotype prediction model. We experimentally evaluate our debiasing framework by testing on a real-world yield (phenotype) prediction dataset with soybean genomics. The developed framework is designed for general genomic data (e.g., human, livestock, and crops), and the phenotype can be either a continuous or a categorical variable.
Citations: 1
Exploiting Cross-Order Patterns and Link Prediction in Higher-Order Networks
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00156
Hao Tian, Shengmin Jin, R. Zafarani
With the demand to model the relationships among three or more entities, higher-order networks are now more widespread across various domains. Relationships such as multi-author collaborations, co-appearance of keywords, and co-purchases can be naturally modeled as higher-order networks. However, due to (1) computational complexity and (2) insufficient higher-order data, exploring higher-order networks is often limited to order-3 motifs (or triangles). To address these problems, we explore and quantify similarities among various network orders. Our goal is to build relationships between different network orders and to solve higher-order problems using lower-order information. Similarities between different orders are not directly comparable. Hence, we introduce a set of general cross-order similarities and a measure: subedge rate. Our experiments on multiple real-world datasets demonstrate that most higher-order networks have considerable consistency as we move from higher orders to lower orders. Utilizing this discovery, we develop a new cross-order framework for higher-order link prediction. These methods can predict higher-order links from lower-order edges, which cannot be attained by current higher-order methods that rely on data from a single order.
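The idea of using lower-order edges as evidence for a higher-order link can be illustrated with a small sketch. The abstract does not define "subedge rate", so the quantity computed below is only one hypothetical reading of the general idea: the fraction of a candidate hyperedge's size-2 subsets that are already observed as pairwise edges. The function names and the toy co-authorship data are assumptions made for this example, not the authors' definitions.

```python
from itertools import combinations


def sub_edges(hyperedge, order):
    """All size-`order` subsets of a hyperedge, as frozensets."""
    return {frozenset(c) for c in combinations(sorted(hyperedge), order)}


def observed_subedge_fraction(hyperedge, lower_order_edges, order=2):
    """Fraction of the hyperedge's size-`order` subsets seen as lower-order edges.

    Hypothetical illustration only; not the paper's subedge rate.
    """
    candidates = sub_edges(hyperedge, order)
    if not candidates:
        return 0.0
    return len(candidates & lower_order_edges) / len(candidates)


# Toy co-authorship data (made up): observed two-author papers.
pairwise = {frozenset(p) for p in [("ann", "bob"), ("bob", "cy"), ("cy", "dee")]}

# Candidate three-author collaborations, scored by their observed pairs.
closed = observed_subedge_fraction({"ann", "bob", "cy"}, pairwise)
open_ = observed_subedge_fraction({"ann", "cy", "dee"}, pairwise)
print(closed, open_)
```

Under this reading, the candidate whose members have already collaborated pairwise more often receives the higher score, which is the kind of lower-order signal a cross-order predictor could exploit.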
Citations: 0
Equal Confusion Fairness: Measuring Group-Based Disparities in Automated Decision Systems
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00027
Furkan Gursoy, I. Kakadiaris
As artificial intelligence plays an increasingly substantial role in decisions affecting humans and society, the accountability of automated decision systems has been receiving increasing attention from researchers and practitioners. Fairness, which is concerned with eliminating unjust treatment and discrimination against individuals or sensitive groups, is a critical aspect of accountability. Yet, for evaluating fairness, there is a plethora of fairness metrics in the literature that employ different perspectives and assumptions that are often incompatible. This work focuses on group fairness. Most group fairness metrics seek parity between selected statistics computed from the confusion matrices of different sensitive groups. Generalizing this intuition, this paper proposes a new equal confusion fairness test to check an automated decision system for fairness and a new confusion parity error to quantify the extent of any unfairness. To further analyze the source of potential unfairness, an appropriate post hoc analysis methodology is also presented. The usefulness of the test, metric, and post hoc analysis is demonstrated via a case study on the controversial case of COMPAS, an automated decision system employed in the US to assist judges with assessing recidivism risks. Overall, the methods and metrics provided here may assess automated decision systems' fairness as part of a more extensive accountability assessment, such as those based on the system accountability benchmark.
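The core intuition, comparing whole confusion matrices across sensitive groups, can be sketched with the standard library alone. The abstract does not state which test statistic or error measure the paper uses, so the sketch below assumes a Pearson chi-square statistic for independence between group and confusion-matrix cell, and uses Cramér's V as a stand-in effect size. Both choices, the function names, and the toy labels are assumptions for illustration, not the paper's definitions.

```python
from collections import Counter
from math import sqrt


def confusion_cell(y_true, y_pred):
    """Map one binary decision to its confusion-matrix cell."""
    if y_true == 1:
        return "TP" if y_pred == 1 else "FN"
    return "FP" if y_pred == 1 else "TN"


def group_confusion_table(groups, y_true, y_pred):
    """Count confusion-matrix cells separately for each sensitive group."""
    table = {}
    for g, t, p in zip(groups, y_true, y_pred):
        table.setdefault(g, Counter())[confusion_cell(t, p)] += 1
    return table


def chi_square_statistic(table):
    """Pearson chi-square statistic for independence of group and cell."""
    cells = sorted({c for counts in table.values() for c in counts})
    total = sum(sum(counts.values()) for counts in table.values())
    col_totals = {c: sum(counts[c] for counts in table.values()) for c in cells}
    stat = 0.0
    for counts in table.values():
        row_total = sum(counts.values())
        for c in cells:
            expected = row_total * col_totals[c] / total
            stat += (counts[c] - expected) ** 2 / expected
    return stat, total, len(table), len(cells)


def cramers_v(table):
    """Cramér's V in [0, 1]; an assumed effect size, 0 means identical cell mix."""
    stat, n, n_rows, n_cols = chi_square_statistic(table)
    dof_scale = min(n_rows - 1, n_cols - 1)
    if n == 0 or dof_scale == 0:
        return 0.0
    return sqrt(stat / (n * dof_scale))


# Toy data (made up): two groups with the same ground truth.
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_fair = [1, 0, 1, 0, 1, 0, 1, 0]  # same confusion profile in both groups
y_skew = [1, 1, 0, 0, 0, 0, 1, 1]  # A always right, B always wrong

v_fair = cramers_v(group_confusion_table(groups, y_true, y_fair))
v_skew = cramers_v(group_confusion_table(groups, y_true, y_skew))
print(v_fair, v_skew)
```

When the two groups share the same distribution over TP, FN, FP, and TN, the statistic is zero; when the distributions are disjoint, the effect size reaches its maximum. A real analysis would also need a p-value and adequate sample sizes, which this sketch omits.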
Citations: 0
AWS-EP: A Multi-Task Prediction Approach for MBTI/Big5 Personality Tests
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00049
Fahed Elourajini, Esma Aïmeur
Personality and preferences are essential variables in computational sociology and social science. They describe differences between people at both individual and group levels. In recent years, automated approaches that detect personality traits have received much attention due to the massive availability of individuals' digital footprints. Furthermore, researchers have demonstrated a strong link between personality traits and various downstream tasks such as personalized filtering, profile categorization, and profile embedding. Therefore, detecting individuals' preferences has become a critical step in improving the performance of these tasks. In this paper, we build on the importance of the individual's behaviour and propose a novel multitask modeling approach that models users' personalities based on their textual posts and comments within a multimedia framework. Compared to state-of-the-art personality prediction models, our work has two novelties: it improves the performance of the Big Five factor model (Big5) personality test using shared information from the Myers-Briggs Type Indicator (MBTI) test, and it proposes a single personality detection framework that accurately predicts both MBTI and Big5 results simultaneously. Predicting both tests simultaneously makes the framework flexible enough to serve different goals instead of a single purpose (the MBTI test alone or the Big5 test alone). Experiments demonstrate that our solution outperforms state-of-the-art models across multiple well-known personality datasets.
Citations: 1