
ACM Transactions on Knowledge Discovery from Data (TKDD): Latest Articles

Measuring the Network Vulnerability Based on Markov Criticality
Pub Date : 2021-07-21 DOI: 10.1145/3464390
Hui-jia Li, Lin Wang, Zhan Bu, Jie Cao, Yong Shi
Vulnerability assessment, a critical issue for networks, attempts to foresee unexpected destructive events or hostile attacks in the whole system. In this article, we consider a new Markov global connectivity metric, the Kemeny constant, and take its derivative, called Markov criticality, to identify critical links. Markov criticality allows us to find the links that are most influential on the derivative of the Kemeny constant. Thus, we can use it to identify a critical link (i, j) from node i to node j such that removing it leads to a minimization of the network's global connectivity, i.e., the Kemeny constant. Furthermore, we define a novel vulnerability index that measures the average speed at which a specified ratio of links can be disconnected through network decomposition. Our method is highly efficient and can easily be employed to calculate the Markov criticality in real-life networks. Comprehensive experiments on several synthetic and real-life networks demonstrate that our method performs better than state-of-the-art baseline approaches.
Citations: 41
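The Kemeny constant named in the abstract above can be illustrated on a toy graph. The sketch below is not the authors' derivative-based Markov criticality; it is a brute-force stand-in that assumes a simple random walk on an undirected, unweighted graph, uses the spectral form K = Σ 1/(1 − λ) over the non-unit eigenvalues of the transition matrix (some texts add 1), and scores each link by how much K changes when that link is deleted. The function names `kemeny_constant` and `link_removal_scores` and the example graph are hypothetical.

```python
import numpy as np


def kemeny_constant(adj):
    """Kemeny constant of the simple random walk on an undirected graph.

    Computed as the sum of 1 / (1 - lambda) over the non-unit eigenvalues
    of P = D^{-1} A. Returns inf when the graph is disconnected.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        return float("inf")  # an isolated node means the graph is disconnected
    # D^{-1/2} A D^{-1/2} is symmetric and has the same eigenvalues as P.
    scale = 1.0 / np.sqrt(deg)
    sym = adj * scale[:, None] * scale[None, :]
    eig = np.linalg.eigvalsh(sym)  # ascending order; the last one equals 1
    gaps = 1.0 - eig[:-1]
    if gaps.min() < 1e-9:
        return float("inf")  # a repeated eigenvalue 1 signals disconnection
    return float(np.sum(1.0 / gaps))


def link_removal_scores(adj):
    """Change in the Kemeny constant caused by deleting each edge in turn."""
    adj = np.asarray(adj, dtype=float)
    base = kemeny_constant(adj)
    scores = {}
    for i, j in zip(*np.nonzero(np.triu(adj, k=1))):
        trial = adj.copy()
        trial[i, j] = trial[j, i] = 0.0
        scores[(int(i), int(j))] = kemeny_constant(trial) - base
    return scores


# Toy graph (an assumption for illustration): two triangles joined by a bridge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

scores = link_removal_scores(A)
most_critical = max(scores, key=scores.get)
print(most_critical, scores[most_critical])
```

In this toy graph the bridge (2, 3) is the only link whose removal disconnects the network, so it receives an infinite score. Recomputing the spectrum once per edge is expensive on large graphs, which is presumably the cost a derivative-based criterion is meant to avoid.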
Unsupervised Subspace Extraction via Deep Kernelized Clustering
Pub Date : 2021-07-20 DOI: 10.1145/3459082
Gyoung S. Na, Hyunju Chang
Feature extraction has been widely studied to find informative latent features and reduce the dimensionality of data. In particular, due to the difficulty in obtaining labeled data, unsupervised feature extraction has received much attention in data mining. However, widely used unsupervised feature extraction methods require side information about data or rigid assumptions on the latent feature space. Furthermore, most feature extraction methods require a predefined dimensionality of the latent feature space, which must be manually tuned as a hyperparameter. In this article, we propose a new unsupervised feature extraction method called Unsupervised Subspace Extractor (USE), which requires neither side information nor rigid assumptions on data. Furthermore, USE can find a subspace generated by a nonlinear combination of the input features and automatically determine the optimal dimensionality of the subspace for the given nonlinear combination. The feature extraction process of USE is well justified mathematically, and we also empirically demonstrate the effectiveness of USE on several benchmark datasets.
Citations: 0
MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble
Pub Date : 2021-07-20 DOI: 10.1145/3451392
Yaojin Lin, Q. Hu, Jinghua Liu, Xingquan Zhu, Xindong Wu
In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the feature embedding space is created from a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE uses the label correlation to optimize the margin distribution of the base classifiers, which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the goal of maximum-margin multi-label classification through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
Citations: 24
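The first step described in the MULFE abstract, clustering a label's positive and negative instances to build features for that label, can be sketched for a single label as follows. The abstract does not say how the clusters become features; mapping each instance to its distances from the cluster centers is an assumption here, borrowed from earlier label-specific feature methods, and the margin optimization and ensemble steps are not shown. The helper names `kmeans` and `label_specific_features` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers (requires k <= len(X))."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def label_specific_features(X, y, k=2):
    """Map each instance to its distances from k positive and k negative centers.

    y holds 0/1 relevance for ONE label; call once per label.
    """
    centers = np.vstack([kmeans(X[y == 1], k), kmeans(X[y == 0], k)])
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)


# Toy data (an assumption for illustration): four points on a line, one label.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
y = np.array([1, 1, 0, 0])
F = label_specific_features(X, y, k=1)
print(F)
```

With `k=1` each center is simply the class mean, so every row of `F` holds the distance to the positive mean followed by the distance to the negative mean. A per-label base classifier would then be trained on `F` instead of `X`.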
Explainable Artificial Intelligence-Based Competitive Factor Identification
Pub Date : 2021-07-20 DOI: 10.1145/3451529
Juhee Han, Younghoon Lee
Competitor analysis is an essential component of corporate strategy, providing both offensive and defensive strategic contexts to identify opportunities and threats. The rapid development of social media has recently led to several methodologies and frameworks that facilitate competitor analysis through online reviews. Existing studies focused only on detecting comparative sentences in review comments or used low-performance models. In contrast, this study proposes a novel approach that identifies competitive factors at the comprehensive product feature level using a recent explainable artificial intelligence technique. We establish a model to classify the review comments for each corresponding product and evaluate the relevance of each keyword in such comments during the classification process. We then extract and prioritize the keywords and determine their competitiveness based on relevance. Our experimental results show that the proposed method can effectively extract the competitive factors both qualitatively and quantitatively.
Citations: 3
CrowdTC: Crowd-powered Learning for Text Classification
Pub Date : 2021-07-20 DOI: 10.1145/3457216
Keyu Yang, Yunjun Gao, Lei Liang, Song Bian, Lu Chen, Baihua Zheng
Text classification is a fundamental task in content analysis. Deep learning has demonstrated promising performance in text classification compared with shallow models. However, almost none of the existing models take advantage of human wisdom to help text classification. Human beings are more intelligent and capable than machine learning models in terms of understanding and capturing the implicit semantic information in text. In this article, we try to take guidance from human beings to classify text. We propose Crowd-powered learning for Text Classification (CrowdTC for short). We design questions and post them on a crowdsourcing platform to extract keywords from text. Sampling and clustering techniques are used to reduce the cost of crowdsourcing. We also present an attention-based neural network and a hybrid neural network that incorporate the extracted keywords as human guidance into deep neural networks. Extensive experiments on public datasets confirm that CrowdTC improves the text classification accuracy of neural networks by using the crowd-powered keyword guidance.
Citations: 5
New Multi-View Classification Method with Uncertain Data
Pub Date : 2021-07-20 DOI: 10.1145/3458282
Bo Liu, Haowen Zhong, Yanshan Xiao
Multi-view classification aims at designing a multi-view learning strategy to train a classifier from multi-view data, which are easily collected in practice. Most existing works on multi-view classification assume that the multi-view data are collected with precise information. However, in real-life applications the collected multi-view data are often uncertain because the collection process is corrupted by noise. To address this, this article proposes a novel approach, called uncertain multi-view learning with support vector machine (UMV-SVM), to cope with the problem of multi-view learning with uncertain data. The method first enforces agreement among all the views to seek complementary information in the multi-view data and takes the uncertainty of the data into consideration by modeling the reachability area of the noise. It then uses an iterative framework to solve the proposed UMV-SVM model and obtain the multi-view classifier for prediction. Extensive experiments on real-life datasets show that the proposed UMV-SVM achieves better performance for uncertain multi-view classification than state-of-the-art multi-view classification methods.
Citations: 5
BiLabel-Specific Features for Multi-Label Classification
Pub Date : 2021-07-20 DOI: 10.1145/3458283
Min-Ling Zhang, Jun-Peng Fang, Yi-Bo Wang
In multi-label classification, the task is to induce predictive models that can assign a set of relevant labels to an unseen instance. The strategy of label-specific features has been widely employed in learning from multi-label examples, where the classification model for predicting the relevancy of each class label is induced from its tailored features rather than the original features. Existing approaches generate a group of tailored features for each class label independently, so label correlations are not fully considered in the label-specific feature generation process. In this article, we extend the existing strategy by proposing a simple yet effective approach based on BiLabel-specific features. Specifically, a group of tailored features is generated for a pair of class labels with heuristic prototype selection and embedding. Thereafter, predictions of classifiers induced by BiLabel-specific features are ensembled to determine the relevancy of each class label for an unseen instance. To thoroughly evaluate the BiLabel-specific features strategy, extensive experiments are conducted over a total of 35 benchmark datasets. Comparative studies against state-of-the-art label-specific feature techniques clearly validate the superiority of BiLabel-specific features in yielding stronger generalization performance for multi-label classification.
Citations: 14
Generic Multi-label Annotation via Adaptive Graph and Marginalized Augmentation
Pub Date : 2021-07-20 DOI: 10.1145/3451884
Lichen Wang, Zhengming Ding, Y. Fu
Multi-label learning recovers multiple labels from a single instance. It is a more challenging task than single-label learning. Most multi-label learning approaches need large-scale, well-labeled samples to achieve highly accurate performance. However, it is expensive to build such a dataset. In this work, we propose a generic multi-label learning framework based on Adaptive Graph and Marginalized Augmentation (AGMA) in a semi-supervised scenario. Generally speaking, AGMA uses a small amount of labeled data together with a large amount of unlabeled data to boost learning performance. First, an adaptive similarity graph is learned to effectively capture the intrinsic structure within the data. Second, a marginalized augmentation strategy is explored to enhance model generalization and robustness. Third, a feature-label autoencoder is deployed to improve inference efficiency. All the modules are jointly trained to benefit each other. State-of-the-art benchmarks in both traditional and zero-shot multi-label learning scenarios are evaluated. Experiments and ablation studies illustrate the accuracy and efficiency of our AGMA method.
Citations: 5
MSIPA: Multi-Scale Interval Pattern-Aware Network for ICU Transfer Prediction
Pub Date : 2021-07-20 DOI: 10.1145/3458284
Wu Lee, Yuliang Shi, Hongfeng Sun, Lin Cheng, Kun Zhang, Xinjun Wang, Zhiyong Chen
Accurate prediction of patients' ICU transfer events is of great significance for improving ICU treatment efficiency. Like many other health informatics mining tasks, the ICU transition prediction task based on Electronic Health Records (EHR) is a temporal mining task. In EHR-based temporal mining, existing approaches are usually unable to mine and exploit patterns that could improve model performance. This article proposes an interval pattern-aware network, the Multi-Scale Interval Pattern-Aware (MSIPA) network. MSIPA mines different interval patterns in temporal EHR data according to short, medium, and long intervals. MSIPA uses the Scaled Dot-Product Attention mechanism to query the contexts corresponding to the three scale patterns. Furthermore, a Transformer uses all three types of contextual information simultaneously for ICU transfer prediction. Extensive experiments on real-world data demonstrate that the MSIPA network outperforms state-of-the-art methods.
Citations: 3
Bayesian Additive Matrix Approximation for Social Recommendation
Pub Date : 2021-07-03 DOI: 10.1145/3451391
Huafeng Liu, L. Jing, Jingxuan Wen, Pengyu Xu, Jian Yu, M. Ng
Social relations between users have proven to be useful auxiliary information for improving recommendation performance. However, it remains challenging to fully exploit social relations and to correctly determine user preferences from both social and rating information. In this article, we propose a unified Bayesian Additive Matrix Approximation model (BAMA), which takes advantage of rating preferences and the social network to provide high-quality recommendations. The basic idea of BAMA is to extract social influence from social networks, integrate it into Bayesian additive co-clustering to effectively determine the user clusters and item clusters, and provide accurate rating predictions. In addition, an efficient algorithm based on collapsed Gibbs sampling is designed to perform inference for the proposed model. A series of experiments was conducted on six real-world social datasets. The results demonstrate the superiority of BAMA over state-of-the-art methods from three views: all users, cold-start users, and users with few social relations. Furthermore, with the aid of social information, BAMA is able to provide explainable recommendations.
Citations: 6