Regularization that incorporates the linear combination of empirical loss and explicit regularization terms as the loss function has been frequently used for many machine learning tasks. The explicit regularization term is designed in different types, depending on its applications. While regularized learning often boost the performance with higher accuracy and faster convergence, the regularization would sometimes hurt the empirical loss minimization and lead to poor performance. To deal with such issues in this work, we propose a novel strategy, namely Gradients Orthogonal Decomposition (GrOD), that improves the training procedure of regularized deep learning. Instead of linearly combining gradients of the two terms, GrOD re-estimates a new direction for iteration that does not hurt the empirical loss minimization while preserving the regularization affects, through orthogonal decomposition. We have performed extensive experiments to use GrOD improving the commonly used algorithms of transfer learning [2], knowledge distillation [3], and adversarial learning [4]. The experiment results based on large datasets, including Caltech 256 [5], MIT indoor 67 [6], CIFAR-10 [7], and ImageNet [8], show significant improvement made by GrOD for all three algorithms in all cases.
{"title":"GrOD: Deep Learning with Gradients Orthogonal Decomposition for Knowledge Transfer, Distillation, and Adversarial Training","authors":"Haoyi Xiong, Ruosi Wan, Jian Zhao, Zeyu Chen, Xingjian Li, Zhanxing Zhu, Jun Huan","doi":"10.1145/3530836","DOIUrl":"https://doi.org/10.1145/3530836","url":null,"abstract":"Regularization that incorporates the linear combination of empirical loss and explicit regularization terms as the loss function has been frequently used for many machine learning tasks. The explicit regularization term is designed in different types, depending on its applications. While regularized learning often boost the performance with higher accuracy and faster convergence, the regularization would sometimes hurt the empirical loss minimization and lead to poor performance. To deal with such issues in this work, we propose a novel strategy, namely Gradients Orthogonal Decomposition (GrOD), that improves the training procedure of regularized deep learning. Instead of linearly combining gradients of the two terms, GrOD re-estimates a new direction for iteration that does not hurt the empirical loss minimization while preserving the regularization affects, through orthogonal decomposition. We have performed extensive experiments to use GrOD improving the commonly used algorithms of transfer learning [2], knowledge distillation [3], and adversarial learning [4]. The experiment results based on large datasets, including Caltech 256 [5], MIT indoor 67 [6], CIFAR-10 [7], and ImageNet [8], show significant improvement made by GrOD for all three algorithms in all cases.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121319353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ilkay Yildiz, Jennifer G. Dy, D. Erdoğmuş, S. Ostmo, J. Campbell, Michael F. Chiang, Stratis Ioannidis
We study the problem of ranking regression, in which a dataset of rankings is used to learn Plackett–Luce scores as functions of sample features. We propose a novel spectral algorithm to accelerate learning in ranking regression. Our main technical contribution is to show that the Plackett–Luce negative log-likelihood augmented with a proximal penalty has stationary points that satisfy the balance equations of a Markov Chain. This allows us to tackle the ranking regression problem via an efficient spectral algorithm by using the Alternating Directions Method of Multipliers (ADMM). ADMM separates the learning of scores and model parameters, and in turn, enables us to devise fast spectral algorithms for ranking regression via both shallow and deep neural network (DNN) models. For shallow models, our algorithms are up to 579 times faster than the Newton’s method. For DNN models, we extend the standard ADMM via a Kullback–Leibler proximal penalty and show that this is still amenable to fast inference via a spectral approach. Compared to a state-of-the-art siamese network, our resulting algorithms are up to 175 times faster and attain better predictions by up to 26% Top-1 Accuracy and 6% Kendall-Tau correlation over five real-life ranking datasets.
{"title":"Spectral Ranking Regression","authors":"Ilkay Yildiz, Jennifer G. Dy, D. Erdoğmuş, S. Ostmo, J. Campbell, Michael F. Chiang, Stratis Ioannidis","doi":"10.1145/3530693","DOIUrl":"https://doi.org/10.1145/3530693","url":null,"abstract":"We study the problem of ranking regression, in which a dataset of rankings is used to learn Plackett–Luce scores as functions of sample features. We propose a novel spectral algorithm to accelerate learning in ranking regression. Our main technical contribution is to show that the Plackett–Luce negative log-likelihood augmented with a proximal penalty has stationary points that satisfy the balance equations of a Markov Chain. This allows us to tackle the ranking regression problem via an efficient spectral algorithm by using the Alternating Directions Method of Multipliers (ADMM). ADMM separates the learning of scores and model parameters, and in turn, enables us to devise fast spectral algorithms for ranking regression via both shallow and deep neural network (DNN) models. For shallow models, our algorithms are up to 579 times faster than the Newton’s method. For DNN models, we extend the standard ADMM via a Kullback–Leibler proximal penalty and show that this is still amenable to fast inference via a spectral approach. Compared to a state-of-the-art siamese network, our resulting algorithms are up to 175 times faster and attain better predictions by up to 26% Top-1 Accuracy and 6% Kendall-Tau correlation over five real-life ranking datasets.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126362490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data science competitions are becoming increasingly popular for enterprises collecting advanced innovative solutions and allowing contestants to sharpen their data science skills. Most existing studies about data science competitions have a focus on improving task-specific data science techniques, such as algorithm design and parameter tuning. However, little effort has been made to understand the data science competition itself. To this end, in this article, we shed light on the team’s competition performance, and investigate the team’s evolving performance in the crowd-sourcing competitive innovation context. Specifically, we first acquire and construct multi-sourced datasets of various data science competitions, including the KDD Cup 2019 machine learning competition and beyond. Then, we conduct an empirical analysis to identify and quantify a rich set of features that are significantly correlated with teams’ future performances. By leveraging team’s rank as a proxy, we observe “the stronger, the stronger” rule; that is, top-ranked teams tend to keep their advantages and dominate weaker teams for the rest of the competition. Our results also confirm that teams with diversified backgrounds tend to achieve better performances. After that, we formulate the team’s future rank prediction problem and propose the Multi-Task Representation Learning (MTRL) framework to model both static features and dynamic features. Extensive experimental results on four real-world data science competitions demonstrate the team’s future performance can be well predicted by using MTRL. Finally, we envision our study will not only help competition organizers to understand the competition in a better way, but also provide strategic implications to contestants, such as guiding the team formation and designing the submission strategy.
{"title":"Who will Win the Data Science Competition? Insights from KDD Cup 2019 and Beyond","authors":"Hao Liu, Qingyu Guo, Hengshu Zhu, Fuzhen Zhuang, Shen Yang, D. Dou, Hui Xiong","doi":"10.1145/3511896","DOIUrl":"https://doi.org/10.1145/3511896","url":null,"abstract":"Data science competitions are becoming increasingly popular for enterprises collecting advanced innovative solutions and allowing contestants to sharpen their data science skills. Most existing studies about data science competitions have a focus on improving task-specific data science techniques, such as algorithm design and parameter tuning. However, little effort has been made to understand the data science competition itself. To this end, in this article, we shed light on the team’s competition performance, and investigate the team’s evolving performance in the crowd-sourcing competitive innovation context. Specifically, we first acquire and construct multi-sourced datasets of various data science competitions, including the KDD Cup 2019 machine learning competition and beyond. Then, we conduct an empirical analysis to identify and quantify a rich set of features that are significantly correlated with teams’ future performances. By leveraging team’s rank as a proxy, we observe “the stronger, the stronger” rule; that is, top-ranked teams tend to keep their advantages and dominate weaker teams for the rest of the competition. Our results also confirm that teams with diversified backgrounds tend to achieve better performances. After that, we formulate the team’s future rank prediction problem and propose the Multi-Task Representation Learning (MTRL) framework to model both static features and dynamic features. Extensive experimental results on four real-world data science competitions demonstrate the team’s future performance can be well predicted by using MTRL. Finally, we envision our study will not only help competition organizers to understand the competition in a better way, but also provide strategic implications to contestants, such as guiding the team formation and designing the submission strategy.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129247617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mengdi Huai, T. Zheng, Chenglin Miao, Liuyi Yao, Aidong Zhang
Metric learning aims at automatically learning a distance metric from data so that the precise similarity between data instances can be faithfully reflected, and its importance has long been recognized in many fields. An implicit assumption in existing metric learning works is that the learned models are performed in a reliable and secure environment. However, the increasingly critical role of metric learning makes it susceptible to a risk of being malicious attacked. To well understand the performance of metric learning models in adversarial environments, in this article, we study the robustness of metric learning to adversarial perturbations, which are also known as the imperceptible changes to the input data that are crafted by an attacker to fool a well-learned model. However, different from traditional classification models, metric learning models take instance pairs rather than individual instances as input, and the perturbation on one instance may not necessarily affect the prediction result for an instance pair, which makes it more difficult to study the robustness of metric learning. To address this challenge, in this article, we first provide a definition of pairwise robustness for metric learning, and then propose a novel projected gradient descent-based attack method (called AckMetric) to evaluate the robustness of metric learning models. To further explore the capability of the attacker to change the prediction results, we also propose a theoretical framework to derive the upper bound of the pairwise adversarial loss. Finally, we incorporate the derived bound into the training process of metric learning and design a novel defense method to make the learned models more robust. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed methods.
{"title":"On the Robustness of Metric Learning: An Adversarial Perspective","authors":"Mengdi Huai, T. Zheng, Chenglin Miao, Liuyi Yao, Aidong Zhang","doi":"10.1145/3502726","DOIUrl":"https://doi.org/10.1145/3502726","url":null,"abstract":"Metric learning aims at automatically learning a distance metric from data so that the precise similarity between data instances can be faithfully reflected, and its importance has long been recognized in many fields. An implicit assumption in existing metric learning works is that the learned models are performed in a reliable and secure environment. However, the increasingly critical role of metric learning makes it susceptible to a risk of being malicious attacked. To well understand the performance of metric learning models in adversarial environments, in this article, we study the robustness of metric learning to adversarial perturbations, which are also known as the imperceptible changes to the input data that are crafted by an attacker to fool a well-learned model. However, different from traditional classification models, metric learning models take instance pairs rather than individual instances as input, and the perturbation on one instance may not necessarily affect the prediction result for an instance pair, which makes it more difficult to study the robustness of metric learning. To address this challenge, in this article, we first provide a definition of pairwise robustness for metric learning, and then propose a novel projected gradient descent-based attack method (called AckMetric) to evaluate the robustness of metric learning models. To further explore the capability of the attacker to change the prediction results, we also propose a theoretical framework to derive the upper bound of the pairwise adversarial loss. Finally, we incorporate the derived bound into the training process of metric learning and design a novel defense method to make the learned models more robust. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed methods.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116335417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating the distance covered by a spreading event on a network can lead to a better understanding of epidemics, economic growth, and human behavior. There are many methods solving this problem—which has been called Node Vector Distance (NVD)—for single layer networks. However, many phenomena are better represented by multilayer networks: networks in which nodes can connect in qualitatively different ways. In this article, we extend the literature by proposing an algorithm solving NVD for multilayer networks. We do so by adapting the Mahalanobis distance, incorporating the graph’s topology via the pseudoinverse of its Laplacian. Since this is a proper generalization of the Euclidean distance in a complex space defined by the topology of the graph, and that it works on multilayer networks, we call our measure the Multi Layer Generalized Euclidean (MLGE). In our experiments, we show that MLGE is intuitive, theoretically simpler than the alternatives, performs well in recovering infection parameters, and it is useful in specific case studies. MLGE requires solving a special case of the effective resistance on the graph, which has a high time complexity. However, this needs to be done only once per network. In the experiments, we show that MLGE can cache its most computationally heavy parts, allowing it to solve hundreds of NVD problems on the same network with little to no additional runtime. MLGE is provided as a free open source tool, along with the data and the code necessary to replicate our results.
{"title":"Generalized Euclidean Measure to Estimate Distances on Multilayer Networks","authors":"M. Coscia","doi":"10.1145/3529396","DOIUrl":"https://doi.org/10.1145/3529396","url":null,"abstract":"Estimating the distance covered by a spreading event on a network can lead to a better understanding of epidemics, economic growth, and human behavior. There are many methods solving this problem—which has been called Node Vector Distance (NVD)—for single layer networks. However, many phenomena are better represented by multilayer networks: networks in which nodes can connect in qualitatively different ways. In this article, we extend the literature by proposing an algorithm solving NVD for multilayer networks. We do so by adapting the Mahalanobis distance, incorporating the graph’s topology via the pseudoinverse of its Laplacian. Since this is a proper generalization of the Euclidean distance in a complex space defined by the topology of the graph, and that it works on multilayer networks, we call our measure the Multi Layer Generalized Euclidean (MLGE). In our experiments, we show that MLGE is intuitive, theoretically simpler than the alternatives, performs well in recovering infection parameters, and it is useful in specific case studies. MLGE requires solving a special case of the effective resistance on the graph, which has a high time complexity. However, this needs to be done only once per network. In the experiments, we show that MLGE can cache its most computationally heavy parts, allowing it to solve hundreds of NVD problems on the same network with little to no additional runtime. MLGE is provided as a free open source tool, along with the data and the code necessary to replicate our results.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125933524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Zhu, Jiangxia Cao, Xinjiang Lu, Chuanren Liu, Hao Liu, Yanyan Li, Xiangfeng Luo, H. Xiong
The understanding of people’s inter-regional mobility behaviors, such as predicting the next activity region (AR) or uncovering the intentions for regional mobility, is of great value to public administration or business interests. While there are numerous studies on human mobility, these studies are mainly from a statistical view or study movement behaviors within a region. The work on individual-level inter-regional mobility behavior is limited. To this end, in this article, we propose a dynamic region-relation-aware graph neural network (DRRGNN) for exploring individual mobility behaviors over ARs. Specifically, we aim at developing models that can answer three questions: (1) Which regions are the ARs? (2) Which region will be the next AR, and (3) Why do people make this regional mobility? To achieve these tasks, we first propose a method to find out people’s ARs. Then, the designed model integrates a dynamic graph convolution network (DGCN) and a recurrent neural network (RNN) to depict the evolution of relations between ARs and mine the regional mobility patterns. In the learning process, the model further considers peoples’ profiles and visited point-of-interest (POIs). Finally, extensive experiments on two real-world datasets show that the proposed model can significantly improve accuracy for both the next AR prediction and mobility intention prediction.
{"title":"Predicting a Person’s Next Activity Region with a Dynamic Region-Relation-Aware Graph Neural Network","authors":"N. Zhu, Jiangxia Cao, Xinjiang Lu, Chuanren Liu, Hao Liu, Yanyan Li, Xiangfeng Luo, H. Xiong","doi":"10.1145/3529091","DOIUrl":"https://doi.org/10.1145/3529091","url":null,"abstract":"The understanding of people’s inter-regional mobility behaviors, such as predicting the next activity region (AR) or uncovering the intentions for regional mobility, is of great value to public administration or business interests. While there are numerous studies on human mobility, these studies are mainly from a statistical view or study movement behaviors within a region. The work on individual-level inter-regional mobility behavior is limited. To this end, in this article, we propose a dynamic region-relation-aware graph neural network (DRRGNN) for exploring individual mobility behaviors over ARs. Specifically, we aim at developing models that can answer three questions: (1) Which regions are the ARs? (2) Which region will be the next AR, and (3) Why do people make this regional mobility? To achieve these tasks, we first propose a method to find out people’s ARs. Then, the designed model integrates a dynamic graph convolution network (DGCN) and a recurrent neural network (RNN) to depict the evolution of relations between ARs and mine the regional mobility patterns. In the learning process, the model further considers peoples’ profiles and visited point-of-interest (POIs). Finally, extensive experiments on two real-world datasets show that the proposed model can significantly improve accuracy for both the next AR prediction and mobility intention prediction.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126422707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shaowei Wang, Lingling Zhang, Xuan Luo, Yi Yang, Xin Hu, Tao Qin, Jun Liu
Diagram is a special form of visual expression for representing complex concepts, logic, and knowledge, which widely appears in educational scenes such as textbooks, blogs, and encyclopedias. Current research on diagrams preliminarily focuses on natural disciplines such as Biology and Geography, whose expressions are still similar to natural images. In this article, we construct the first novel geometric type of diagrams dataset in Computer Science field, which has more abstract expressions and complex logical relations. The dataset has exhaustive annotations of objects and relations for about 1,300 diagrams and 3,500 question-answer pairs. We introduce the tasks of diagram classification (DC) and diagram question answering (DQA) based on the new dataset, and propose the Diagram Paring Net (DPN) that focuses on analyzing the topological structure and text information of diagrams. We use DPN-based models to solve DC and DQA tasks, and compare the performances to well-known natural images classification models and visual question answering models. Our experiments show the effectiveness of the proposed DPN-based models on diagram understanding tasks, also indicate that our dataset is more complex compared to previous natural image understanding datasets. The presented dataset opens new challenges for research in diagram understanding, and the DPN method provides a novel perspective for studying such data. Our dataset can be available from https://github.com/WayneWong97/CSDia.
{"title":"Computer Science Diagram Understanding with Topology Parsing","authors":"Shaowei Wang, Lingling Zhang, Xuan Luo, Yi Yang, Xin Hu, Tao Qin, Jun Liu","doi":"10.1145/3522689","DOIUrl":"https://doi.org/10.1145/3522689","url":null,"abstract":"Diagram is a special form of visual expression for representing complex concepts, logic, and knowledge, which widely appears in educational scenes such as textbooks, blogs, and encyclopedias. Current research on diagrams preliminarily focuses on natural disciplines such as Biology and Geography, whose expressions are still similar to natural images. In this article, we construct the first novel geometric type of diagrams dataset in Computer Science field, which has more abstract expressions and complex logical relations. The dataset has exhaustive annotations of objects and relations for about 1,300 diagrams and 3,500 question-answer pairs. We introduce the tasks of diagram classification (DC) and diagram question answering (DQA) based on the new dataset, and propose the Diagram Paring Net (DPN) that focuses on analyzing the topological structure and text information of diagrams. We use DPN-based models to solve DC and DQA tasks, and compare the performances to well-known natural images classification models and visual question answering models. Our experiments show the effectiveness of the proposed DPN-based models on diagram understanding tasks, also indicate that our dataset is more complex compared to previous natural image understanding datasets. The presented dataset opens new challenges for research in diagram understanding, and the DPN method provides a novel perspective for studying such data. Our dataset can be available from https://github.com/WayneWong97/CSDia.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134446902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contextual bandit serves as an invaluable tool to balance the exploration vs. exploitation tradeoff in various applications such as online recommendation. In many applications, heterogeneous information networks (HINs) provide rich side information for contextual bandits, such as different types of attributes and relationships among users and items. In this article, we propose the first HIN-assisted contextual bandit framework, which utilizes a given HIN to assist contextual bandit learning. The proposed framework uses meta-paths in HIN to extract rich relations among users and items for the contextual bandit. The main challenge is how to leverage these relations, since users’ preference over items, the target of our online learning, are closely related to users’ preference over meta-paths. However, it is unknown which meta-path a user prefers more. Thus, both preferences are needed to be learned in an online fashion with exploration vs. exploitation tradeoff balanced. We propose the HIN-assisted upper confidence bound (HUCB) algorithm to address such a challenge. For each meta-path, the HUCB algorithm employs an independent base bandit algorithm to handle online item recommendations by leveraging the relationship captured in this meta-path. A bandit master is then employed to learn users’ preference over meta-paths to dynamically combine base bandit algorithms with a balance of exploration vs. exploitation tradeoff. We theoretically prove that the HUCB algorithm can achieve similar performance compared with the optimal algorithm where each user is served according to his true preference over meta-paths (assuming the optimal algorithm knows the preference). Moreover, we prove that the HUCB algorithm benefits from leveraging HIN in achieving a smaller regret upper bound than the baseline algorithm without leveraging HIN. Experimental results on a synthetic dataset, as well as real datasets from LastFM and Yelp demonstrate the fast learning speed of the HUCB algorithm.
{"title":"Improving Bandit Learning Via Heterogeneous Information Networks: Algorithms and Applications","authors":"Xiaoying Zhang, Hong Xie, John C.S. Lui","doi":"10.1145/3522590","DOIUrl":"https://doi.org/10.1145/3522590","url":null,"abstract":"Contextual bandit serves as an invaluable tool to balance the exploration vs. exploitation tradeoff in various applications such as online recommendation. In many applications, heterogeneous information networks (HINs) provide rich side information for contextual bandits, such as different types of attributes and relationships among users and items. In this article, we propose the first HIN-assisted contextual bandit framework, which utilizes a given HIN to assist contextual bandit learning. The proposed framework uses meta-paths in HIN to extract rich relations among users and items for the contextual bandit. The main challenge is how to leverage these relations, since users’ preference over items, the target of our online learning, are closely related to users’ preference over meta-paths. However, it is unknown which meta-path a user prefers more. Thus, both preferences are needed to be learned in an online fashion with exploration vs. exploitation tradeoff balanced. We propose the HIN-assisted upper confidence bound (HUCB) algorithm to address such a challenge. For each meta-path, the HUCB algorithm employs an independent base bandit algorithm to handle online item recommendations by leveraging the relationship captured in this meta-path. A bandit master is then employed to learn users’ preference over meta-paths to dynamically combine base bandit algorithms with a balance of exploration vs. exploitation tradeoff. We theoretically prove that the HUCB algorithm can achieve similar performance compared with the optimal algorithm where each user is served according to his true preference over meta-paths (assuming the optimal algorithm knows the preference). Moreover, we prove that the HUCB algorithm benefits from leveraging HIN in achieving a smaller regret upper bound than the baseline algorithm without leveraging HIN. Experimental results on a synthetic dataset, as well as real datasets from LastFM and Yelp demonstrate the fast learning speed of the HUCB algorithm.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"31 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123283923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Outlier detection is an important task in data mining, and many technologies for it have been explored in various applications. However, owing to the default assumption that outliers are not concentrated, unsupervised outlier detection may not correctly identify group anomalies with higher levels of density. Although high detection rates and optimal parameters can usually be achieved by using supervised outlier detection, obtaining a sufficient number of correct labels is a time-consuming task. To solve these problems, we focus on semi-supervised outlier detection with few identified anomalies and a large amount of unlabeled data. The task of semi-supervised outlier detection is first decomposed into the detection of discrete anomalies and that of partially identified group anomalies, and a distribution construction sub-module and a data augmentation sub-module are then proposed to identify them, respectively. In this way, the dual multiple generative adversarial networks (Dual-MGAN) that combine the two sub-modules can identify discrete as well as partially identified group anomalies. In addition, in view of the difficulty of determining the stop node of training, two evaluation indicators are introduced to evaluate the training status of the sub-GANs. Extensive experiments on synthetic and real-world data show that the proposed Dual-MGAN can significantly improve the accuracy of outlier detection, and the proposed evaluation indicators can reflect the training status of the sub-GANs.
{"title":"Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies","authors":"Zhe Li, Chunhua Sun, Chunli Liu, Xiayu Chen, Meng Wang, Yezheng Liu","doi":"10.1145/3522690","DOIUrl":"https://doi.org/10.1145/3522690","url":null,"abstract":"Outlier detection is an important task in data mining, and many technologies for it have been explored in various applications. However, owing to the default assumption that outliers are not concentrated, unsupervised outlier detection may not correctly identify group anomalies with higher levels of density. Although high detection rates and optimal parameters can usually be achieved by using supervised outlier detection, obtaining a sufficient number of correct labels is a time-consuming task. To solve these problems, we focus on semi-supervised outlier detection with few identified anomalies and a large amount of unlabeled data. The task of semi-supervised outlier detection is first decomposed into the detection of discrete anomalies and that of partially identified group anomalies, and a distribution construction sub-module and a data augmentation sub-module are then proposed to identify them, respectively. In this way, the dual multiple generative adversarial networks (Dual-MGAN) that combine the two sub-modules can identify discrete as well as partially identified group anomalies. In addition, in view of the difficulty of determining the stop node of training, two evaluation indicators are introduced to evaluate the training status of the sub-GANs. Extensive experiments on synthetic and real-world data show that the proposed Dual-MGAN can significantly improve the accuracy of outlier detection, and the proposed evaluation indicators can reflect the training status of the sub-GANs.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116969147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the era of big data, data are usually distributed across numerous connected computing and storage units (i.e., nodes or workers). Under such an environment, many machine learning problems can be reformulated as a consensus optimization problem, which consists of one objective and constraint terms splitting into N parts (each corresponds to a node). Such a problem can be solved efficiently in a distributed manner via Alternating Direction Method of Multipliers (ADMM). However, existing consensus optimization frameworks assume that every node has the same quality of information (QoI), i.e., the data from all the nodes are equally informative for the estimation of global model parameters. As a consequence, they may lead to inaccurate estimates in the presence of nodes with low QoI. To overcome this challenge, in this article, we propose a novel consensus optimization framework for distributed machine-learning that incorporates the crucial metric, QoI. Theoretically, we prove that the convergence rate of the proposed framework is linear to the number of iterations, but has a tighter upper bound compared with ADMM. Experimentally, we show that the proposed framework is more efficient and effective than existing ADMM-based solutions on both synthetic and real-world datasets due to its faster convergence rate and higher accuracy.
{"title":"Toward Quality of Information Aware Distributed Machine Learning","authors":"Houping Xiao, Shiyu Wang","doi":"10.1145/3522591","DOIUrl":"https://doi.org/10.1145/3522591","url":null,"abstract":"In the era of big data, data are usually distributed across numerous connected computing and storage units (i.e., nodes or workers). Under such an environment, many machine learning problems can be reformulated as a consensus optimization problem, which consists of one objective and constraint terms splitting into N parts (each corresponds to a node). Such a problem can be solved efficiently in a distributed manner via Alternating Direction Method of Multipliers (ADMM). However, existing consensus optimization frameworks assume that every node has the same quality of information (QoI), i.e., the data from all the nodes are equally informative for the estimation of global model parameters. As a consequence, they may lead to inaccurate estimates in the presence of nodes with low QoI. To overcome this challenge, in this article, we propose a novel consensus optimization framework for distributed machine-learning that incorporates the crucial metric, QoI. Theoretically, we prove that the convergence rate of the proposed framework is linear to the number of iterations, but has a tighter upper bound compared with ADMM. Experimentally, we show that the proposed framework is more efficient and effective than existing ADMM-based solutions on both synthetic and real-world datasets due to its faster convergence rate and higher accuracy.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127511837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}