Collaborative filtering recommender systems (CFRSs) are critical components of popular e-commerce websites, where they make personalized recommendations. In practice, CFRSs are highly vulnerable to "shilling" or "profile injection" attacks because of their openness. A number of detection methods have been proposed to make CFRSs resistant to such attacks. However, some of them distinguish attackers using typical similarity metrics; although these can capture the targeted attackers to some extent, they struggle to defend against all attackers and incur high computation time. In this paper, we propose an unsupervised method to detect such attacks. First, we use suspected target items to filter out as many genuine users as possible, which reduces computation time. Then, on the users remaining after this first stage, we employ a new similarity metric that combines a traditional similarity metric with the linkage information between users to improve the accuracy of user similarity, and use it to filter out the remaining genuine users. Experimental results show that our proposed detection method outperforms the benchmark method.
{"title":"Defending Suspected Users by Exploiting Specific Distance Metric in Collaborative Filtering Recommender Systems","authors":"Zhihai Yang, Zhongmin Cai","doi":"10.1109/ICDMW.2015.89","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.89","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
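The two-stage detection described in the Yang and Cai abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' metric: the stage-1 rule (keep users who gave a suspected target item the maximum rating), the use of Jaccard overlap of rated items as a stand-in "linkage" signal, the blend weight `alpha`, the `threshold`, and all function names and toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, Set

# Hypothetical data layout: user id -> {item id -> rating}.
Ratings = Dict[str, Dict[str, float]]


def stage1_candidates(ratings: Ratings, target_items: Set[str],
                      max_rating: float = 5.0) -> Set[str]:
    """Stage 1 (assumed rule): keep only users who gave a suspected target
    item the maximum rating; everyone else is treated as genuine."""
    return {u for u, r in ratings.items()
            if any(r.get(i) == max_rating for i in target_items)}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two rating vectors (missing ratings count as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Overlap of rated-item sets, used here as a stand-in 'linkage' signal."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def hybrid_similarity(a: Dict[str, float], b: Dict[str, float],
                      alpha: float = 0.5) -> float:
    """Blend a traditional metric (cosine) with the linkage signal."""
    return alpha * cosine(a, b) + (1 - alpha) * jaccard(a, b)


def stage2_flag(ratings: Ratings, candidates: Set[str],
                threshold: float = 0.7, alpha: float = 0.5) -> Set[str]:
    """Stage 2: flag candidates whose average hybrid similarity to the other
    candidates is high (injected profiles tend to resemble each other)."""
    flagged = set()
    cands = sorted(candidates)
    for u in cands:
        others = [v for v in cands if v != u]
        if not others:
            continue
        avg = sum(hybrid_similarity(ratings[u], ratings[v], alpha)
                  for v in others) / len(others)
        if avg >= threshold:
            flagged.add(u)
    return flagged


# Toy example (hypothetical data): s1..s3 are template-generated attack
# profiles pushing item "t"; carol also rates "t" highly but is genuine.
ratings: Ratings = {
    "alice": {"i1": 4, "i2": 3, "i3": 5},
    "bob": {"i1": 2, "i4": 4},
    "s1": {"t": 5, "f1": 3, "f2": 3},
    "s2": {"t": 5, "f1": 3, "f2": 3},
    "s3": {"t": 5, "f1": 3, "f2": 3},
    "carol": {"t": 5, "i1": 1, "i5": 2},
}
```

Calling `stage2_flag(ratings, stage1_candidates(ratings, {"t"}))` first discards alice and bob cheaply, then separates the three near-identical injected profiles from carol. Running the pairwise comparison only on stage-1 survivors is what keeps the quadratic similarity step affordable.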
Searching for desirable events in uncontrolled videos is a challenging task. Current research mainly focuses on obtaining concepts from numerous labeled videos. However, collecting the large number of labeled videos required to model events under various circumstances is time-consuming and labor-intensive. To reduce the labeling effort, we propose to learn models for videos by leveraging abundant Web images, which are a rich source of information covering many events captured under various conditions and which come roughly annotated. However, knowledge from the Web is noisy and diverse, so brute-force knowledge transfer may hurt retrieval performance. To address this negative transfer problem, we propose a novel Joint Group Weighting Learning (JGWL) framework that transfers different but related groups of knowledge (source domain), queried from a Web image search engine, to real-world videos (target domain). Within this framework, the weights of the different groups are learned jointly, and each weight represents how much the corresponding image group contributes to the knowledge transferred to the videos. Moreover, to deal with the feature distribution mismatch between the video and image feature spaces, we build a common feature subspace that bridges these two heterogeneous feature spaces in an unsupervised manner. Experimental results on two challenging video datasets demonstrate that grouped knowledge gained from Web images is effective for video retrieval.
{"title":"Finding Event Videos via Image Search Engine","authors":"Han Wang, Xinxiao Wu","doi":"10.1109/ICDMW.2015.78","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.78","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
This paper describes our solution to the ICDM 2015 contest. The challenge is to recover cross-device connections, i.e., to identify device-cookie pairs used by the same natural person. To tackle this task, we first model the privateness of each IP address, then employ pairwise ranking techniques to predict the likelihood of each connection, and finally use ensemble learning to integrate multiple models trained under various settings. Using only IP footprint information, our approach placed 5th in the contest (average F-score of 0.8608).
{"title":"Recovering Cross-Device Connections via Mining IP Footprints with Ensemble Learning","authors":"Xuezhi Cao, Weiyue Huang, Yong Yu","doi":"10.1109/ICDMW.2015.129","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.129","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
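To make "privateness of each IP" and connection scoring concrete, here is a heuristic sketch. It is not the contest solution: it assumes privateness is the inverse of how many distinct devices and cookies share an IP, scores a device-cookie pair by the summed privateness of their shared IPs, and replaces the paper's learned pairwise ranker and ensemble with a plain average over a few hypothetical privateness exponents (`powers`). All names and data here are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ip_privateness(observations: Iterable[Tuple[str, str]],
                   power: float = 1.0) -> Dict[str, float]:
    """Assumed privateness: an IP shared by fewer distinct devices/cookies
    is more 'private', scored as 1 / (#entities on that IP) ** power."""
    entities_per_ip: Dict[str, Set[str]] = defaultdict(set)
    for entity, ip in observations:
        entities_per_ip[ip].add(entity)
    return {ip: 1.0 / len(ents) ** power for ip, ents in entities_per_ip.items()}


def pair_score(ips_a: Set[str], ips_b: Set[str], priv: Dict[str, float]) -> float:
    """Evidence that two entities belong to one person: the summed
    privateness of the IPs they share."""
    return sum(priv.get(ip, 0.0) for ip in ips_a & ips_b)


def ensemble_scores(device: str, cookies: List[str],
                    ips_of: Dict[str, Set[str]],
                    powers: Tuple[float, ...] = (0.5, 1.0, 2.0)) -> Dict[str, float]:
    """Average pair scores over several privateness settings, a simple
    stand-in for an ensemble of models."""
    observations = [(e, ip) for e, ips in ips_of.items() for ip in ips]
    totals = {c: 0.0 for c in cookies}
    for p in powers:
        priv = ip_privateness(observations, p)
        for c in cookies:
            totals[c] += pair_score(ips_of[device], ips_of[c], priv) / len(powers)
    return totals


def rank_cookies(device: str, cookies: List[str],
                 ips_of: Dict[str, Set[str]]) -> List[str]:
    """Rank candidate cookies for a device by ensemble score, dropping zeros."""
    scores = ensemble_scores(device, cookies, ips_of)
    ranked = sorted(cookies, key=lambda c: (-scores[c], c))
    return [c for c in ranked if scores[c] > 0]
```

With `ips_of = {"d1": {"home", "cafe"}, "d2": {"cafe"}, "c1": {"home"}, "c2": {"cafe"}, "c3": {"office"}}`, cookie `c1` outranks `c2` for device `d1` because the home IP is shared by fewer entities than the cafe IP, and `c3` is dropped because it shares no IP with `d1`.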
With the growth of user-generated content (UGC), it is important to learn consumers' opinions about product features and deficiencies quickly. Such information is important not only for companies but also for consumers. Keyword-based visualization and clustering are effective ways to obtain a summary of opinions. To reduce the effort users spend examining vast amounts of UGC, we propose an interactive visualization system that presents sentiment words together with their aspects, based on natural language processing and a sentiment lexicon. This paper also proposes applying latent Dirichlet allocation (LDA) to cluster reviews into several topics to improve the understandability of the visualization. This paper demonstrates the developed system through case studies.
{"title":"Proposal of LDA-Based Sentiment Visualization of Hotel Reviews","authors":"Yu-Sheng Chen, Lieu-Hen Chen, Y. Takama","doi":"10.1109/ICDMW.2015.72","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.72","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
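The LDA-based clustering of reviews mentioned above can be illustrated with a small collapsed Gibbs sampler in pure Python. This is a sketch, not the authors' system: it assumes reviews are already tokenized, that each review is assigned to its single most probable topic, and the hyperparameters (`alpha`, `beta`, `iters`) are arbitrary illustrative defaults. In practice a library implementation (for example in gensim or scikit-learn) would normally be used instead.

```python
import random
from collections import Counter
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[Counter]]:
    """Collapsed Gibbs sampling for LDA. Returns per-document topic
    proportions (theta) and per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # n_{d,k}
    topic_word = [Counter() for _ in range(n_topics)]   # n_{k,w}
    topic_total = [0] * n_topics                        # n_k
    assign: List[List[int]] = []                        # z_{d,i}

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z=k | rest) proportional to
                # (n_{d,k} + alpha) * (n_{k,w} + beta) / (n_k + V * beta)
                weights = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                           / (topic_total[k] + vocab_size * beta) for k in topics]
                z = rng.choices(topics, weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [[(n + alpha) / (len(doc) + n_topics * alpha) for n in row]
             for row, doc in zip(doc_topic, docs)]
    return theta, topic_word


def cluster_reviews(docs: List[List[str]], n_topics: int, **kwargs) -> List[int]:
    """Assign each review to its most probable topic (an assumed clustering rule)."""
    theta, _ = lda_gibbs(docs, n_topics, **kwargs)
    return [max(range(n_topics), key=row.__getitem__) for row in theta]
```

For hotel reviews, the per-topic word counts in `topic_word` suggest labels for each cluster (e.g. rooms vs. breakfast). The sentiment words shown for each aspect can then be grouped by cluster in the visualization.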
The availability of data trading mechanisms and platforms is a paramount prerequisite for developing effective smart city services. For data to become a commodity ready for consumption, transformation, and exploitation by smart services, it must be made available and tradable on data marketplaces. For such data marketplaces to be viable, there is a compelling need for a sound data pricing model that is conducive to a healthy market. In this paper, we discuss the definition of a pricing model in which views are priced and queries are valuated using views. We define the price of a query as the lowest total price of a set of views that together can answer the query. We discuss the design of effective and efficient algorithms for computing the price of a query. We show that the problem of computing the price is similar, but not identical, to the problem of answering queries using views. We therefore adapt the MiniCon algorithm, which was designed to answer queries using views, to the task at hand. Finally, we discuss further challenges raised by defining a framework for valuating queries using views.
{"title":"Valuating Queries for Data Trading in Modern Cities","authors":"Ruiming Tang, Huayu Wu, Xiuqiang He, S. Bressan","doi":"10.1109/ICDMW.2015.11","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.11","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
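The pricing definition above (the price of a query is the cheapest combination of views that can answer it) can be illustrated with a deliberately simplified model. This sketch treats a query as a set of subgoals and each view as covering some of them at a fixed price, so pricing reduces to weighted set cover solved by brute force. It ignores the variable mappings and join conditions that MiniCon handles, and the names `query_price`, `Views`, and the example views are hypothetical.

```python
from itertools import combinations
from math import inf
from typing import Dict, Optional, Set, Tuple

# Hypothetical model: each view is described by the query subgoals it can
# supply and by its price.
Views = Dict[str, Tuple[Set[str], float]]


def query_price(query: Set[str],
                views: Views) -> Tuple[float, Optional[Tuple[str, ...]]]:
    """Price of a query = cheapest total price of a set of views whose
    combined coverage includes every subgoal of the query.
    Brute force over all view subsets: exponential, small inputs only."""
    names = sorted(views)
    best_cost: float = inf
    best_combo: Optional[Tuple[str, ...]] = None
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(views[n][0] for n in combo))
            if query <= covered:
                cost = sum(views[n][1] for n in combo)
                if cost < best_cost:
                    best_cost, best_combo = cost, combo
    return best_cost, best_combo
```

For example, if view `V1` supplies `{Trips, Stations}` for 10, `V2` supplies `{Weather}` for 3, and `V3` supplies all three subgoals for 15, the query `{Trips, Weather, Stations}` is priced at 13 via `("V1", "V2")` rather than 15 via `V3`. An unanswerable query gets an infinite price. The exponential enumeration is exactly why efficient algorithms matter for this problem.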
How can we retrieve meaningful information from a large and sparse graph? Traditional approaches focus on generic clustering techniques and on discovering dense clusters in a network graph; however, they tend to overlook interesting patterns such as paradigmatic relations. In this paper, we propose a novel graph clustering technique that models the relations of a node using paradigmatic analysis. We exploit a node's relations to extract its existing sets of signifiers. The newly found clusters offer a different view of a graph, which provides interesting insights into the structure of a sparse network graph. Our proposed algorithm, PaC (Paradigmatic Clustering), clusters graphs using paradigmatic analysis supported by an asymmetric similarity; in contrast to traditional graph clustering methods, it yields worthwhile results in word-sense disambiguation tasks. In addition, we propose a novel paradigmatic similarity measure. We evaluate our algorithm on synthetic and real data through extensive experiments and empirical analysis.
{"title":"Paradigmatic Clustering for NLP","authors":"Julio Santisteban, Javier Tejada-Cárcamo","doi":"10.1109/ICDMW.2015.233","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.233","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
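The idea of an asymmetric similarity for paradigmatic relations can be sketched on a word co-occurrence graph. The intuition is that two words are paradigmatically related when they occur in the same contexts, so they can substitute for each other. The measure below, |N(a) ∩ N(b)| / |N(a)| (the share of a's contexts in which b also occurs), is an illustrative assumption and not necessarily the measure proposed for PaC. The threshold, function names, and toy graph are likewise hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def neighbours(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected co-occurrence graph: word -> set of context words."""
    nbrs: Dict[str, Set[str]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def asym_sim(nbrs: Dict[str, Set[str]], a: str, b: str) -> float:
    """Share of a's contexts that b also appears in: |N(a) & N(b)| / |N(a)|.
    In general asym_sim(a, b) != asym_sim(b, a)."""
    na = nbrs.get(a, set())
    if not na:
        return 0.0
    return len(na & nbrs.get(b, set())) / len(na)


def paradigmatic_set(nbrs: Dict[str, Set[str]], a: str,
                     threshold: float = 0.5) -> Set[str]:
    """Words that could stand in for `a`: they occur in most of a's contexts."""
    return {b for b in nbrs if b != a and asym_sim(nbrs, a, b) >= threshold}
```

On a toy graph where "coffee" and "tea" both co-occur with "drink", "hot", and "cup", `paradigmatic_set(nbrs, "coffee")` returns `{"tea"}`. The measure is asymmetric: if "cup" also co-occurs with "of", then every context of "drink" is shared with "cup" (similarity 1.0), but only two of the three contexts of "cup" are shared with "drink" (similarity 2/3).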
Online social media applications have become an integral part of our everyday life. Not only are they utilised by individuals and legitimate businesses, but recently several organised groups, such as activists, hacktivists, and cyber-criminals, have also adopted them to communicate and spread their ideas. This represents a new source of intelligence for law enforcement, for instance, as it gives them an inside look at the behaviour of these previously closed, secretive groups. One opportunity offered by this online data source is to utilise the public exchange of social-media messages to identify key users in such groups. This is particularly important for law enforcement agencies that want to monitor or interrogate influential people in suspicious groups. In this paper, we utilise Social Network Analysis (SNA) techniques to understand the dynamics of the interaction between users in a Facebook-based activist group. Additionally, we aim to identify the most influential users in the group and infer their relationship strength. We incorporate sentiment analysis to identify users with clear positive and negative influences on the group, which could facilitate a better understanding of the group. We also perform a temporal analysis to correlate online activities with relevant real-life events. Our results show that applying such data analysis techniques to users' online behaviour is a powerful tool for predicting levels of influence and relationship strength between group members. Finally, we validated our results against the ground truth and found our approach very promising for achieving its aims.
{"title":"Identifying Key-Players in Online Activist Groups on the Facebook Social Network","authors":"Mariam Nouh, Jason R. C. Nurse","doi":"10.1109/ICDMW.2015.88","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.88","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
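A few of the SNA quantities named above can be computed in a handful of lines. This is a sketch under stated assumptions, not the authors' pipeline: influence is approximated by weighted in-degree (how many interactions, such as comments or replies, a user receives), relationship strength by the number of interactions between a pair in either direction, and per-user sentiment by averaging message scores assumed to come from some external sentiment analyser. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Interaction = Tuple[str, str]  # (author, recipient), e.g. a comment on a post


def influence(interactions: Iterable[Interaction]) -> Counter:
    """Weighted in-degree: how many interactions each user receives."""
    return Counter(target for source, target in interactions if source != target)


def tie_strength(interactions: Iterable[Interaction]) -> Counter:
    """Relationship strength as the number of interactions between a pair,
    counted in both directions."""
    return Counter(frozenset(p) for p in interactions if p[0] != p[1])


def mean_sentiment(messages: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average sentiment per author, given per-message scores in [-1, 1]
    from an external sentiment analyser (assumed, not shown)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Counter = Counter()
    for author, score in messages:
        sums[author] += score
        counts[author] += 1
    return {a: sums[a] / counts[a] for a in counts}


def key_players(interactions: List[Interaction], k: int) -> List[str]:
    """Top-k users by weighted in-degree (ties broken alphabetically)."""
    ranked = influence(interactions)
    return sorted(ranked, key=lambda u: (-ranked[u], u))[:k]
```

Richer centralities (betweenness, eigenvector) and temporal bucketing of interactions by date would follow the same pattern. Combining high influence with a strongly positive or negative mean sentiment is one simple way to surface the users the abstract describes.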
A shikake is a trigger for behavioral change that helps solve a problem. We propose a Shikake Data Market (SDM) platform that gives everyone the opportunity to implement a shikake with limited resources, such as ideas, expert knowledge and skills, practitioners, negotiators, and budget. As a preliminary case study, we analyzed collaborative creation at a shikake hackathon and found that collaboration among people with diverse expert backgrounds can improve the quality of the output. Based on this result, we discuss collaborative shikake creation.
{"title":"Shikake Data Market for Collaborative Shikake Creation","authors":"N. Matsumura, Hideaki Takeda","doi":"10.1109/ICDMW.2015.130","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.130","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
Link prediction is an important and well-studied problem in network analysis, with a broad range of applications including recommender systems, anomaly detection, and denoising. The general principle in link prediction is to use the topological characteristics of the nodes in the network to predict edges that might be added to or removed from the network. While early research used the local network neighborhood to characterize the topological relationship between pairs of nodes, recent studies increasingly show that using global network information improves prediction performance. Meanwhile, in the context of disease gene prioritization and functional annotation in computational biology, "global topological similarity" based methods have been shown to be effective and robust to noise and ascertainment bias. These methods compute topological profiles that represent the global view of the network from the perspective of each node and compare these profiles to assess the topological similarity between nodes. Here, we show that, in the context of link prediction in large networks, the performance of these global-view based methods can be adversely affected by high dimensionality. Motivated by this observation, we propose two dimensionality reduction techniques that exploit the sparsity and modularity of networks encountered in practical applications.
Our experimental results on predicting future collaborations in a comprehensive co-authorship network show that dimensionality reduction renders global-view based link prediction highly effective, and the resulting algorithms significantly outperform state-of-the-art link prediction methods.
{"title":"Link Prediction in Large Networks by Comparing the Global View of Nodes in the Network","authors":"Mustafa Coşkun, Mehmet Koyutürk","doi":"10.1109/ICDMW.2015.195","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.195","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
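The "global view" approach described in the Coşkun and Koyutürk abstract (compute a topological profile per node, then compare profiles) can be sketched with random walk with restart (RWR), one common way to build such profiles. This is an assumption-laden illustration: the restart probability, iteration count, cosine comparison, and especially the top-k `sparsify` step are hypothetical choices. The top-k step only loosely stands in for sparsity-based dimensionality reduction and is not either of the paper's two techniques.

```python
from math import sqrt
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def rwr_profile(g: Graph, source: str, restart: float = 0.3,
                iters: int = 100) -> Dict[str, float]:
    """Random walk with restart from `source`: p <- (1-c) W p + c e_source.
    The resulting vector is one way to encode a node's global view."""
    p = {v: 0.0 for v in g}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in g}
        for u, mass in p.items():
            if mass == 0.0:
                continue
            if g[u]:
                share = (1 - restart) * mass / len(g[u])
                for v in g[u]:
                    nxt[v] += share
            else:  # dangling node: send its walk mass back to the source
                nxt[source] += (1 - restart) * mass
        nxt[source] += restart
        p = nxt
    return p


def sparsify(profile: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep only the k largest entries (hypothetical reduction step)."""
    keep = sorted(profile, key=lambda v: (-profile[v], v))[:k]
    return {v: profile[v] for v in keep}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(x * b.get(v, 0.0) for v, x in a.items())
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_score(g: Graph, u: str, v: str, k: Optional[int] = None,
               restart: float = 0.3) -> float:
    """Score a candidate link by the similarity of the two nodes' profiles."""
    pu, pv = rwr_profile(g, u, restart), rwr_profile(g, v, restart)
    if k is not None:
        pu, pv = sparsify(pu, k), sparsify(pv, k)
    return cosine(pu, pv)
```

On two triangles joined by a bridge, `link_score` ranks a same-triangle pair well above a cross-bridge pair. Each full profile has one entry per node, which illustrates why these profiles become high-dimensional on large networks and why reducing them matters.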
This paper proposes a new incremental learning method for heterogeneous domain adaptation, in which the training data from both the source and target domains are acquired sequentially and represented by heterogeneous features. Two different projection matrices are learned to map the data from the two domains into a discriminative common subspace, where intra-class samples are close to each other, inter-class samples are well separated, and the data distribution mismatch between the source and target domains is reduced. Unlike previous work, our method can incrementally optimize the projection matrices when the training data become available as a data stream instead of being given completely in advance. As new training data arrive, the projection matrices are computed by updating the existing ones with an eigenspace merging algorithm, rather than relearning from scratch, which would require keeping the whole training set. Therefore, our incremental solution for the projection matrices can significantly reduce computational complexity and memory requirements, which makes it applicable to a wider range of heterogeneous domain adaptation scenarios with large training datasets. Furthermore, our method requires neither corresponding training instances in the source and target domains nor the same type of features, which substantially relaxes the requirements on training data.
Comprehensive experiments on three benchmark datasets demonstrate the effectiveness and efficiency of our method.
{"title":"Incremental Discriminant Learning for Heterogeneous Domain Adaptation","authors":"Peng Han, Xinxiao Wu","doi":"10.1109/ICDMW.2015.186","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.186","journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)"},"publicationDate":"2015-11-14"}
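Incremental updates of the kind described above rely on being able to combine the summary statistics of old and new data without revisiting the old samples. One standard building block, shown here as an illustration rather than the authors' algorithm, is the exact merge of two chunks' means and scatter matrices:

S = S1 + S2 + (n1·n2 / (n1 + n2)) (m1 − m2)(m1 − m2)ᵀ

An eigenspace model of the merged scatter (or, applied per class, of the within-class scatter used in discriminant learning) can then be updated from these merged statistics. The function names below are hypothetical, and plain lists are used only to keep the sketch dependency-free.

```python
from typing import List, Tuple

Stats = Tuple[int, List[float], List[List[float]]]  # (count, mean, scatter)


def chunk_stats(X: List[List[float]]) -> Stats:
    """Count, mean vector, and scatter matrix sum_x (x - m)(x - m)^T of a chunk."""
    n, d = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    S = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in X) for j in range(d)]
         for i in range(d)]
    return n, mean, S


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Exact merge of two chunks' statistics without access to raw samples:
    S = S1 + S2 + n1*n2/(n1+n2) * (m1 - m2)(m1 - m2)^T."""
    n1, m1, S1 = a
    n2, m2, S2 = b
    n = n1 + n2
    mean = [(n1 * p + n2 * q) / n for p, q in zip(m1, m2)]
    diff = [p - q for p, q in zip(m1, m2)]
    coef = n1 * n2 / n
    d = len(diff)
    S = [[S1[i][j] + S2[i][j] + coef * diff[i] * diff[j] for j in range(d)]
         for i in range(d)]
    return n, mean, S
```

Merging the statistics of two chunks gives the same result as computing them on the concatenated data, up to floating-point error. Memory therefore stays proportional to the feature dimension rather than to the number of samples seen so far.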