Zhiqian Chen, Feng Chen, Rongjie Lai, Xuchao Zhang, Chang-Tien Lu
For node-level graph encoding, a recent important state-of-the-art method is the graph convolutional network (GCN), which nicely integrates local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs rely on Chebyshev polynomial approximation, which results in oscillatory approximation at jump discontinuities; (2) increasing the order of the Chebyshev polynomial can reduce the oscillations, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree Ω(poly(1/ε)) to approximate a jump signal such as |x|, while rational functions only need O(poly log(1/ε)). However, it is non-trivial to apply rational approximation without increasing computational complexity because of the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovery. RationalNet is proposed to integrate rational functions and neural networks. We show that a rational function of the eigenvalues can be rewritten as a function of the graph Laplacian, which avoids multiplication by the eigenvector matrix. Focusing on the analysis of approximation of the graph convolution operation, a graph signal regression task is formulated. Under this task, the time complexity can be significantly reduced by the graph Fourier transform. To overcome the local minimum problem of neural network models, a relaxed Remez algorithm is utilized to initialize the weight parameters. The convergence rates of RationalNet and polynomial-based methods on a jump signal are analyzed for a theoretical guarantee. Extensive experimental results demonstrate that our approach effectively characterizes jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.
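A minimal NumPy sketch of the filtering idea described above: applying a rational spectral filter y = p(L) q(L)^{-1} x directly through the graph Laplacian L, so the eigenvector matrix is never formed. The coefficient vectors p_coef and q_coef are illustrative placeholders, not the trained RationalNet parameters.

# Sketch: rational spectral filtering without eigendecomposition.
import numpy as np

def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A for a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def rational_filter(L, x, p_coef, q_coef):
    """Compute y = p(L) @ inv(q(L)) @ x via a linear solve (no eigenvectors needed)."""
    P = sum(c * np.linalg.matrix_power(L, k) for k, c in enumerate(p_coef))
    Q = sum(c * np.linalg.matrix_power(L, k) for k, c in enumerate(q_coef))
    return P @ np.linalg.solve(Q, x)          # solve q(L) z = x, then apply p(L)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # 3-node path graph
L = graph_laplacian(A)
x = np.array([1.0, -2.0, 0.5])                                 # a graph signal on 3 nodes
y = rational_filter(L, x, p_coef=[0.5, 1.0], q_coef=[1.0, 0.3])
print(y)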
{"title":"Rational Neural Networks for Approximating Graph Convolution Operator on Jump Discontinuities","authors":"Zhiqian Chen, Feng Chen, Rongjie Lai, Xuchao Zhang, Chang-Tien Lu","doi":"10.1109/ICDM.2018.00021","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00021","url":null,"abstract":"For node level graph encoding, a recent important state-of-art method is the graph convolutional networks (GCN), which nicely integrate local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs rely on Chebyshev polynomial approximation which results in oscillatory approximation at jump discontinuities; (2) Increasing the order of Chebyshev polynomial can reduce the oscillations issue, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree Ω(poly(1/ε)) to approximate a jump signal such as |x|, while rational function only needs O(poly log(1/ε)). However, it is non-trivial to apply rational approximation without increasing computational complexity due to the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovering. RatioanlNet is proposed to integrate rational function and neural networks. We show that the rational function of eigenvalues can be rewritten as a function of graph Laplacian, which can avoid multiplication by the eigenvector matrix. Focusing on the analysis of approximation on graph convolution operation, a graph signal regression task is formulated. Under graph signal regression task, its time complexity can be significantly reduced by graph Fourier transform. To overcome the local minimum problem of neural networks model, a relaxed Remez algorithm is utilized to initialize the weight parameters. Convergence rate of RatioanlNet and polynomial based methods on a jump signal is analyzed for a theoretical guarantee. The extensive experimental results demonstrated that our approach could effectively characterize the jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114973806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chaozhuo Li, Senzhang Wang, Lifang He, Philip S. Yu, Yanbo Liang, Zhoujun Li
The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large amount of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models have been proposed that incorporate side information such as user profiles and posted tweets. However, these shallow models cannot effectively learn the desirable deep user representations for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The insight is to extensively learn task-relevant discriminative representations for users in order to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module which fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules ensures that SSDMV can both generate high-quality features and make accurate predictions. Empirically, we evaluate SSDMV on two real social network datasets across three tasks, and the results demonstrate that SSDMV significantly outperforms state-of-the-art methods.
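A hedged toy sketch of the two-module idea (multi-view fusion followed by semi-supervised label inference), not the SSDMV architecture itself: the per-view PCA embeddings, the self-training classifier, and all data shapes below are illustrative assumptions.

# Sketch: fuse two views into one user representation, then infer labels semi-supervisedly.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
view_profile = rng.normal(size=(200, 20))      # view 1: user profile features (synthetic)
view_tweets = rng.normal(size=(200, 50))       # view 2: tweet content features (synthetic)
labels = np.full(200, -1)                      # -1 marks an unlabeled user
labels[:30] = rng.integers(0, 2, size=30)      # a few labeled spammer/legitimate users

# "Fusion": per-view embedding followed by concatenation (SSDMV learns this jointly and deeply).
z = np.hstack([PCA(n_components=5).fit_transform(view_profile),
               PCA(n_components=5).fit_transform(view_tweets)])

clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000)).fit(z, labels)
print(clf.predict(z)[:10])                     # inferred labels for the first users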
{"title":"SSDMV: Semi-Supervised Deep Social Spammer Detection by Multi-view Data Fusion","authors":"Chaozhuo Li, Senzhang Wang, Lifang He, Philip S. Yu, Yanbo Liang, Zhoujun Li","doi":"10.1109/ICDM.2018.00040","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00040","url":null,"abstract":"The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large number of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models are proposed by incorporating side information such as user profiles and posted tweets. However, these shallow models are not effective to deeply learn the desirable user representations for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The insight is that we aim to extensively learn the task-relevant discriminative representations for users to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module which fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules ensures SSDMV to be able to both generate high quality features and make accurate predictions.Empirically, we evaluate SSDMV over two real social network datasets on three tasks, and the results demonstrate that SSDMV significantly outperforms the state-of-the-art methods.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115010650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A clinically meaningful distance metric, learned by measuring patient similarity, plays an important role in clinical decision support applications. Several metric learning approaches have been proposed to measure patient similarity, but they are mostly designed to learn the metric at only one time point or interval. As a result, those approaches cannot reflect how similarity among patients varies as diseases progress. In order to capture similarity information from multiple future time points simultaneously, we formulate a multi-task metric learning approach to identify patient similarity. However, it is challenging to directly apply traditional multi-task metric learning methods to learn such similarities due to the high-dimensional, complex and noisy nature of healthcare data. Besides, the disease labels often have clinical relationships and should not be treated as independent. Unfortunately, the traditional formulation of the loss function ignores the degree of similarity among labels. To tackle these challenges, we propose mtTSML, a multi-task triplet constrained sparse metric learning method, to monitor the similarity progression of patient pairs. In the proposed model, the distance for each task can be regarded as the combination of a common part and a task-specific part in the transformed low-rank space. We then perform sparse feature selection for each individual task to select the most discriminative information. Moreover, we use triplet constraints to guarantee the margin between similar and less similar pairs according to the ordering information of disease severity levels (i.e., labels). The experimental results on two real-world healthcare datasets show that the proposed multi-task metric learning method significantly outperforms state-of-the-art baselines, including both single-task and multi-task metric learning methods.
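A small sketch of the core modeling idea under stated assumptions: each task's distance combines a shared transform and a task-specific one, and a triplet hinge loss keeps the less-similar pair at least a margin farther away. The parameterization M_t = (L0 + Lt)^T (L0 + Lt) and the margin value are illustrative, not the paper's exact formulation.

# Sketch: per-task metric = common part + task-specific part, trained with triplet constraints.
import numpy as np

def task_distance(x, y, L0, Lt):
    """Squared distance under M_t = (L0 + Lt)^T (L0 + Lt)."""
    d = (L0 + Lt) @ (x - y)
    return float(d @ d)

def triplet_loss(anchor, similar, dissimilar, L0, Lt, margin=1.0):
    """Hinge loss pushing the less-similar pair at least `margin` farther than the similar pair."""
    return max(0.0, margin
               + task_distance(anchor, similar, L0, Lt)
               - task_distance(anchor, dissimilar, L0, Lt))

rng = np.random.default_rng(0)
L0 = rng.normal(scale=0.1, size=(4, 10))       # shared low-rank transform
Lt = rng.normal(scale=0.1, size=(4, 10))       # task-specific transform (one per time point)
a, p, n = rng.normal(size=(3, 10))             # anchor / similar / dissimilar patient vectors
print(triplet_loss(a, p, n, L0, Lt))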
{"title":"Multi-task Sparse Metric Learning for Monitoring Patient Similarity Progression","authors":"Qiuling Suo, Weida Zhong, Fenglong Ma, Ye Yuan, Mengdi Huai, Aidong Zhang","doi":"10.1109/ICDM.2018.00063","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00063","url":null,"abstract":"A clinically meaningful distance metric, which is learned from measuring patient similarity, plays an important role in clinical decision support applications. Several metric learning approaches have been proposed to measure patient similarity, but they are mostly designed for learning the metric at only one time point/interval. It leads to a problem that those approaches cannot reflect the similarity variations among patients with the progression of diseases. In order to capture similarity information from multiple future time points simultaneously, we formulate a multi-task metric learning approach to identify patient similarity. However, it is challenging to directly apply traditional multi-task metric learning methods to learn such similarities due to the high dimensional, complex and noisy nature of healthcare data. Besides, the disease labels often have clinical relationships, which should not be treated as independent. Unfortunately, traditional formulation of the loss function ignores the degree of labels' similarity. To tackle the aforementioned challenges, we propose mtTSML, a multi-task triplet constrained sparse metric learning method, to monitor the similarity progression of patient pairs. In the proposed model, the distance for each task can be regarded as the combination of a common part and a task-specific one in the transformed low-rank space. We then perform sparse feature selection for each individual task to select the most discriminative information. Moreover, we use triplet constraints to guarantee the margin between similar and less similar pairs according to the ordered information of disease severity levels (i.e. labels). The experimental results on two real-world healthcare datasets show that the proposed multi-task metric learning method significantly outperforms the state-of-the-art baselines, including both single-task and multi-task metric learning methods.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129879111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During the past several years, multi-layer network community detection has drawn an increasing amount of attention and many approaches have been developed from different perspectives. Despite this success, they mainly rely on the lower-order connectivity structure at the level of individual nodes and edges. However, the higher-order connectivity structure plays an essential role as a building block of multiplex networks and may carry a stronger signature of community than individual edges. The main challenge in utilizing higher-order structure for multi-layer network community detection is that the most representative higher-order structure may vary from one layer to another. In this paper, we propose a higher-order structural approach for multi-layer network community detection, termed harmonic motif modularity (HM-Modularity). The key idea is to design a novel higher-order structure, termed the harmonic motif, which integrates higher-order structural information from multiple layers to construct a primary layer. The higher-order structural information of each individual layer is also extracted and taken as auxiliary information for discovering the multi-layer community structure. A coupling is established between the primary layer and each auxiliary layer. Finally, a harmonic motif modularity is designed to generate the community structure. By solving the optimization problem of the harmonic motif modularity, the community labels of the primary layer are obtained to reveal the community structure of the original multi-layer network. Experiments have been conducted to show the effectiveness of the proposed method.
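An illustrative sketch, not the HM-Modularity algorithm itself: it builds a triangle-motif adjacency per layer as a stand-in for a higher-order structure, averages the layers into a "primary layer", and scores a candidate community assignment with standard Newman modularity. The motif choice (triangles) and the simple averaging step are assumptions made for this toy example.

# Sketch: higher-order (motif) adjacency per layer, fused layer, and a modularity score.
import numpy as np

def motif_adjacency(A):
    """W[i, j] = number of triangles that edge (i, j) participates in."""
    return (A @ A) * A

def modularity(W, communities):
    """Newman modularity of a node-to-community assignment on weighted graph W."""
    m = W.sum() / 2.0
    k = W.sum(axis=1)
    same = np.equal.outer(communities, communities)
    return float(((W - np.outer(k, k) / (2 * m)) * same).sum() / (2 * m))

# Two toy layers over the same 6 nodes: two triangles joined by the edge (2, 3).
A1 = np.array([[0, 1, 1, 0, 0, 0],
               [1, 0, 1, 0, 0, 0],
               [1, 1, 0, 1, 0, 0],
               [0, 0, 1, 0, 1, 1],
               [0, 0, 0, 1, 0, 1],
               [0, 0, 0, 1, 1, 0]], dtype=float)
A2 = A1.copy()
A2[0, 3] = A2[3, 0] = 1                         # second layer adds one cross edge

primary = np.mean([motif_adjacency(A) for A in (A1, A2)], axis=0)   # fused higher-order layer
assignment = np.array([0, 0, 0, 1, 1, 1])                           # candidate communities
print(modularity(primary, assignment))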
{"title":"A Harmonic Motif Modularity Approach for Multi-layer Network Community Detection","authors":"Ling Huang, Changdong Wang, Hongyang Chao","doi":"10.1109/ICDM.2018.00132","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00132","url":null,"abstract":"During the past several years, multi-layer network community detection has drawn an increasing amount of attention and many approaches have been developed from different perspectives. Despite the success, they mainly rely on the lower-order connectivity structure at the level of individual nodes and edges. However, the higher-order connectivity structure plays the essential role as the building block for multiplex networks, which may contain better signature of community than edge. The main challenge in utilizing higher-order structure for multi-layer network community detection is that the most representative higher-order structure may vary from one layer to another. In this paper, we propose a higher-order structural approach for multi-layer network community detection, termed harmonic motif modularity (HM-Modularity). The key idea is to design a novel higher-order structure, termed harmonic motif, which is able to integrate higher-order structural information from multiple layers to construct a primary layer. The higher-order structural information of each individual layer is also extracted, which is taken as the auxiliary information for discovering the multi-layer community structure. A coupling is established between the primary layer and each auxiliary layer. Finally, a harmonic motif modularity is designed to generate the community structure. By solving the optimization problem of the harmonic motif modularity, the community labels of the primary layer can be obtained to reveal the community structure of the original multi-layer network. Experiments have been conducted to show the effectiveness of the proposed method.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134205194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the prosperity of location-based social networks, next Point-of-Interest (POI) recommendation has become an important service and has received much attention in recent years. The next POI is dynamically determined by the mobility pattern and the various contexts associated with the user's check-in sequence. However, exploring spatial-temporal mobility patterns and incorporating heterogeneous contextual factors for recommendation are challenging issues to be resolved. In this paper, we introduce a novel neural network model named TMCA (Temporal and Multi-level Context Attention) for next POI recommendation. Our model employs an LSTM-based encoder-decoder framework, which is able to automatically learn deep spatial-temporal representations for historical check-in activities and integrate multiple contextual factors using the embedding method in a unified manner. We further propose temporal and multi-level context attention mechanisms to adaptively select relevant check-in activities and contextual factors for next POI preference prediction. Extensive experiments have been conducted using two real-world check-in datasets. The results verify (1) the superior performance of our proposed method on different evaluation metrics, compared with several state-of-the-art methods; and (2) the effectiveness of the temporal and multi-level context attention mechanisms on recommendation performance.
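A minimal sketch of the attention step only, assuming simple dot-product scoring: a decoder query attends over encoder states of historical check-ins to form a context vector for next-POI prediction. The full TMCA model additionally embeds and attends over multi-level contextual factors; the shapes below are arbitrary.

# Sketch: temporal attention over historical check-in states.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def temporal_attention(encoder_states, query):
    """Dot-product attention: weight each historical check-in state by its relevance to the query."""
    scores = encoder_states @ query                 # one score per past check-in
    weights = softmax(scores)
    return weights @ encoder_states, weights        # context vector, attention weights

rng = np.random.default_rng(0)
H = rng.normal(size=(7, 16))                        # encoder (LSTM) states for 7 past check-ins
q = rng.normal(size=16)                             # decoder state at prediction time
context, w = temporal_attention(H, q)
print(w.round(3), context.shape)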
{"title":"Next Point-of-Interest Recommendation with Temporal and Multi-level Context Attention","authors":"Ranzhen Li, Yanyan Shen, Yanmin Zhu","doi":"10.1109/ICDM.2018.00144","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00144","url":null,"abstract":"With the prosperity of the location-based social networks, next Point-of-Interest (POI) recommendation has become an important service and received much attention in recent years. The next POI is dynamically determined by the mobility pattern and various contexts associated with user check-in sequence. However, exploring spatial-temporal mobility patterns and incorporating heterogeneous contextual factors for recommendation are challenging issues to be resolved. In this paper, we introduce a novel neural network model named TMCA (Temporal and Multi-level Context Attention) for next POI recommendation. Our model employs the LSTM-based encoder-decoder framework, which is able to automatically learn deep spatial-temporal representations for historical check-in activities and integrate multiple contextual factors using the embedding method in a unified manner. We further propose the temporal and multi-level context attention mechanisms to adaptively select relevant check-in activities and contextual factors for next POI preference prediction. Extensive experiments have been conducted using two real-world check-in datasets. The results verify (1) the superior performance of our proposed method in different evaluation metrics, compared with several state-of-the-art methods; and (2) the effectiveness of the temporal and multi-level context attention mechanisms on recommendation performance.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130938703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Domain adaptation, which leverages abundant supervision from the source domain and limited supervision in the target domain to construct a model for the data in the target domain, has drawn significant attention. Most existing domain adaptation methods elaborately map the information derived from the source domain to the target domain for model construction in the target domain. However, such a 'Source' (S) to 'Target' (T) mapping usually involves 'tailoring' the information from the source domain to fit the target domain, which may lose valuable information in the source domain for model construction. Moreover, such a mapping is usually tightly coupled with the model construction, which is more complex than a separate model construction or mapping construction. In this paper, we provide an alternative way for domain adaptation, named T2S. Instead of mapping 'S' to 'T' and constructing a model in 'T', we inversely map 'T' to 'S' and reuse the model that has been well trained with the abundant information in 'S' for prediction. Such an approach enjoys the abundant information in the source domain for model construction and the simplicity of learning the mapping separately with limited supervision in the target domain. Experiments on both synthetic and real-world data sets indicate the effectiveness of our framework.
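A hedged sketch of the inverse-mapping idea under a strong simplification: the target-to-source mapping is a linear least-squares fit from a handful of labeled target examples to source-space class prototypes, after which the already-trained source model is reused unchanged. The prototype pairing is an illustrative assumption, not the paper's mapping construction.

# Sketch: train in 'S', map 'T' to 'S', reuse the source model for target prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 8))
ys = (Xs[:, 0] > 0).astype(int)                                      # abundant labeled source data
source_model = LogisticRegression(max_iter=1000).fit(Xs, ys)         # trained once, reused as-is

Xt = rng.normal(size=(20, 5))
yt = (Xt[:, 0] > 0).astype(int)                                      # limited labeled target data
prototypes = np.vstack([Xs[ys == c].mean(axis=0) for c in (0, 1)])   # source class prototypes
W, *_ = np.linalg.lstsq(Xt, prototypes[yt], rcond=None)              # linear T -> S map

Xt_test = rng.normal(size=(5, 5))
print(source_model.predict(Xt_test @ W))                             # predict via the inverse mapping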
{"title":"T2S: Domain Adaptation Via Model-Independent Inverse Mapping and Model Reuse","authors":"Zhihui Shen, Ming Li","doi":"10.1109/ICDM.2018.00163","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00163","url":null,"abstract":"Domain adaptation, which is able to leverage the abundant supervision from the source domain and limited supervision in the target domain to construct a model for the data in the target domain, has drawn significant attentions. Most of the existing domain adaptation methods elaborate to map the information derived from the source domain to the target domain for model construction in the target domain. However, such a 'Source' (S) to 'Target' (T) mapping usually involves 'tailoring' the information from the source domain to fit the target domain, which may lose valuable information in the source domain for model construction. Moreover, such a mapping is usually tightly coupled with the model construction, which is more complex than a separate model construction or mapping construction. In this paper, we provide an alternative way for domain adaptation, named T2S. Instead of mapping the 'S' to 'T' and constructing a model in 'T', we inversely map 'T' to 'S' and reuse the model that has been well-trained with abundant information in 'S' for prediction. Such an approach enjoys the abundant information in source domain for model construction and the simplicity of learning mapping separately with limited supervision in target domain. Experiments on both synthetic and real-world data sets indicate the effectiveness of our framework.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132626285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tree inventories are important datasets for many societal applications (e.g., urban planning). However, tree inventories remain unavailable in most urban areas. We aim to automate tree identification at the individual-tree level in urban areas at a large scale using remote sensing datasets. The problem is challenging due to the complexity of the landscape in urban scenarios and the lack of ground truth data. In related work, tree identification algorithms have mainly focused on controlled forest regions where the landscape is mostly homogeneous with trees, making those methods difficult to generalize to urban environments. We propose the TIMBER framework to find individual trees in complex urban environments and a Core Object REduction (CORE) algorithm to improve the computational efficiency of TIMBER. Experiments show that TIMBER can efficiently detect urban trees with high accuracy.
{"title":"A TIMBER Framework for Mining Urban Tree Inventories Using Remote Sensing Datasets","authors":"Yiqun Xie, Han Bao, S. Shekhar, Joseph K. Knight","doi":"10.1109/ICDM.2018.00183","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00183","url":null,"abstract":"Tree inventories are important datasets for many societal applications (e.g., urban planning). However, tree inventories still remain unavailable in most urban areas. We aim to automate tree identification at individual levels in urban areas at a large scale using remote sensing datasets. The problem is challenging due to the complexity of the landscape in urban scenarios and the lack of ground truth data. In related work, tree identification algorithms have mainly focused on controlled forest regions where the landscape is mostly homogeneous with trees, making the methods difficult to generalize to urban environments. We propose a TIMBER framework to find individual trees in complex urban environments and a Core Object REduction (CORE) algorithm to improve the computational efficiency of TIMBER. Experiments show that TIMBER can efficiently detect urban trees with high accuracy.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132681764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground truth information at the segment level. However, generating such labeled datasets, especially for applications in which the meaning of the labels is user-defined, is expensive and time-consuming. In this paper, we develop an approach that, instead of using segment-level ground truth information, uses the set of labels that are associated with a document and are easier to obtain, so the training data essentially corresponds to a multilabel dataset. Our method, which can be thought of as an instance of distant supervision, improves upon previous approaches by exploiting the fact that consecutive sentences in a document tend to talk about the same topic and hence probably belong to the same class. Experiments on the text segmentation task on a variety of datasets show that the segmentation produced by our method beats the competing approaches on four out of five datasets and performs on par on the fifth dataset. On the multilabel text classification task, our method performs on par with the competing approaches, while requiring significantly less time to estimate.
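A hedged toy sketch of the underlying intuition only: score each sentence against the document's label set, smooth the scores over neighbouring sentences (consecutive sentences tend to share a topic), and place segment boundaries where the top-scoring label changes. The scores and the smoothing window below are made up for illustration and are not the paper's estimation procedure.

# Sketch: segmenting a document using only document-level labels as distant supervision.
import numpy as np

def segment(sentence_label_scores, window=1):
    """Return boundary indices where the smoothed per-sentence label assignment switches."""
    S = np.asarray(sentence_label_scores, dtype=float)
    smoothed = np.vstack([S[max(0, i - window): i + window + 1].mean(axis=0)
                          for i in range(len(S))])
    labels = smoothed.argmax(axis=1)
    boundaries = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return boundaries, labels

# 6 sentences scored against the document's 2 labels (e.g. "sports", "politics").
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.4], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
boundaries, sentence_labels = segment(scores)
print(boundaries, sentence_labels)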
{"title":"Text Segmentation on Multilabel Documents: A Distant-Supervised Approach","authors":"Saurav Manchanda, G. Karypis","doi":"10.1109/ICDM.2018.00154","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00154","url":null,"abstract":"Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground truth information at the segment level. However, generating such labeled datasets, especially for applications in which the meaning of the labels is user-defined, is expensive and time-consuming. In this paper, we develop an approach that instead of using segment-level ground truth information, it instead uses the set of labels that are associated with a document and are easier to obtain as the training data essentially corresponds to a multilabel dataset. Our method, which can be thought of as an instance of distant supervision, improves upon the previous approaches by exploiting the fact that consecutive sentences in a document tend to talk about the same topic, and hence, probably belong to the same class. Experiments on the text segmentation task on a variety of datasets show that the segmentation produced by our method beats the competing approaches on four out of five datasets and performs at par on the fifth dataset. On the multilabel text classification task, our method performs at par with the competing approaches, while requiring significantly less time to estimate than the competing approaches.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133662392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A tensor (i.e., an N-mode array) is a natural representation for multidimensional data. Tucker Decomposition (TD) is one of the most popular methods, and a series of batch TD algorithms have been extensively studied and widely applied in signal/image processing, bioinformatics, etc. However, in many applications, large-scale tensors dynamically evolve at all modes, which poses significant challenges for existing approaches to track the TD of such dynamic tensors. In this paper, we propose an efficient Online Tucker Decomposition (eOTD) approach to track the TD of dynamic tensors with an arbitrary number of modes. We first propose corollaries on block tensor-matrix multiplication. Based on these corollaries, eOTD allows us 1) to update the projection matrices using the projection matrices from the previous timestamp and the auxiliary matrices from the current timestamp, and 2) to update the core tensor by a sum of tensors that are obtained by multiplying smaller tensors with matrices. The auxiliary matrices are obtained by solving a series of least-squares regression tasks, not by performing Singular Value Decompositions (SVD). This overcomes the bottleneck in computation and storage caused by computing SVDs on large-scale data. A Modified Gram-Schmidt (MGS) process is further applied to orthonormalize the projection matrices. Theoretically, the output of the eOTD framework is guaranteed to be low-rank. We further prove that the MGS process does not increase the Tucker decomposition error. Empirically, we demonstrate that the proposed eOTD achieves comparable accuracy with a significant speedup on both synthetic and real data, where the speedup can be more than 1,500 times on large-scale data.
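A NumPy sketch of two building blocks referenced above (a mode-n tensor-matrix product and Modified Gram-Schmidt orthonormalization of a projection matrix), not the full eOTD update; tensor sizes and ranks are arbitrary.

# Sketch: mode-n product and MGS orthonormalization.
import numpy as np

def mode_n_product(T, M, n):
    """Multiply tensor T by matrix M along mode n (the size of mode n becomes M.shape[0])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def modified_gram_schmidt(U):
    """Orthonormalize the columns of U using the Modified Gram-Schmidt process."""
    Q = U.astype(float).copy()
    for j in range(Q.shape[1]):
        Q[:, j] /= np.linalg.norm(Q[:, j])
        for k in range(j + 1, Q.shape[1]):
            Q[:, k] -= (Q[:, j] @ Q[:, k]) * Q[:, j]
    return Q

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 5, 6))                 # a 3-mode tensor
U = modified_gram_schmidt(rng.normal(size=(4, 2)))   # orthonormal projection for mode 0
core_slice = mode_n_product(T, U.T, 0)         # project mode 0 onto the 2-dimensional subspace
print(core_slice.shape, np.allclose(U.T @ U, np.eye(2)))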
{"title":"eOTD: An Efficient Online Tucker Decomposition for Higher Order Tensors","authors":"Houping Xiao, Fei Wang, Fenglong Ma, Jing Gao","doi":"10.1109/ICDM.2018.00180","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00180","url":null,"abstract":"A tensor (i.e., an N-mode array) is a natural representation for multidimensional data. Tucker Decomposition (TD) is one of the most popular methods, and a series of batch TD algorithms have been extensively studied and widely applied in signal/image processing, bioinformatics, etc. However, in many applications, the large-scale tensor is dynamically evolving at all modes, which poses significant challenges for existing approaches to track the TD for such dynamic tensors. In this paper, we propose an efficient Online Tucker Decomposition (eOTD) approach to track the TD of dynamic tensors with an arbitrary number of modes. We first propose corollaries on the multiplication of block tensor matrix. Based on this corollary, eOTD allows us 1) to update the projection matrices using those projection matrices from the previous timestamp and the auxiliary matrices from the current timestamp, and 2) to update the core tensor by a sum of tensors that are obtained by multiplying smaller tensors with matrices. The auxiliary matrices are obtained by solving a series of least square regression tasks, not by performing Singular Value Decompositions (SVD). This overcomes the bottleneck in computation and storage caused by computing SVDs on largescale data. A Modified Gram-Schmidt (MGS) process is further applied to orthonormalize the projection matrices. Theoretically, the output of the eOTD framework is guaranteed to be lowrank. We further prove that the MGS process will not increase Tucker decomposition error. Empirically, we demonstrate that the proposed eOTD achieves comparable accuracy with a significant speedup on both synthetic and real data, where the speedup can be more than 1,500 times on large-scale data.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133497338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge tracing serves as the key technique in computer-supported education environments (e.g., intelligent tutoring systems) to model students' knowledge states. While Bayesian knowledge tracing and deep knowledge tracing models have been developed, the sparsity of students' exercise data still limits knowledge tracing's performance and applications. To address this issue, we advocate for and propose to incorporate knowledge structure information, especially the prerequisite relations between pedagogical concepts, into the knowledge tracing model. Specifically, by considering how students master pedagogical concepts and their prerequisites, we model prerequisite concept pairs as ordering pairs. With a proper mathematical formulation, this property can be utilized as constraints in designing the knowledge tracing model. As a result, the obtained model achieves better performance on student concept-mastery prediction. To evaluate this model, we test it on five different real-world datasets, and the experimental results show that the proposed model achieves a significant performance improvement compared with three knowledge tracing models.
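A minimal sketch, assuming a hinge-style penalty, of how prerequisite ordering pairs could act as constraints: a predicted mastery state is penalized when a concept is mastered more than its prerequisite. The margin and the exact penalty form are illustrative assumptions, not the paper's formulation.

# Sketch: penalize mastery predictions that violate prerequisite ordering pairs.
import numpy as np

def prerequisite_penalty(mastery, prerequisite_pairs, margin=0.0):
    """Sum of hinge violations max(0, m[post] - m[pre] + margin) over ordered (pre, post) pairs."""
    return sum(max(0.0, mastery[post] - mastery[pre] + margin)
               for pre, post in prerequisite_pairs)

mastery = {"fractions": 0.4, "decimals": 0.7, "percentages": 0.6}   # predicted mastery levels
pairs = [("fractions", "decimals"), ("decimals", "percentages")]    # prerequisite -> dependent
print(prerequisite_penalty(mastery, pairs))    # 0.3: 'decimals' exceeds its prerequisite 'fractions'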
{"title":"Prerequisite-Driven Deep Knowledge Tracing","authors":"Penghe Chen, Yu Lu, V. Zheng, Yang Pian","doi":"10.1109/ICDM.2018.00019","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00019","url":null,"abstract":"Knowledge tracing serves as the key technique in the computer supported education environment (e.g., intelligent tutoring systems) to model student's knowledge states. While the Bayesian knowledge tracing and deep knowledge tracing models have been developed, the sparseness of student's exercise data still limits knowledge tracing's performance and applications. In order to address this issue, we advocate for and propose to incorporate the knowledge structure information, especially the prerequisite relations between pedagogical concepts, into the knowledge tracing model. Specifically, by considering how students master pedagogical concepts and their prerequisites, we model prerequisite concept pairs as ordering pairs. With a proper mathematical formulation, this property can be utilized as constraints in designing knowledge tracing model. As a result, the obtained model can have a better performance on student concept mastery prediction. In order to evaluate this model, we test it on five different real world datasets, and the experimental results show that the proposed model achieves a significant performance improvement by comparing with three knowledge tracing models.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126966919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}