Providing decision support for questions such as "Which Patient to Treat Next?" requires a combination of stream-based reasoning and probabilistic reasoning. The former arises from a multitude of sensors constantly collecting data (data streams). The latter stems from the underlying decision-making problem, which is based on a probabilistic model of the scenario at hand. The STARQL engine handles temporal data streams efficiently, and the lifted dynamic junction tree algorithm handles temporal probabilistic relational data efficiently. In this paper, we leverage the two approaches and propose probabilistic stream-based reasoning. Additionally, we demonstrate that our proposed solution runs in linear time w.r.t. the maximum number of time steps, allowing for real-time decision support and monitoring.
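The linear-time claim can be illustrated with a minimal recursive filtering loop over a temporal probabilistic model: each incoming observation triggers one constant-cost belief update, so the total cost is linear in the number of time steps. The two-state model and all parameters below are invented for illustration and are not taken from the paper.

```python
# Toy forward filtering over a stream: one constant-cost belief update
# per time step, hence linear time in the number of time steps.

def filter_stream(observations, trans, emit, prior):
    """Recursively update the belief state for each incoming observation."""
    belief = dict(prior)
    for obs in observations:          # one update per time step
        new_belief = {}
        for s in trans:
            # predict: propagate yesterday's belief through the transition model
            pred = sum(belief[s0] * trans[s0][s] for s0 in trans)
            # correct: weight by the likelihood of the observation
            new_belief[s] = pred * emit[s][obs]
        z = sum(new_belief.values())  # normalize to a probability distribution
        belief = {s: p / z for s, p in new_belief.items()}
    return belief

# Illustrative patient-monitoring model (hypothetical numbers)
trans = {"stable": {"stable": 0.9, "critical": 0.1},
         "critical": {"stable": 0.2, "critical": 0.8}}
emit = {"stable": {"normal": 0.8, "alarm": 0.2},
        "critical": {"normal": 0.3, "alarm": 0.7}}
prior = {"stable": 0.5, "critical": 0.5}

belief = filter_stream(["normal", "alarm", "alarm"], trans, emit, prior)
```

After two consecutive alarms, the belief mass shifts toward the critical state, which is the kind of quantity a "which patient next?" decision rule would rank on.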
{"title":"Which Patient to Treat Next? Probabilistic Stream-Based Reasoning for Decision Support and Monitoring","authors":"M. Gehrke, Simon Schiff, Tanya Braun, R. Möller","doi":"10.1109/ICBK.2019.00018","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00018","url":null,"abstract":"Providing decision support for questions such as \"Which Patient to Treat Next?\" requires a combination of stream-based reasoning and probabilistic reasoning. The former arises due to a multitude of sensors constantly collecting data (data streams). The latter stems from the underlying decision making problem based on a probabilistic model of the scenario at hand. The STARQL engine handles temporal data streams efficiently and the lifted dynamic junction tree algorithm handles temporal probabilistic relational data efficiently. In this paper, we leverage the two approaches and propose probabilistic stream-based reasoning. Additionally, we demonstrate that our proposed solution runs in linear time w.r.t. the maximum number of time steps to allow for real-time decision support and monitoring.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116155951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fangshu Chen, Pengfei Zhang, Huaizhong Lin, S. Tang
In this paper, we study continuous path-based range keyword queries, which maintain the answer set continuously as the query point q moves along a given path P on a road network. Such queries have many real-world applications but pose challenges, as issuing the query at every point on P is expensive and infeasible. To answer the query, we transform it into the problem of identifying a set of event points. Specifically, an event point captures a query point where the answer set changes, and query points between two adjacent event points share the same answer set. To identify event points efficiently, we develop a backbone network index (BNI) over a simplified network topology, which supports efficient distance computations and offers insights for keyword tests. Moreover, we develop a two-phase progressive (TPP) query processing framework over the BNI. The first phase performs range keyword queries to obtain answer sets for a fraction of the vertices on P; notably, this requires issuing the query only once. In the second phase, event points are identified from these retrieved answer sets. Extensive experiments on both real and synthetic datasets show that our algorithm outperforms its competitors by several orders of magnitude.
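The event-point idea can be sketched in a few lines: once answer sets are known at consecutive sample points along the path, an event point is simply any point whose answer set differs from its predecessor's, and the answer set is constant between adjacent event points. The point and answer names below are illustrative, not the paper's API.

```python
# Identify event points from answer sets sampled along a path.

def event_points(path_points, answer_sets):
    """Return the points where the answer set changes w.r.t. the previous point."""
    events = []
    for i in range(1, len(path_points)):
        if answer_sets[i] != answer_sets[i - 1]:
            events.append(path_points[i])
    return events

# Hypothetical path vertices and their range-keyword answer sets
points = ["v1", "v2", "v3", "v4", "v5"]
answers = [{"cafeA"}, {"cafeA"}, {"cafeA", "cafeB"}, {"cafeA", "cafeB"}, {"cafeB"}]
evts = event_points(points, answers)
```

Here the answer set changes at v3 and v5, so a moving query point only needs its result recomputed at those two event points.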
{"title":"Continuous Path-Based Range Keyword Queries on Road Networks","authors":"Fangshu Chen, Pengfei Zhang, Huaizhong Lin, S. Tang","doi":"10.1109/ICBK.2019.00014","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00014","url":null,"abstract":"In this paper, we study the continuous path-based range keyword queries, which find the answer set continuously when the query point q moves along a given path P on the road network. This type of queries have many real applications, whereas leading to challenges as issuing the query at each point on P is expensive and infeasible. To answer the query, we transform it to the issue of identifying a set of event points. Specifically, the event point captures the query point where the answer set changes, and query points between two adjacent event points share the same answer set. To identify event points efficiently, we develop a backbone network index (BNI) over a simplified network topology, which supports efficient distance computations and offers insights for keyword tests. Moreover, we develop a two-phase progressive (TPP) query processing framework over BNI. The first phase performs range keyword queries to get answer sets for a fraction of vertices on P . Note that this can be achieved by only issuing the query once. In the second phase, event points are identified with these retrieved answer sets. 
Extensive experiments on both real and synthetic datasets show that our algorithm outperforms competitor by several orders of magnitude.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122328860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anomalous login detection is a critical step toward building a secure and trustworthy system. When a new user appears in the login record, traditional methods determine that an anomalous login has occurred. In fact, however, the first-time login subject may be a new employee rather than an attacker. In this paper, we propose an asynchronous anomalous login detection model of "off-line learning + on-line detection" to solve the real-time detection problem. In addition, based on an analysis of multi-source logs, we extract users' operating features to distinguish malicious users from legitimate users who log on to a host for the first time. Extensive experimental evaluations over large log data show that our algorithm not only catches the first abnormal account effectively but also reduces the running time by tens of times compared with K-means and other clustering algorithms.
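The "off-line learning + on-line detection" split can be sketched as follows: an off-line phase builds per-user profiles from historical log features, and a cheap on-line check flags a login whose feature vector is far from every known profile. The features, distance measure, and threshold are invented for illustration and are not the paper's construction.

```python
# Off-line: learn a mean feature profile per user from historical logs.
def offline_learn(history):
    """history: list of (user, feature_vector) pairs -> mean profile per user."""
    sums, counts = {}, {}
    for user, vec in history:
        acc = sums.setdefault(user, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[user] = counts.get(user, 0) + 1
    return {u: [x / counts[u] for x in acc] for u, acc in sums.items()}

# On-line: a login is suspicious if it is far from every learned profile.
def online_detect(profiles, vec, threshold=2.0):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return all(dist(vec, p) > threshold for p in profiles.values())

# Hypothetical two-dimensional operating features
history = [("alice", [1.0, 0.0]), ("alice", [1.2, 0.2]), ("bob", [5.0, 5.0])]
profiles = offline_learn(history)
```

A first-time login resembling an existing profile (e.g. a new employee behaving like peers) passes, while one far from all profiles is flagged, which is the distinction the paper draws.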
{"title":"An Abnormal Login Detection Method Based on Multi-source Log Fusion Analysis","authors":"Jing Tao, Waner Wang, Ning Zheng, Ting Han, Yue Chang, Xuna Zhan","doi":"10.1109/ICBK.2019.00038","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00038","url":null,"abstract":"Anomaly login detection is a critical step towards building a secure and trustworthy system. When a new user appears in the login record, the traditional method determines that an anomaly behavior of login has occurred. However, in fact, the first login subject may be a new employee other than the attacker. In this paper, we propose an asynchronous anomaly login detection algorithm model of \"Off-line Learning + On-line detection\" to solve the real-time anomaly login detection problem. In addition, based on the analysis of multi-source logs, we extract the operating features of users to solve the problem of how to distinguish malicious users from legitimate users who log on to the host for the first time. Extensive experimental evaluations over large log data have shown that our algorithm can not only catch the first abnormal account effectively but also reduce the running time by tens of times compared with K-means and other cluster algorithms","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124044145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to IEEE ICBK 2019, the 10th IEEE International Conference on Big Knowledge, and to Beijing, China! Beijing is China's capital and has a history of three millennia. It offers an amazing combination of modern and traditional architecture, including the Great Wall, the Forbidden City (the largest palace in the world), Tiananmen Square (the second largest city square in the world) and the Summer Palace. Big Knowledge seeks to systematically combine fragmented knowledge from heterogeneous, autonomous information sources for complex and evolving relationships, in addition to consolidating domain expertise. The IEEE International Conference on Big Knowledge (ICBK) is a premier international forum for the presentation of original research results on Big Knowledge opportunities and challenges, as well as for the exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of Big Knowledge, including algorithms, software, systems, and applications. ICBK attracts researchers and application developers from a wide range of areas related to Big Knowledge, such as statistics, machine learning, pattern recognition, knowledge visualization, expert systems, high-performance computing, the World Wide Web, and big data analytics. By promoting novel, high-quality research findings and innovative solutions to challenging Big Knowledge problems, the conference continuously advances the state of the art in Big Knowledge. ICBK 2019 is the third international edition, and the 10th edition in the conference's history. The first eight editions (held annually from 2010 through 2017) were all organized in Hefei, China, while the latest edition was held in Singapore last year.
ICBK 2019 is co-located with the 19th IEEE International Conference on Data Mining (ICDM 2019), and we share our prominent keynote speakers: Ronald Fagin (IBM Research – Almaden, Fellow of the US National Academy of Engineering) and Joseph Halpern (Cornell University, Fellow of the National Academy of Engineering). We also share the 2019 ICDM/ICBK Knowledge Graph Contest and the 2019 ICDM/ICBK Panel on Marketing Intelligence – Let Marketing Drive Efficiency and Innovation. The organization of a successful conference would not be possible without the dedicated efforts of many individuals. We would like to express our gratitude to all functional chairs on our organizing committee, listed on a separate page of these proceedings. We owe special thanks to our conference sponsors: the financial and organizational support of the National Key Research and Development Program of China under grant 2016YFB1000900 and its 15 participating institutions, Mininglamp Technology, and the IEEE Computer Society. We are especially grateful to all local institutions that have supported the conference, in particular Hefei University of Technology. Last but not least, we would like to thank all authors who submitted research papers to the conference, and all participants. We are encouraged by your scie
{"title":"Welcome Message from Conference Organizers","authors":"Xindong Wu","doi":"10.1109/icbk.2017.7","DOIUrl":"https://doi.org/10.1109/icbk.2017.7","url":null,"abstract":"Welcome to IEEE ICBK 2019, the 10 th IEEE International Conference on Big Knowledge, and to Beijing, China! Beijing is China’s capital, and has a history of 3 millennia. It has an amazing combination of modern and traditional architecture, including the Great Wall, the Forbidden City (largest palace in the world), Tiananmen Square (the second largest city square in the world) and the Summer Palace. Big Knowledge seeks to systematically combine fragmented knowledge from heterogeneous, autonomous information sources for complex and evolving relationships, in addition to consolidating domain expertise. The IEEE International Conference on Big Knowledge (ICBK) is a premier international forum for the presentation of original research results in Big Knowledge opportunities and challenges, as well as for exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of Big Knowledge, including algorithms, software, systems, and applications. ICBK attracts researchers and application developers from a wide range of areas related to Big Knowledge such as statistics, machine learning, pattern recognition, knowledge visualization, expert systems, high performance computing, World Wide Web, and big data analytics. By promoting novel, high quality research findings and innovative solutions to challenging Big Knowledge problems, the conference continuously advances the state-of-the-art in Big Knowledge. ICBK 2019 is the third International Edition, and the 10 th Edition in its conference history. The first 8 editions (annually from 2010 through 2017) were all organized in Hefei, China, while the latest edition was held in Singapore last year. 
ICBK 2019 is co-located with the 19 IEEE International Conference on Data Mining (ICDM 2019), and we share our prominent keynote speakers: Ronald Fagin (IBM Research – Almaden Fellow of the US National Academy of Engineering) and Joseph Halpern (Cornell University, Fellow of the National Academy of Engineering). We also share the 2019 ICDM/ICBK Knowledge Graph Contest and the 2019 ICDM/ICBK Panel on Marketing Intelligence – Let Marketing Drive Efficiency and Innovation. The organization of a successful conference would not be possible without the dedicated efforts of many individuals. We would like to express our gratitude to all functional chairs on our organizing committee listed on a separate page of this proceedings. We owe special thanks to our conference sponsors: the financial and organizational support of the National Key Research and Development Program of China under grant 2016YFB1000900 and its 15 participating institutions, Mininglamp Technology and the IEEE Computer Society. We are especially grateful to all local institutions that have supported the conference, in particular Hefei University of Technology. Last but not least, we would like to thank all authors who submitted research papers to the conference, and all participants. We are encouraged by your scie","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126398289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selecting an appropriate Autoregressive Moving Average (ARMA) model for a given time series is a classic problem in statistics that is encountered in many applications. Typically this involves a human in the loop and repeated parameter evaluation of candidate models, which is not ideal for learning at scale. We propose a Long Short-Term Memory (LSTM) classification model for automatic ARMA model selection. Our numerical experiments show that the proposed method is fast and provides better accuracy than the traditional Box-Jenkins approach based on autocorrelations and model selection criteria. We demonstrate the application of our approach with a case study on volatility prediction of daily stock prices.
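For context, the criterion-based baseline the paper improves on looks like the loop below: fit each candidate order, score it with an information criterion, and keep the minimizer. This sketch restricts to pure AR(p) candidates fitted by least squares with AIC, an assumption made here for brevity; the paper's LSTM classifier replaces this whole fit-and-compare loop with a single forward pass.

```python
# Traditional order selection: fit AR(p) for each candidate p, pick min AIC.
import numpy as np

def fit_ar(series, p):
    """Least-squares AR(p) fit; returns the residual variance."""
    y = series[p:]
    # column k holds the series lagged by k steps
    X = np.column_stack([series[p - k:len(series) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(np.var(resid))

def select_ar_order(series, max_p=4):
    n = len(series)
    best_p, best_aic = None, float("inf")
    for p in range(1, max_p + 1):
        sigma2 = fit_ar(series, p)
        aic = n * np.log(sigma2) + 2 * p   # AIC up to an additive constant
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p

# Simulate an AR(2) process and recover a plausible order
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
p_hat = select_ar_order(x)
```

Even this simplified loop refits a model per candidate order, which is exactly the repeated evaluation the abstract calls out as unsuited to learning at scale.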
{"title":"Recurrent Neural Networks for Autoregressive Moving Average Model Selection","authors":"Bei Chen, Beat Buesser, Kelsey L. DiPietro","doi":"10.1109/ICBK.2019.00013","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00013","url":null,"abstract":"Selecting an appropriate Autoregressive Moving Average (ARMA) model for a given time series is a classic problem in statistics that is encountered in many applications. Typically this involves a human-in-the-loop and repeated parameter evaluation of candidate models, which is not ideal for learning at scale. We propose a Long Short Term Memory (LSTM) classification model for automatic ARMA model selection. Our numerical experiments show that the proposed method is fast and provides better accuracy than the traditional Box-Jenkins approach based on autocorrelations and model selection criterion. We demonstrate the application of our approach with a case study on volatility prediction of daily stock prices.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133121155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open-domain automatic question answering models have been widely studied in recent years. RNN-based models are the most commonly used in automatic question answering systems. In contrast, we choose a CNN-based model to construct our question answering models and use an attention mechanism to enhance performance. We test our models on the Microsoft open-domain automatic question answering dataset. Experiments show that, compared with models without the attention mechanism, our models achieve the best results. Experiments also show that adding an RNN network to our model can further improve performance.
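The attention mechanism the models rely on can be sketched generically: score each document token against the question representation, softmax the scores into weights, and form a weighted sum as the context vector. The toy vectors below stand in for CNN outputs; they are illustrative values, not the paper's architecture.

```python
# Generic dot-product attention over document token vectors.
import math

def attend(question, doc_tokens):
    """Return attention weights over doc_tokens and the attended context vector."""
    scores = [sum(q * t for q, t in zip(question, tok)) for tok in doc_tokens]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    d = len(question)
    context = [sum(w * tok[i] for w, tok in zip(weights, doc_tokens))
               for i in range(d)]
    return weights, context

# A token aligned with the question receives the larger weight
w, ctx = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```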
{"title":"Open-Domain Document-Based Automatic QA Models Based on CNN and Attention Mechanism","authors":"Guangjie Zhang, Xumin Fan, Canghong Jin, Ming-hui Wu","doi":"10.1109/ICBK.2019.00051","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00051","url":null,"abstract":"The open domain automatic question answering models have been widely studied in recent years. When dealing with automatic question answering systems, the RNN-based models are the most commonly used models. However we choose the CNN-based model to construct our question answering models, and use the attention mechanism to enhance the performance. We test our models on Microsoft open domain automatic question answering dataset. Experiments show that compared with the models without attention mechanism, our models get the best results. Experiments also show that adding the RNN network in our model can further improve the performance.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132219430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a differentiable nonparametric algorithm, the Delaunay triangulation learner (DTL), to solve the functional approximation problem on the basis of a p-dimensional feature space. By conducting the Delaunay triangulation algorithm on the data points, the DTL partitions the feature space into a series of p-dimensional simplices in a geometrically optimal way, and fits a linear model within each simplex. We study its theoretical properties by exploring the geometric properties of the Delaunay triangulation, and compare its performance with other statistical learners in numerical studies.
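The partition-then-fit-locally idea is easiest to see in one dimension, where the Delaunay simplices are just the intervals between sorted data points and the per-simplex linear model reduces to piecewise-linear interpolation. This toy sketch only illustrates that special case, not the general p-dimensional DTL.

```python
# 1-D Delaunay triangulation learner: intervals as simplices,
# one linear model (an interpolating segment) per simplex.

def dtl_1d(xs, ys):
    pts = sorted(zip(xs, ys))
    def predict(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:             # x falls inside this simplex
                t = (x - x0) / (x1 - x0)
                return (1 - t) * y0 + t * y1
        raise ValueError("x outside the convex hull of the data")
    return predict

f = dtl_1d([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
```

In higher dimensions the intervals become p-simplices from the Delaunay triangulation (e.g. as computed by `scipy.spatial.Delaunay`), and each simplex gets its own affine fit, but the prediction logic is the same: locate the containing simplex, then evaluate its local linear model.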
{"title":"Nonparametric Functional Approximation with Delaunay Triangulation Learner","authors":"Yehong Liu, G. Yin","doi":"10.1109/ICBK.2019.00030","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00030","url":null,"abstract":"We propose a differentiable nonparametric algorithm, the Delaunay triangulation learner (DTL), to solve the functional approximation problem on the basis of a p-dimensional feature space. By conducting the Delaunay triangulation algorithm on the data points, the DTL partitions the feature space into a series of p-dimensional simplices in a geometrically optimal way, and fits a linear model within each simplex. We study its theoretical properties by exploring the geometric properties of the Delaunay triangulation, and compare its performance with other statistical learners in numerical studies.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132717678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature representation learning is a research focus in domain adaptation. Recently, owing to its fast training speed, the marginalized Denoising Autoencoder (mDA), a standard deep learning model, has been widely utilized for feature representation learning. However, the training of mDA lacks nonlinear relationships and does not explicitly consider the distribution discrepancy between domains. To address these problems, this paper proposes a novel method for feature representation learning, namely Nonlinear cross-domain Feature learning based on Dual Constraints (NFDC), which consists of kernelization and dual constraints. First, we introduce kernelization to effectively extract nonlinear relationships in feature representation learning. Second, we design dual constraints, Maximum Mean Discrepancy (MMD) and Manifold Regularization (MR), in order to minimize the distribution discrepancy during training. Experimental results show that our approach is superior to several state-of-the-art methods on domain adaptation tasks.
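As a concrete anchor for the MMD constraint: with a linear kernel, the biased empirical MMD reduces to the squared distance between the mean feature vectors of the source and target samples, and minimizing it pulls the two domains' feature distributions together. The kernel choice and the toy data are assumptions for illustration, not the paper's exact setup.

```python
# Biased empirical MMD with a linear kernel: squared distance
# between the source and target sample means.

def mmd_linear(source, target):
    d = len(source[0])
    mean_s = [sum(x[i] for x in source) / len(source) for i in range(d)]
    mean_t = [sum(x[i] for x in target) / len(target) for i in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))

src = [[0.0, 0.0], [1.0, 1.0]]   # hypothetical source-domain features
tgt = [[2.0, 2.0], [3.0, 3.0]]   # hypothetical target-domain features
gap = mmd_linear(src, tgt)       # means (0.5, 0.5) vs (2.5, 2.5)
```

A training objective would add this quantity (computed on the learned representations, typically with a richer kernel) as a penalty term, so that reducing the loss also reduces the domain gap.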
{"title":"Nonlinear Cross-Domain Feature Representation Learning Method Based on Dual Constraints","authors":"Han Ding, Yuhong Zhang, Shuai Yang, Yaojin Lin","doi":"10.1109/ICBK.2019.00017","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00017","url":null,"abstract":"Feature representation learning is a research focus in domain adaptation. Recently, due to the fast training speed, the marginalized Denoising Autoencoder (mDA) as a standing deep learning model has been widely utilized for feature representation learning. However, the training of mDA suffers from the lack of nonlinear relationship and does not explicitly consider the distribution discrepancy between domains. To address these problems, this paper proposes a novel method for feature representation learning, namely Nonlinear cross-domain Feature learning based Dual Constraints (NFDC), which consists of kernelization and dual constraints. Firstly, we introduce kernelization to effectively extract nonlinear relationship in feature representation learning. Secondly, we design dual constraints including Maximum Mean Discrepancy (MMD) and Manifold Regularization (MR) in order to minimize distribution discrepancy during the training process. Experimental results show that our approach is superior to several state-of-the-art methods in domain adaptation tasks.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116234699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The main purpose of co-location pattern mining is to find sets of spatial features whose instances are frequently located together in space. Because previous methods choose a single distance threshold when generating neighborhood relationships, some interesting spatial co-location patterns cannot be extracted. In addition, previous methods do not take the degree of neighborhood into consideration and rely on the participation index (PI) to measure the prevalence of co-locations, which makes them very sensitive to the PI threshold and also leads to missed co-location patterns. To overcome these limitations of traditional co-location pattern mining, and considering that the neighbor relationship is a fuzzy concept, this paper introduces fuzzy theory into co-location pattern mining and proposes a new fuzzy spatial neighborhood relationship measure between instances and a reasonable proximity measure between spatial features. We then propose a novel algorithm, FCB, based on the fuzzy C-medoids clustering algorithm. Extensive experiments on synthetic and real-world datasets demonstrate the practicability and efficiency of the proposed mining algorithm, and show that it has low sensitivity to thresholds and high robustness.
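The contrast with a crisp threshold can be made concrete: instead of a hard cut-off at a single distance, a fuzzy neighbor relationship assigns a membership degree in [0, 1] that decays between an inner and an outer radius, so borderline instance pairs contribute partially rather than being dropped. The linear decay and the two radii below are illustrative assumptions, not the paper's exact measure.

```python
# Fuzzy neighbor relationship: full membership inside `inner`,
# none beyond `outer`, linear decay in between.

def fuzzy_neighbor(dist, inner=1.0, outer=3.0):
    """Degree in [0, 1] to which two instances count as neighbors."""
    if dist <= inner:
        return 1.0
    if dist >= outer:
        return 0.0
    return (outer - dist) / (outer - inner)
```

A crisp method with threshold 2.0 would treat distances 1.9 and 2.1 as categorically different; here they get the nearby degrees 0.55 and 0.45, which is why the fuzzy formulation is less sensitive to the threshold choice.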
{"title":"Mining Spatial Co-location Patterns by the Fuzzy Technology","authors":"Le Lei, Lizhen Wang, Xiaoxuan Wang","doi":"10.1109/ICBK.2019.00025","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00025","url":null,"abstract":"The main purpose of co-location pattern mining is to mine the set of spatial features whose instances are frequently located together in space. Because a single distance threshold is chosen in the previous methods when generating the neighbourhood relationships, some interesting spatial colocation patterns can't be extracted. In addition, previous methods don't take the neighborhood degree into consideration and they depend upon the PI (participation index) to measure the prevalence of the co-locations, which these methods are very sensitive to PI and also lead to the absence of co-location patterns. In order to overcome these limitations of traditional co-location pattern mining, considering that the neighbor relationship is a fuzzy concept, this paper introduces the fuzzy theory into co-location pattern mining, a new fuzzy spatial neighborhood relationship measurement between instances and a reasonable feature proximity measurement between spatial features are proposed. 
Then, a novel algorithm based on fuzzy C-medoids clustering algorithm, FCB, is proposed, extensive experiments on synthetic and real-world data sets prove the practicability and efficiency of the proposed mining algorithm, it also proves that the algorithm has low sensitivity to thresholds and has high robustness.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"50 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120906767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most existing data stream algorithms assume a single label as the target variable. However, in many applications each observation is assigned several labels with latent dependencies among them, and their target function may change over time. Classifying such non-stationary multi-label streaming data while accounting for label dependencies and potential drifts is a challenging task. The few existing studies mostly cope with drifts implicitly, and all learn models in the original label space, which requires a lot of time and memory. None of them consider recurrent drifts in multi-label streams, particularly drifts and recurrences visible in a latent label space. In this paper, we propose a graph-based framework that maintains a pool of multi-label concepts, the transitions among them, and the corresponding multi-label classifiers. As the base classifier, we develop a fast linear label space dimension reduction method that transforms the labels into a random encoded space and trains models in the reduced space. An analytical method updates the decoding matrix, which is used during the test phase to map the labels back into the original space. Experimental results show the effectiveness of the proposed framework in terms of prediction performance and pool management.
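The encode/train/decode pipeline for label space dimension reduction can be sketched as follows: project d-dimensional label vectors into k < d dimensions with a random encoding matrix, train regressors on the reduced targets, and map predictions back with a decoding matrix. Using the Moore-Penrose pseudo-inverse as the decoder and a 0.5 threshold are simplifying assumptions here; the paper updates its decoding matrix analytically over the stream.

```python
# Random label-space encoding with pseudo-inverse decoding.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4                            # original vs reduced label dimension
P = rng.standard_normal((d, k))        # random encoding matrix (d x k)
D = np.linalg.pinv(P)                  # decoding matrix (k x d)

labels = rng.integers(0, 2, size=(10, d)).astype(float)
encoded = labels @ P                   # regressors are trained on these targets
decoded = (encoded @ D > 0.5).astype(float)   # map back and threshold
```

Training k regressors instead of d is where the time and memory savings come from; the decoder is the only component that must track the stream as label dependencies drift.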
{"title":"Modeling Multi-label Recurrence in Data Streams","authors":"Zahra Ahmadi, S. Kramer","doi":"10.1109/ICBK.2019.00010","DOIUrl":"https://doi.org/10.1109/ICBK.2019.00010","url":null,"abstract":"Most of the existing data stream algorithms assume a single label as the target variable. However, in many applications, each observation is assigned to several labels with latent dependencies among them, which their target function may change over time. Classification of such non-stationary multi-label streaming data with the consideration of dependencies among labels and potential drifts is a challenging task. The few existing studies mostly cope with drifts implicitly, and all learn models on the original label space, which requires a lot of time and memory. None of them consider recurrent drifts in multi-label streams and particularly drifts and recurrences visible in a latent label space. In this paper, we propose a graph-based framework that maintains a pool of multi-label concepts with transitions among them and the corresponding multi-label classifiers. As a base classifier, a fast linear label space dimension reduction method is developed that transforms the labels into a random encoded space and trains models in the reduced space. An analytical method updates the decoding matrix which is used during the test phase to map the labels back into the original space. 
Experimental results show the effectiveness of the proposed framework in terms of prediction performance and pool management.","PeriodicalId":383917,"journal":{"name":"2019 IEEE International Conference on Big Knowledge (ICBK)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124546312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}