Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00034
Xikun Huang, Chuanqing Wang, Qilin Sun, Yangyang Li, Weizhuo Li
Knowledge networks play an important role in revealing knowledge correlations, exploring innovation trends, and implementing knowledge-guided machine learning. Previous work has studied knowledge networks as static networks; their evolution has received much less attention. In this paper, we investigate the evolution of knowledge networks from a temporal network perspective. We extract knowledge networks of different topics from Wikipedia and examine how local and global properties of these networks evolve over time. We find that many properties, such as the power-law exponent of the in- and out-degree distributions, density, clustering coefficient, effective diameter, and reciprocity, either stay stable or vary little over time after a certain stage. The macro topology of each network is shaped more like a coffee pot than a bow-tie. In addition, preferential attachment is observed in the evolution of these knowledge networks. All the code and data are publicly available at https://github.com/XikunHuang/TAKN.
{"title":"Temporal Analysis of Knowledge Networks","authors":"Xikun Huang, Chuanqing Wang, Qilin Sun, Yangyang Li, Weizhuo Li","doi":"10.1109/ICKG52313.2021.00034","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00034","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-12-01","platform":"Semanticscholar","paperid":"115667630"}
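Two of the global properties named in the abstract above, density and reciprocity, can be tracked over cumulative snapshots of a directed network. The following is a minimal sketch under stated assumptions, not the authors' code (which lives in the repository linked above): the function names, the cumulative-snapshot convention, and the toy edge list are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source page, target page)


def density(nodes: Set[str], edges: Set[Edge]) -> float:
    """Fraction of possible directed edges (no self-loops) that are present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def reciprocity(edges: Set[Edge]) -> float:
    """Fraction of directed edges whose reverse edge also exists."""
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


def snapshot_metrics(timed_edges: Iterable[Tuple[int, str, str]],
                     times: List[int]) -> List[Tuple[int, float, float]]:
    """Return (t, density, reciprocity) for the cumulative graph up to each t."""
    timed = list(timed_edges)
    rows = []
    for t in times:
        # Keep every edge that appeared at or before time t, dropping self-loops.
        edges = {(u, v) for (ts, u, v) in timed if ts <= t and u != v}
        nodes = {x for e in edges for x in e}
        rows.append((t, density(nodes, edges), reciprocity(edges)))
    return rows


# Hypothetical toy data: (year the link appeared, from-page, to-page).
toy = [(2005, "A", "B"), (2006, "B", "A"), (2007, "B", "C"), (2009, "C", "A")]
metrics = snapshot_metrics(toy, [2005, 2007, 2009])
```

Plotting each column of `metrics` against time is one simple way to check whether a property stabilises after a certain stage, which is the kind of behaviour the abstract reports.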
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00027
Ze Yang, Changyang Tai, Gongqing Wu, Zan Zhang, Xianyu Bao
Few clustering methods perform well on multivariate time series (MTS) data. Traditional methods rely heavily on similarity measures and perform poorly on MTS data with complex structures. This paper proposes MTSC-GE, an MTS clustering algorithm based on graph embedding, to improve the performance of MTS clustering. MTSC-GE maps MTS samples to feature representations in a low-dimensional space and then clusters them. While mining the information of the samples themselves, MTSC-GE builds the whole time series dataset into a graph, attending to the connections between samples from a global perspective and discovering the local structural features of MTS data. The proposed MTSC-GE consists of three stages. The first stage builds a graph from the original dataset, where each MTS sample is regarded as a node in the graph. The second stage uses a graph embedding technique to obtain a new representation of each node. Finally, MTSC-GE applies the K-Means algorithm to cluster the samples based on the newly obtained representations. We compare MTSC-GE with six state-of-the-art benchmark methods on five public datasets; experimental results show that MTSC-GE achieves good performance.
{"title":"MTSC-GE: A Novel Graph based Method for Multivariate Time Series Clustering","authors":"Ze Yang, Changyang Tai, Gongqing Wu, Zan Zhang, Xianyu Bao","doi":"10.1109/ICKG52313.2021.00027","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00027","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-12-01","platform":"Semanticscholar","paperid":"121161194"}
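The three stages described in the MTSC-GE abstract above (build a graph with one node per sample, embed the nodes, run K-Means on the embeddings) can be sketched as follows. The abstract does not say how the graph is built or which embedding technique is used, so this sketch substitutes assumptions of its own: a k-nearest-neighbour graph under Euclidean distance for stage 1 and a simple landmark hop-distance embedding for stage 2. All function names and the toy data are hypothetical.

```python
from collections import deque
from typing import List, Sequence

Series = Sequence[Sequence[float]]  # one MTS sample: variables x time steps


def distance(a: Series, b: Series) -> float:
    """Euclidean distance between two equally shaped MTS samples."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


def knn_graph(samples: List[Series], k: int) -> List[set]:
    """Stage 1 (assumed): one node per sample, edges to its k nearest samples."""
    n = len(samples)
    adj = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: distance(samples[i], samples[j]))
        for j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)  # keep the graph undirected
    return adj


def landmark_embedding(adj: List[set], landmarks: List[int]) -> List[List[float]]:
    """Stage 2 (stand-in): hop distance from every node to each landmark node."""
    n = len(adj)
    emb = [[float(n)] * len(landmarks) for _ in range(n)]  # n marks "unreachable"
    for col, src in enumerate(landmarks):
        emb[src][col] = 0.0
        queue, seen = deque([src]), {src}
        while queue:  # breadth-first search from the landmark
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    emb[v][col] = emb[u][col] + 1.0
                    queue.append(v)
    return emb


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Stage 3: Lloyd's K-Means with deterministic farthest-point seeding."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: six samples with 2 variables and 3 time steps each.
low = [[[0.0 + d, 0.1, 0.2], [0.0, 0.1 + d, 0.2]] for d in (0.0, 0.1, 0.2)]
high = [[[9.0 + d, 9.1, 9.2], [9.0, 9.1 + d, 9.2]] for d in (0.0, 0.1, 0.2)]
graph = knn_graph(low + high, k=2)
labels = kmeans(landmark_embedding(graph, landmarks=[0, 5]), k=2)
```

On this toy input the three low-valued samples and the three high-valued samples end up in different clusters. A learned graph embedding would replace `landmark_embedding` in a realistic setting; the stand-in is only there to keep the sketch dependency-free.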
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00038
Ru Chen, Guliu Liu, Yi Zhu, Xindong Wu
With the rapid development of artificial intelligence and semantic networks, knowledge graphs have received extensive attention in various application domains. As a domain knowledge graph, a genealogical knowledge graph has significant research value in the study of family bloodlines, family culture, and medical genetic analysis. However, because the same relationship often goes by different names and families include complex relationships such as divorce, remarriage, and polygamy, reasoning over a genealogical knowledge graph is a challenging task. In response to this problem, we propose a scheme for kinship reasoning in the genealogical knowledge graph. First, based on real genealogy revision experience, a character ontology framework for the genealogical knowledge graph is defined, and basic kinship reasoning rules are designed. Then, given that different surnames define kinship differently, support for custom reasoning rules is integrated into the reasoning framework. In addition, to handle complex relationships in family trees, such as multiple generations of ancestors and multiple wives, a series of inference optimization methods is proposed. Finally, we implement this scheme in the Huapu system, and experimental results on a real genealogical dataset demonstrate the effectiveness and practicality of the proposed scheme.
{"title":"A Scheme for Kinship Reasoning based on Ontology","authors":"Ru Chen, Guliu Liu, Yi Zhu, Xindong Wu","doi":"10.1109/ICKG52313.2021.00038","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00038","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-12-01","platform":"Semanticscholar","paperid":"122467993"}
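To illustrate what "basic kinship reasoning rules" over differently named relationships might look like, here is a minimal sketch. It is not the Huapu system's ontology or rule set, which the abstract above does not spell out; the alias table, the two rules, and the names in the toy facts are all hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical alias table: different names for the same relationship.
ALIASES: Dict[str, str] = {"father": "parent", "mother": "parent", "dad": "parent"}


def normalize(triples: Set[Triple]) -> Set[Triple]:
    """Map relation aliases onto one canonical relation name."""
    return {(s, ALIASES.get(r, r), o) for (s, r, o) in triples}


def infer_kinship(triples: Set[Triple]) -> Set[Triple]:
    """Apply two illustrative rules once over the normalized facts.

    parent(x, y) and parent(y, z)          => grandparent(x, z)
    parent(p, a) and parent(p, b), a != b  => sibling(a, b)
    """
    facts = normalize(triples)
    parent = {(s, o) for (s, r, o) in facts if r == "parent"}
    inferred = set(facts)
    for (x, y) in parent:
        for (y2, z) in parent:
            if y == y2:  # x is a parent of y, and y is a parent of z
                inferred.add((x, "grandparent", z))
            if x == y2 and y != z:  # y and z share the parent x
                inferred.add((y, "sibling", z))
    return inferred


# Hypothetical toy facts using two different names for the same relation.
facts = {("Grandpa", "father", "Dad"), ("Dad", "dad", "Kid1"), ("Dad", "father", "Kid2")}
result = infer_kinship(facts)
```

Note that the sibling rule here would also label half-siblings as siblings, which is exactly the kind of case (remarriage, multiple wives) that the abstract says needs custom rules and further optimization.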
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00040
Wei Emma Zhang, Queen Nguyen
Creating domain-specific glossaries is time-consuming and requires domain expertise. An effective and efficient automatic process would facilitate glossary generation and its downstream applications for better decision making. In this project, we aim to build a domain-specific glossary from a large text corpus. We frame the task as a knowledge graph construction problem with minimal supervision. We adapt both supervised pre-trained models and unsupervised methods to extract relations for terms that appear in a large corpus of scientific articles. We then utilize an off-the-shelf graph database to construct and store the knowledge graph. Furthermore, we develop an interactive Web-based tool for visualizing, exploring, and querying the constructed knowledge graph. The project is sourced and funded by the AI4DM initiative from the Office of National Intelligence (ONI) and the Defence Science and Technology (DST) Group, Australia. Although the funding requires the use of a COVID-19 literature dataset, the solution presented in this paper is generic and could easily be applied to any domain.
{"title":"Constructing COVID-19 Knowledge Graph from A Large Corpus of Scientific Articles","authors":"Wei Emma Zhang, Queen Nguyen","doi":"10.1109/ICKG52313.2021.00040","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00040","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-12-01","platform":"Semanticscholar","paperid":"123438132"}
Pub Date : 2021-12-01 DOI: 10.1109/ickg52313.2021.00008
{"title":"ICBK 2021 Track Chairs","authors":"","doi":"10.1109/ickg52313.2021.00008","DOIUrl":"https://doi.org/10.1109/ickg52313.2021.00008","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-12-01","platform":"Semanticscholar","paperid":"128421110"}
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00070
Zahra Ghasemi, H. A. Khorshidi, U. Aickelin
Clustering methods categorize data points into groups so that the points within each group are highly similar. Classic clustering algorithms are unsupervised, meaning that no complementary information is available to improve the clustering results. However, in some clustering problems, supplementary information is available that can guide the clustering process. In the presence of such information, the problem becomes semi-supervised clustering. In some articles, semi-supervised clustering is modeled as an optimization problem. In this research, optimization-based semi-supervised clustering papers from 2013 to 2020 are reviewed. The review follows a four-step procedure. We examine the objective functions and optimization algorithms used in these articles, as well as their application domains and the types of supervised information.
{"title":"A survey on Optimisation-based Semi-supervised Clustering Methods","authors":"Zahra Ghasemi, H. A. Khorshidi, U. Aickelin","doi":"10.1109/ICKG52313.2021.00070","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00070","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-12-01","platform":"Semanticscholar","paperid":"130773678"}
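One common way to model semi-supervised clustering as an optimization problem, as the survey abstract above describes, is to add a penalty for violated pairwise constraints to the usual within-cluster squared error. The sketch below shows such an objective; it is an illustrative formulation, not one taken from the survey, and the constraint types (must-link and cannot-link pairs), the `penalty` parameter, and the toy data are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]  # indices of two points that share a constraint


def constrained_objective(points: List[Point], labels: List[int],
                          must_link: List[Pair], cannot_link: List[Pair],
                          penalty: float = 1.0) -> float:
    """Within-cluster squared error plus a penalty per violated constraint."""
    clusters: Dict[int, List[Point]] = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    sse = 0.0
    for members in clusters.values():
        center = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((x - c) ** 2 for p in members for x, c in zip(p, center))
    # A must-link pair is violated when its points land in different clusters;
    # a cannot-link pair is violated when they land in the same cluster.
    violated = sum(1 for i, j in must_link if labels[i] != labels[j])
    violated += sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return sse + penalty * violated


# Hypothetical toy data: four 1-D points with one constraint of each kind.
pts = [[0.0], [1.0], [10.0], [11.0]]
good = constrained_objective(pts, [0, 0, 1, 1], [(0, 1)], [(1, 2)])
bad = constrained_objective(pts, [0, 1, 1, 1], [(0, 1)], [(1, 2)])
```

An optimization algorithm would then search over `labels` to minimise this value; here the assignment that respects both constraints (`good`) scores lower than the one that violates them (`bad`).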
The dynamics of temporal networks lie in the continuous interactions between nodes, which exhibit dynamic node preferences over time. The challenges of mining temporal networks are thus two-fold: the dynamic structure of networks and the dynamic node preferences. In this paper, we investigate the dynamic graph sampling problem, aiming to capture the preference structure of nodes dynamically in cooperation with GNNs. Our proposed Dynamic Preference Structure (DPS) framework consists of two stages: structure sampling and graph fusion. In the first stage, two parameterized samplers are designed to learn the preference structure adaptively with network reconstruction tasks. In the second stage, an additional attention layer is designed to fuse two sampled temporal subgraphs of a node, generating temporal node embeddings for downstream tasks. Experimental results on many real-life temporal networks show that our DPS substantially outperforms several state-of-the-art methods owing to learning an adaptive preference structure. The code will be released soon at https://github.com/doujiang-zheng/Dynamic-Preference-Structure.
{"title":"Learning Dynamic Preference Structure Embedding From Temporal Networks","authors":"Tongya Zheng, Zunlei Feng, Yu Wang, Chengchao Shen, Mingli Song, Xingen Wang, Xinyu Wang, Chun Chen, Hao Xu","doi":"10.1109/ICKG52313.2021.00059","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00059","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-11-23","platform":"Semanticscholar","paperid":"133679438"}
Pub Date : 2021-11-05 DOI: 10.1109/ICKG52313.2021.00033
Kentaro Ohno, Atsutoshi Kumagai
Recurrent neural networks with a gating mechanism, such as an LSTM or GRU, are powerful tools for modeling sequential data. In this mechanism, the forget gate, which was introduced to control information flow in the hidden state of the RNN, has recently been re-interpreted as representing the time scale of the state, i.e., a measure of how long the RNN retains information on inputs. On the basis of this interpretation, several parameter initialization methods that exploit prior knowledge of temporal dependencies in data have been proposed to improve learnability. However, the interpretation relies on various unrealistic assumptions, such as that there are no inputs after a certain time point. In this work, we reconsider this interpretation of the forget gate in a more realistic setting. We first generalize the existing theory on gated RNNs so that we can consider the case where inputs are successively given. We then argue that the interpretation of a forget gate as a temporal representation is valid when the gradient of the loss with respect to the state decreases exponentially as time goes back. We empirically demonstrate that existing RNNs satisfy this gradient condition at the initial training phase on several tasks, which is in good agreement with previous initialization methods. On the basis of this finding, we propose an approach to construct new RNNs that can represent a longer time scale than conventional models, which will improve the learnability for long-term sequential data. We verify the effectiveness of our method by experiments with real-world datasets.
{"title":"Recurrent Neural Networks for Learning Long-term Temporal Dependencies with Reanalysis of Time Scale Representation","authors":"Kentaro Ohno, Atsutoshi Kumagai","doi":"10.1109/ICKG52313.2021.00033","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00033","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-11-05","platform":"Semanticscholar","paperid":"121375563"}
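The reading of a forget gate as a time scale, mentioned in the abstract above, can be made concrete under the simplified setting that the abstract itself calls unrealistic: no new inputs and a constant gate value f, so the state shrinks as f^t = exp(-t/T) with T = -1/ln f. The sketch below only illustrates that relationship and the matching bias initialization; it is not the authors' generalized theory or their proposed model, and the function names are hypothetical.

```python
import math


def time_scale(forget_gate: float) -> float:
    """Steps T after which a state multiplied by `forget_gate` per step falls to 1/e."""
    return -1.0 / math.log(forget_gate)


def sigmoid(x: float) -> float:
    """Logistic function, the usual activation of a forget gate."""
    return 1.0 / (1.0 + math.exp(-x))


def forget_bias_for(time_scale_steps: float) -> float:
    """Bias b such that sigmoid(b) = exp(-1/T), ignoring input-dependent terms."""
    f = math.exp(-1.0 / time_scale_steps)
    return math.log(f / (1.0 - f))  # inverse of the sigmoid (logit)


bias = forget_bias_for(100.0)          # aim to retain information for ~100 steps
recovered = time_scale(sigmoid(bias))  # maps the bias back to a time scale
```

Initializing the forget-gate bias from a desired time scale in this way is the idea behind the initialization methods the abstract refers to; once inputs arrive at every step the gate value is no longer constant, which is the case the paper sets out to analyse.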
Pub Date : 2021-11-01 DOI: 10.1109/ICKG52313.2021.00057
Yushi Hirose, M. Shimbo, Taro Watanabe
For knowledge graph completion, two major types of prediction models exist: one based on graph embeddings, and the other based on relation path rule induction. They have different advantages and disadvantages. To take advantage of both types, hybrid models have been proposed recently. One of the hybrid models, UniKER, alternately augments training data by relation path rules and trains an embedding model. Despite its high prediction accuracy, it does not take full advantage of relation path rules, as it disregards low-confidence rules in order to maintain the quality of augmented data. To mitigate this limitation, we propose transductive data augmentation by relation path rules and confidence-based weighting of augmented data. The results and analysis show that our proposed method effectively improves the performance of the embedding model by augmenting data that include true answers or entities similar to them.
{"title":"Transductive Data Augmentation with Relational Path Rule Mining for Knowledge Graph Embedding","authors":"Yushi Hirose, M. Shimbo, Taro Watanabe","doi":"10.1109/ICKG52313.2021.00057","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00057","journal":"2021 IEEE International Conference on Big Knowledge (ICBK)","publicationDate":"2021-11-01","platform":"Semanticscholar","paperid":"115945083"}
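The idea of augmenting training data with relation path rules and weighting the new triples by rule confidence, as described in the abstract above, can be sketched for a single length-two rule. This is a simplified illustration, not the authors' method or UniKER: the confidence definition (share of rule predictions already present in the graph), the function names, and the toy graph are assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def apply_path_rule(kg: Set[Triple], r1: str, r2: str,
                    head_rel: str) -> Tuple[float, Set[Triple]]:
    """Evaluate the rule r1(x, y) and r2(y, z) => head_rel(x, z) on `kg`.

    Returns (confidence, new_triples): confidence is the share of predicted
    triples already in the graph; new_triples are the predictions not yet in it.
    """
    tails_by_head: Dict[str, List[str]] = {}
    for h, r, t in kg:
        if r == r2:
            tails_by_head.setdefault(h, []).append(t)
    predicted = {(x, head_rel, z)
                 for x, r, y in kg if r == r1
                 for z in tails_by_head.get(y, [])}
    if not predicted:
        return 0.0, set()
    return len(predicted & kg) / len(predicted), predicted - kg


def weighted_training_set(kg: Set[Triple],
                          rules: List[Tuple[str, str, str]]) -> Dict[Triple, float]:
    """Original triples get weight 1.0; augmented ones get their rule's confidence."""
    weights: Dict[Triple, float] = {t: 1.0 for t in kg}
    for r1, r2, head_rel in rules:
        conf, new = apply_path_rule(kg, r1, r2, head_rel)
        for t in new:
            weights[t] = max(weights.get(t, 0.0), conf)
    return weights


# Hypothetical toy knowledge graph and one hypothetical rule.
kg = {("ann", "born_in", "paris"), ("paris", "city_of", "france"),
      ("ann", "nationality", "france"),
      ("bob", "born_in", "lyon"), ("lyon", "city_of", "france")}
weights = weighted_training_set(kg, [("born_in", "city_of", "nationality")])
```

An embedding model could then multiply each triple's loss term by its weight, so that triples produced by low-confidence rules still contribute to training but with less influence than observed facts.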