Title: CSRDA: Cost-sensitive Regularized Dual Averaging for Handling Imbalanced and High-dimensional Streaming Data
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00031
Zhong Chen, Zhide Fang, Victor S. Sheng, Andrea Edwards, Kun Zhang
Class imbalance is one of the most challenging problems in online learning due to its impact on the prediction capability of data stream mining models. Most existing approaches for online learning lack an effective mechanism for handling high-dimensional streaming data with skewed class distributions, resulting in poor model interpretability and deteriorated online performance. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. Our method substantially extends the influential regularized dual averaging (RDA) method by formulating a new convex optimization objective. Specifically, two $\ell_1$-norm regularized cost-sensitive objective functions are optimized directly. We then theoretically analyze CSRDA's regret bounds and the bounds on its primal variables. CSRDA thus achieves a theoretically guaranteed convergence that balances cost and sparsity for severely imbalanced, high-dimensional streaming data mining. To validate our method, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios. The experimental results demonstrate that, compared to other baseline methods, CSRDA not only improves classification performance but also captures sparse features more effectively, and hence has better interpretability.
{"title":"CSRDA: Cost-sensitive Regularized Dual Averaging for Handling Imbalanced and High-dimensional Streaming Data","authors":"Zhong Chen, Zhide Fang, Victor S. Sheng, Andrea Edwards, Kun Zhang","doi":"10.1109/ICKG52313.2021.00031","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00031","url":null,"abstract":"Class-imbalance is one of the most challenging problems in online learning due to its impact on the prediction capability of data stream mining models. Most existing approaches for online learning lack an effective mechanism to handle high-dimensional streaming data with skewed class distributions, resulting in insufficient model interpretation and deterioration of online performance. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. Our proposed method substantially extends the influential regularized dual averaging (RDA) method by formulating a new convex optimization function. Specifically, two $R$ 1 -norm regularized cost-sensitive objective functions are directly optimized, respectively. We then theoretically analyze CSRDA's regret bounds and the bounds of primal variables. Thus, CSRDA benefits from achieving a theoretical convergence of balanced cost and sparsity for severe imbalanced and high-dimensional streaming data mining. To validate our method, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios. The experimental results demonstrate that, compared to other baseline methods, CSRDA not only improves classification performance, but also successfully captures sparse features more effectively, hence has better interpretability.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130579015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: YABKO-Yet Another Big Knowledge Organization
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00041
R. Lu, Chaoqun Fei, Chuanqing Wang, Yu Huang, Songmao Zhang
Knowledge graphs and their processing techniques have received widespread attention from the AI and knowledge engineering community. However, the platforms supporting knowledge graphs have received much less attention. This paper emphasizes the role of knowledge graph platforms as an independent product of knowledge engineering. Starting from an introduction to HAPE, a programmable universal big knowledge graph platform and a predecessor of YABKO, we introduce the idea and technique of a Web-based, resource-sharing public knowledge graph laboratory and its implementation, YABKO, which has a threefold goal. Firstly, it is an open-source platform for researchers performing experimental research on knowledge graphs supported by YABKO's own resources. Secondly, it supports research on big knowledge engineering, particularly in the knowledge graph area. Thirdly, it supports full life-cycle research on big knowledge graphs. We further introduce YABKOS, a constellation of YABKOs on the Web, which serves as a decentralized research lab for large-scale knowledge graph experiments. We also introduce Knorc, a wide-area programming language for orchestrating knowledge graph operations.
{"title":"YABKO-Yet Another Big Knowledge Organization","authors":"R. Lu, Chaoqun Fei, Chuanqing Wang, Yu Huang, Songmao Zhang","doi":"10.1109/ICKG52313.2021.00041","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00041","url":null,"abstract":"Knowledge graph and its processing techniques have got wide spread attention from the AI and knowledge engineering society. However, the knowledge graph supporting platforms have gained much less concern. This paper emphasizes the role of knowledge graph platforms as an independent product of knowledge engineering. Starting from the introduction of HAPE - a programmable universal big knowledge graph platform, which is a predecessor of YABKO, we introduce the idea and technique of Web-based resource sharing public knowledge graph laboratory and its implementation YABKO, which has a threefold target. Firstly, it is an open source platform for researchers doing experimental research on knowledge graphs supported by YABKO's own resources. Secondly, it supports research on big knowledge engineering, in particular in the knowledge graph area. Thirdly, it supports a full life cycle research on big knowledge graphs. Further we introduce YABKOS, a constellation of YABKOs on the Web, which is a decentralized research lab for large scale knowledge graph experiments. Also the wide area programming language Knorc for knowledge graphs' operation orchestration is introduced.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123208903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Improving Gradient-based DAG Learning by Structural Asymmetry
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00022
Yujie Wang, Shuai Yang, Xianjie Guo, Kui Yu
Directed acyclic graph (DAG) learning plays a fundamental role in causal inference and other scientific settings, aiming to uncover the relationships between variables. However, identifying a DAG from observational data has always been a challenging task. Recently, gradient-based DAG learning algorithms, which convert the combinatorial optimization problem of DAG learning into a continuous optimization problem, have achieved notable success. These algorithms are easy to optimize and can handle both parametric and non-parametric data, but they often learn many reversed edges. In this paper, we propose a framework named Residual Independence Test (RIT) to correct these reversed edges by leveraging the structural asymmetry reflected in the dependence between the regression residual and the direct cause. We conduct extensive experiments on both synthetic and benchmark datasets; the results show that the RIT framework significantly improves the performance of gradient-based DAG learning algorithms.
{"title":"Improving Gradient-based DAG Learning by Structural Asymmetry","authors":"Yujie Wang, Shuai Yang, Xianjie Guo, Kui Yu","doi":"10.1109/ICKG52313.2021.00022","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00022","url":null,"abstract":"Directed acyclic graph (DAG) learning plays a fun-damental role in causal inference and other scientific scenes, which aims to uncover the relationships between variables. However, identifying a DAG from observational data has al-ways been a challenging task. Recently, gradient-based DAG learning algorithms that convert a combination-optimization DAG learning problem into a continuous-optimization problem have achieved emerging successes. These algorithms are easy to optimize and able to deal with both parametric and non-parametric data but suffer from many reversed edges learnt by these algorithms. In this paper, we propose a framework named Residual Independence Test (RIT) to correct those reversed edges by leveraging the structural asymmetry reflected in the depen-dence between regression residual and direct cause. We conduct extensive experiments on both synthetic and benchmark datasets, the results show that the RIT framework significantly improve the performance of gradient-based DAG learning algorithms.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123238267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Attribute Similarity and Relevance-Based Product Schema Matching for Targeted Catalog Enrichment
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00043
Evan Shieh, Saul Simhon, Geetha G. Aluri, Giorgos Papachristoudis, Doa Yakut, Dhanya Raghu
Many eCommerce catalogs rely on structured product data to provide a good experience for customers. For large-scale services, product information is provided by millions of different manufacturer and vendor schemas. Due to the inherent heterogeneity of this data, unifying it into a consistent catalog schema remains a challenge. Schema matching is the problem of finding such correspondences between concepts in different distributed, heterogeneous data sources. Most approaches to automated schema matching assume a small number of source schemas, attributes, and contexts (e.g., matching movie attributes from media knowledge bases). By contrast, schema matching in product catalogs must scale across millions of noisy, heterogeneous schemas spanning thousands of categories and attributes. In this paper, we introduce a scalable schema matching framework that utilizes unsupervised domain-specific attribute representations and general attribute similarity metrics. Our method first identifies relevant attributes for a given product based on existing customer signals, and then prioritizes among candidate attributes to consolidate only the relevant product facts from multiple manufacturers and vendors, with little to no labeled data. We demonstrate value through experiments that enriched catalog data containing millions of attribute enumerations sourced from tens of thousands of schemas across a wide range of product categories. Experimental results show a 75% reduction in manual annotation effort compared to competing schema matching efforts, achieved by automating schema matching on targeted product facts, with high accuracy, precision, and recall for important attributes that contribute to customer interest. We also demonstrate a performance improvement of 8% MRR over two well-established approaches to unsupervised schema matching.
{"title":"Attribute Similarity and Relevance-Based Product Schema Matching for Targeted Catalog Enrichment","authors":"Evan Shieh, Saul Simhon, Geetha G. Aluri, Giorgos Papachristoudis, Doa Yakut, Dhanya Raghu","doi":"10.1109/ICKG52313.2021.00043","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00043","url":null,"abstract":"Many eCommerce catalogs rely on structured prod-uct data to provide a good experience for customers. For large scale services, product information is provided by millions of different manufacturer and vendor schemas. Due to inherent heterogeneity of this data, unifying it to a consistent catalog schema remains a challenge. Schema matching is the problem of finding such correspondences between concepts in different distributed, heterogeneous data sources. Most approaches in automated schema matching assume either a small number of source schemas, attributes, and contexts (i.e., matching movie attributes from media knowledge bases). By contrast, schema matching in product catalogs encounter the problem of scaling across millions of noisy, heterogenous schemas spanning thou-sands of categories and attributes. In this paper, we introduce a scalable schema matching framework that utilizes unsupervised domain-specific attribute representations and general attribute similarity metrics. Our method first identifies relevant attributes for a given product based on existing customer signals, and then prioritizes among candidate attributes to consolidate only those relevant product facts from multiple manufacturers and vendors with little to no labeled data. We demonstrate value by experiments that enriched catalog data containing millions of attribute enumer-ations sourced from tens of thousands of schemas across a wide range of product categories. Experimental results show reduced manual annotation efforts by 75% from competing schema matching efforts by automating schema matching on targeted product facts, resulting in high accuracy, precision, and recall for important attributes that contribute to customer interest. We also demonstrate performance improvements of 8% MRR using our approach compared against two well-established approaches to unsupervised schema matching.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116356656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Gaussian Model-Based Fully Convolutional Networks for Multivariate Time Series Classification
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00028
Changyang Tai, Ze Yang, Huicheng Zhang, Gongqing Wu, Junwei Lv, Xianyu Bao
Multivariate time series (MTS) classification is regarded as one of the most challenging problems in data mining due to the difficulty of modeling the correlations among variables and among samples. In addition, high-dimensional MTS modeling incurs large time and space costs. This paper proposes a novel method, Gaussian Model-based Fully Convolutional Networks (GM-FCN), to improve the performance of high-dimensional MTS classification. Each original MTS is converted into multivariate Gaussian model parameters, which serve as the input to an FCN. These parameters effectively capture the correlation between MTS variables and significantly reduce the data scale by aligning the size of an MTS representation with its dimensionality. The FCN is designed to learn deeper MTS features from these parameters, modeling the correlation between samples. Thus, GM-FCN can model not only the correlation between variables but also the correlation between samples. We compare GM-FCN with nine state-of-the-art MTS classification methods (1NN-ED, 1NN-DTW-i, 1NN-DTW-D, KLD-GMC, MLP, ResNet, Encoder, MCNN, and MCDCNN) on four high-dimensional public datasets. Experimental results show that the accuracy of GM-FCN is significantly superior to the others. Moreover, training GM-FCN is dozens of times faster than training an FCN on the original equal-length MTS data.
{"title":"Gaussian Model-Based Fully Convolutional Networks for Multivariate Time Series Classification","authors":"Changyang Tai, Ze Yang, Huicheng Zhang, Gongqing Wu, Junwei Lv, Xianyu Bao","doi":"10.1109/ICKG52313.2021.00028","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00028","url":null,"abstract":"Multivariate time series (MTS) classification has been regarded as one of the most challenging problems in data mining due to the difficulty in modeling the correlation of variables and samples. In addition, high-dimensional MTS modeling has a large time and space consumption. This paper proposes a novel method, Gaussian Model-based Fully Convolutional Networks (GM-FCN), to improve the performance of high-dimensional MTS classification. Each original MTS is converted into multivariate Gaussian model parameters as the input of FCN. These parameters effectively capture the correlation be-tween MTS variables and significantly reduce the data scale by aligning an MTS size to its dimension. FCN is designed to learn more in-depth features of MTS based on these parameters for modeling the correlation between samples. Thus, GM-FCN can not only model the correlation between variables, but also the correlation between samples. We compare GM-FCN with nine state-of-the-art MTS classification methods, INN-ED, INN-DTW-i, INN-DTW-D, KLD-GMC, MLP, ResNet, Encoder, MCNN, and MCDCNN, on four high-dimensional public datasets, experimen-tal results show that the accuracy of G M - FCN is significantly superior to the others. Besides, the training time of GM-FCN is dozens of times faster than FCN using the original equal-length MTS data as input.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121635404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: An efficient framework for sentence similarity inspired by quantum computing
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00030
Yan Yu, Dong Qiu, Ruiteng Yan
Accurately extracting the semantic information and the syntactic structure of sentences is important in natural language processing. Existing methods mainly combine dependency trees with deep learning, at considerable computational cost, to capture sufficient semantic information. It is essential to obtain sufficient semantic information and syntactic structure without any prior knowledge except word2vec. This paper proposes a sentence representation model inspired by quantum entanglement, which uses the tensor product to entangle both consecutive notional words and words linked by dependencies. Inspired by quantum entanglement coefficients, we construct two different entanglement coefficients to weight the differing semantic contributions of words with different relations. Finally, the proposed model is evaluated on SICK_train to verify its performance. The experimental results show that the proposed methods achieve excellent results.
{"title":"An efficient framework for sentence similarity inspired by quantum computing","authors":"Yan Yu, Dong Qiu, Ruiteng Yan","doi":"10.1109/ICKG52313.2021.00030","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00030","url":null,"abstract":"Accurately extracting the semantic information and the syntactic structure of sentences is important in natural language processing. Existing methods mainly combine the dependency tree to deep learning with complex computation time to achieve enough semantic information. It is essential to obtain sufficient semantic information and syntactic structures without any prior knowledge excepting word2vec. This paper proposes a model on sentence representation inspired by quantum entanglement using the tensor product to entangle both two consecutive notional words and words with depen-dencies. Inspired by quantum entanglement coefficients, we construct two different entanglement coefficients to weight the different semantic contributions of words with different relations. Finally, the proposed model is applied to SICK_train to verify their performances. The experimental results show that the provided methods achieve perfect results.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114748706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Multi-Round Parsing-based Multiword Rules for Scientific Knowledge Extraction
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00051
Joseph Kuebler, Lingbo Tong, Meng Jiang
Information extraction (IE) in scientific literature has facilitated many downstream knowledge-driven tasks. OpenIE, which does not require any relation schema but instead identifies a relational phrase describing the relationship between a subject and an object, has become a trending topic of IE in the sciences. The subjects, objects, and relations are often multiword expressions, which makes it challenging for methods to identify the boundaries of these expressions given very limited or even no training data. In this work, we present a set of rules for extracting structured information based on dependency parsing that can be applied to any scientific dataset and requires no expert annotation. Results on novel datasets show the effectiveness of the proposed method. We also discuss negative results.
{"title":"Multi-Round Parsing-based Multiword Rules for Scientific Knowledge Extraction","authors":"Joseph Kuebler, Lingbo Tong, Meng Jiang","doi":"10.1109/ICKG52313.2021.00051","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00051","url":null,"abstract":"Information extraction (IE) in scientific literature has facilitated many down-stream knowledge-driven tasks. Ope-nIE, which does not require any relation schema but identifies a relational phrase to describe the relationship between a subject and an object, is being a trending topic of IE in sciences. The subjects, objects, and relations are often multiword expressions, which brings challenges for methods to identify the boundaries of the expressions given very limited or even no training data. In this work, we present a set of rules for extracting structured information based on dependency parsing that can be applied to any scientific dataset requiring no expert's annotation. Results on novel datasets show the effectiveness of the proposed method. We discuss negative results as well.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134445594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Recognizing Characters and Relationships from Videos via Spatial-Temporal and Multimodal Cues
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00032
Chenyu Cao, C. Yan, Fangtao Li, Zihe Liu, Z. Wang, Bin Wu
Video contains rich semantic knowledge across multiple modalities related to a person. Mining deep or latent semantic knowledge in video can help artificial intelligence better understand the behavior and emotions of the humans in it. Research on deep and contextual semantic knowledge in video is currently scarce. Much of the work on mining knowledge about characters and the visual relationships between people still focuses on static images, paying little attention to temporal visual features and other important modalities. To better mine the semantic knowledge in video, we propose a novel Global-local VLAD (GL-VLAD) module, which uses convolutions at different scales to enlarge receptive fields and extract both global and local feature information from the video. In addition, we propose a Multimodal Fusion Graph (MFG) to attend to knowledge from different modalities, which can represent the general features of multi-modal video scenes. We conduct extensive experiments on social relation extraction and person recognition on the MovieGraphs and IQIYI-VID-2019 datasets. On IQIYI-VID-2019, the accuracy and mAP reach 90.23% and 89.87%, respectively. On the fine-grained MovieGraphs dataset, accuracy reaches 56.13% for the relation extraction task, while person recognition achieves 89.31% accuracy and 85.24% mAP. The experimental results show that our proposed method outperforms state-of-the-art methods.
Title: Personalized Recommendation Based On Entity Attributes and Graph Features
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00011
Yi Zhu, Bingbing Dong, Zhiqing Sha
With the rapid increase in the amount of website data, it has become increasingly difficult for users to find the information they are interested in. Personalized recommendation is an important bridge for finding the information users really need on a website. Many recent studies have introduced additional attribute information about users and/or items into the rating matrix to alleviate the problem of data sparsity. To make full use of the attribute information and the rating matrix, deep learning-based recommendation methods have been proposed; in particular, the autoencoder model has attracted much attention because of its strong ability to learn hidden features. However, most existing autoencoder-based models require the dimension of the input layer to equal the dimension of the output layer, which can increase model complexity and cause some information loss when attribute information is used. In addition, as users' awareness of privacy protection increases, user attribute information becomes difficult to obtain. To address these problems, we propose a hybrid personalized recommendation model, Co-Agpre, which uses a semi-autoencoder to jointly embed an item's rating vector and internal graph features. Specifically, we regard the user-item historical interaction matrix as a bipartite graph, and the Laplacian of the user-item co-occurrence graph is utilized to obtain graph features of the item, addressing the problem of sparse attributes. A semi-autoencoder is then introduced to learn the hidden features of the item and perform rating prediction. The proposed model can flexibly use information from different sources while reducing model complexity. Experiments on two real-world datasets demonstrate the effectiveness of the proposed Co-Agpre compared with state-of-the-art methods.
{"title":"Personalized Recommendation Based On Entity Attributes and Graph Features","authors":"Yi Zhu, Bingbing Dong, Zhiqing Sha","doi":"10.1109/ICKG52313.2021.00011","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00011","url":null,"abstract":"With the rapid increase in the amount of website data, it has been a more difficult task for users to get the infor-mation they are interested in. Personalized recommendation is an important bridge to find the information which users really need on the website. Many recent studies have introduced additional attribute information about users and/or items to the rating matrix for alleviating the problem of data sparsity. In order to make full use of the attribute information and scoring matrix, deep learning based recommendation methods are proposed, especially the autoencoder model has attracted much attention because of its strong ability to learn hidden features. However, most of the existing autoencoder- based models require that the dimension of the input layer is equal to the dimension of the output layer, which may increase model complexity and certain information loss when using attribute information. In addition, as users' awareness of privacy protection increases, user attribute information is difficult to obtain. To address the above problems, in this paper, we propose a hybrid personalized recommendation model, which uses a semi-autoencoder to jointly embed the item's score vector and internal graph features (short for Co-Agpre). Specifically, we regard the user-item historical interaction matrix as a bipartite graph, and the Laplacian of the user-item co-occurrence graph is utilized to obtain the graph features of the item for solving the problem of sparse attributes. Then a semi-autoencoder is introduced to learn the hidden features of the item and perform rating prediction. The proposed model can flexibly use information from different sources to reduce the complexity of the model. Experiments on two real-world datasets demonstrate the effectiveness of the proposed Co-Agpre compared with state-of-the-art methods.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130725767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Surprisingness - A Novel Objective Interestingness Measure in Hypergraph Pattern Mining from Knowledge Graphs for Common Sense Learning
Pub Date: 2021-12-01 | DOI: 10.1109/ICKG52313.2021.00017
Shujing Ke, P. Spronck, B. Goertzel, Alex Van der Peet
Pattern mining usually yields huge numbers of patterns, among which only a small percentage are interesting. In this paper, Surprisingness (comprising Surprisingness_I and Surprisingness_II) is proposed as a novel objective multivariate interestingness measure for automatically identifying interesting patterns among a large quantity of patterns. Compared to existing measures, Surprisingness is applicable to unstructured or semi-structured, multi-domain or mixed-domain data. Using Surprisingness, an experiment was conducted that enables unsupervised learning of common sense, interesting patterns, and exceptions from a knowledge graph database built from Wikipedia-extracted data (represented as directed labeled hypergraphs).
{"title":"Surprisingness - A Novel Objective Interestingness Measure in Hypergraph Pattern Mining from Knowledge Graphs for Common Sense Learning","authors":"Shujing Ke, P. Spronck, B. Goertzel, Alex Van der Peet","doi":"10.1109/ICKG52313.2021.00017","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00017","url":null,"abstract":"Pattern mining usually results in huge amounts of patterns, among which only small percentages are interesting. In this paper, Surprisingness (including Surpringness_I and Surpringness_II) is proposed as an innovative objective multivariate interestingness measure for automatically identifying interesting patterns from a large quantity of patterns. Surprisingness is applicable in unstructured or semi-structured, multi-domain or mixed-domain data compared to existing measures. An experiment has been conducted enabling unsupervised learning of common sense, interesting patterns and exceptions from a knowledge graph database built from Wikipedia 1 extracted data (represented as directed labeled hypergraphs), using Surpringness.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126543680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}