
2021 IEEE International Conference on Big Knowledge (ICBK): Latest Papers

CSRDA: Cost-sensitive Regularized Dual Averaging for Handling Imbalanced and High-dimensional Streaming Data
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00031
Zhong Chen, Zhide Fang, Victor S. Sheng, Andrea Edwards, Kun Zhang
Class imbalance is one of the most challenging problems in online learning because of its impact on the predictive capability of data stream mining models. Most existing online learning approaches lack an effective mechanism for handling high-dimensional streaming data with skewed class distributions, which results in poor model interpretability and degraded online performance. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. Our proposed method substantially extends the influential regularized dual averaging (RDA) method by formulating a new convex optimization function. Specifically, two $\ell_1$-norm regularized cost-sensitive objective functions are each optimized directly. We then theoretically analyze CSRDA's regret bounds and the bounds of its primal variables. Thus, CSRDA benefits from theoretical convergence guarantees for balanced cost and sparsity when mining severely imbalanced, high-dimensional streaming data. To validate our method, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios. The experimental results demonstrate that, compared to other baseline methods, CSRDA not only improves classification performance but also captures sparse features more effectively, and hence has better interpretability.
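As a rough illustration of the kind of update this abstract describes, the sketch below combines the standard closed-form $\ell_1$-RDA step (soft-thresholding the running average of subgradients) with a class-weighted hinge loss. The hinge loss, the cost weights `cost_pos` and `cost_neg`, the parameter `gamma`, and all function names are assumptions made for illustration. The abstract does not spell out the paper's two objective functions, so this should not be read as the authors' algorithm.

```python
import math

def l1_rda_step(avg_grad, t, lam, gamma):
    """Closed-form l1-RDA update: soft-threshold the running average gradient."""
    scale = math.sqrt(t) / gamma
    weights = []
    for g in avg_grad:
        if abs(g) <= lam:
            weights.append(0.0)  # truncated to exactly zero, which yields sparsity
        else:
            weights.append(-scale * (g - lam * math.copysign(1.0, g)))
    return weights

def cost_sensitive_rda(stream, dim, lam=0.1, gamma=1.0, cost_pos=5.0, cost_neg=1.0):
    """Online learner over (x, y) pairs with y in {+1, -1}.

    cost_pos and cost_neg are hypothetical misclassification costs for the
    minority (+1) and majority (-1) class.
    """
    w = [0.0] * dim
    avg_grad = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        cost = cost_pos if y > 0 else cost_neg
        # Subgradient of cost * max(0, 1 - y * <w, x>) with respect to w.
        grad = [-cost * y * xi if margin < 1 else 0.0 for xi in x]
        # Running average of all subgradients seen so far (the "dual average").
        avg_grad = [(t - 1) / t * a + g / t for a, g in zip(avg_grad, grad)]
        w = l1_rda_step(avg_grad, t, lam, gamma)
    return w
```

Features whose average subgradient stays within `lam` of zero receive a weight of exactly zero, which is the usual source of sparsity in RDA-style methods.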
Citations: 2
YABKO-Yet Another Big Knowledge Organization
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00041
R. Lu, Chaoqun Fei, Chuanqing Wang, Yu Huang, Songmao Zhang
Knowledge graphs and their processing techniques have received widespread attention from the AI and knowledge engineering communities. However, the platforms that support knowledge graphs have received much less attention. This paper emphasizes the role of knowledge graph platforms as an independent product of knowledge engineering. Starting from an introduction to HAPE, a programmable universal big knowledge graph platform and the predecessor of YABKO, we introduce the idea and technique of a Web-based, resource-sharing public knowledge graph laboratory and its implementation, YABKO, which has a threefold target. Firstly, it is an open source platform for researchers doing experimental research on knowledge graphs, supported by YABKO's own resources. Secondly, it supports research on big knowledge engineering, in particular in the knowledge graph area. Thirdly, it supports full life cycle research on big knowledge graphs. Further, we introduce YABKOS, a constellation of YABKOs on the Web, which is a decentralized research lab for large-scale knowledge graph experiments. We also introduce Knorc, a wide-area programming language for orchestrating knowledge graph operations.
Citations: 1
Improving Gradient-based DAG Learning by Structural Asymmetry
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00022
Yujie Wang, Shuai Yang, Xianjie Guo, Kui Yu
Directed acyclic graph (DAG) learning, which aims to uncover the relationships between variables, plays a fundamental role in causal inference and other scientific settings. However, identifying a DAG from observational data has always been a challenging task. Recently, gradient-based DAG learning algorithms that convert the combinatorial optimization problem of DAG learning into a continuous optimization problem have shown promising results. These algorithms are easy to optimize and can deal with both parametric and non-parametric data, but the graphs they learn often contain many reversed edges. In this paper, we propose a framework named Residual Independence Test (RIT) to correct those reversed edges by leveraging the structural asymmetry reflected in the dependence between the regression residual and the direct cause. We conduct extensive experiments on both synthetic and benchmark datasets; the results show that the RIT framework significantly improves the performance of gradient-based DAG learning algorithms.
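The asymmetry mentioned in this abstract can be illustrated with a toy linear model with non-Gaussian noise: regressing the effect on its cause leaves a residual that is independent of the cause, whereas regressing in the reverse direction does not. The sketch below is only an illustration under that assumption. The dependence proxy (absolute correlation between squared regressor and squared residual), the function names, and the toy data are all hypothetical, and the abstract does not state which independence test RIT actually uses.

```python
import itertools

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def residuals(cause, effect):
    """Residuals of the least-squares fit effect ~ slope * cause + intercept."""
    slope = covariance(cause, effect) / covariance(cause, cause)
    intercept = mean(effect) - slope * mean(cause)
    return [e - slope * c - intercept for c, e in zip(cause, effect)]

def dependence_score(cause, effect):
    """Crude higher-order dependence proxy: |corr(cause^2, residual^2)|.

    It is close to zero when the residual is independent of the regressor.
    """
    res = residuals(cause, effect)
    c2 = [c * c for c in cause]
    r2 = [r * r for r in res]
    denom = (covariance(c2, c2) * covariance(r2, r2)) ** 0.5
    return abs(covariance(c2, r2)) / denom if denom > 0 else 0.0

def orient_edge(x, y):
    """Prefer the direction whose residual is less dependent on the regressor."""
    return "x->y" if dependence_score(x, y) <= dependence_score(y, x) else "y->x"

# Toy data: x and the noise e are exactly independent on a 3x3 grid, y = x + e.
grid = list(itertools.product([-1.0, 0.0, 1.0], repeat=2))
x = [xi for xi, _ in grid]
y = [xi + ei for xi, ei in grid]
print(orient_edge(x, y))  # prints "x->y"
```

On this grid the forward score is zero up to rounding, while the reverse score is 0.6, so an edge learned as `y->x` would be flipped.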
Citations: 0
Attribute Similarity and Relevance-Based Product Schema Matching for Targeted Catalog Enrichment
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00043
Evan Shieh, Saul Simhon, Geetha G. Aluri, Giorgos Papachristoudis, Doa Yakut, Dhanya Raghu
Many eCommerce catalogs rely on structured product data to provide a good experience for customers. For large-scale services, product information is provided by millions of different manufacturer and vendor schemas. Due to the inherent heterogeneity of this data, unifying it into a consistent catalog schema remains a challenge. Schema matching is the problem of finding such correspondences between concepts in different distributed, heterogeneous data sources. Most approaches to automated schema matching assume a small number of source schemas, attributes, and contexts (e.g., matching movie attributes from media knowledge bases). By contrast, schema matching in product catalogs encounters the problem of scaling across millions of noisy, heterogeneous schemas spanning thousands of categories and attributes. In this paper, we introduce a scalable schema matching framework that utilizes unsupervised domain-specific attribute representations and general attribute similarity metrics. Our method first identifies relevant attributes for a given product based on existing customer signals, and then prioritizes among candidate attributes to consolidate only those relevant product facts from multiple manufacturers and vendors, with little to no labeled data. We demonstrate value through experiments that enriched catalog data containing millions of attribute enumerations sourced from tens of thousands of schemas across a wide range of product categories. Experimental results show that automating schema matching on targeted product facts reduces manual annotation effort by 75% relative to competing schema matching efforts, with high accuracy, precision, and recall for important attributes that contribute to customer interest. We also demonstrate performance improvements of 8% MRR using our approach compared against two well-established approaches to unsupervised schema matching.
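For readers unfamiliar with the MRR figure quoted in this abstract, the sketch below ranks candidate catalog attributes for a vendor attribute and scores the rankings with mean reciprocal rank. Jaccard overlap of attribute values stands in for the paper's similarity metrics, which the abstract does not specify. The function names and the toy catalog are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of attribute values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_candidates(source_values, catalog):
    """Rank catalog attribute names by value overlap with a source attribute."""
    return sorted(catalog, key=lambda name: jaccard(source_values, catalog[name]),
                  reverse=True)

def mean_reciprocal_rank(rankings, truths):
    """Average of 1 / (rank of the correct attribute); 0 if it is missing."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        total += 1.0 / (ranked.index(truth) + 1) if truth in ranked else 0.0
    return total / len(truths)

# Toy catalog schema: attribute name -> observed values (illustrative only).
catalog = {
    "color": {"red", "blue", "black"},
    "material": {"cotton", "steel"},
    "size": {"s", "m", "l"},
}
rankings = [
    rank_candidates({"red", "black"}, catalog),  # vendor attribute "colour"
    rank_candidates({"steel", "m"}, catalog),    # noisy vendor attribute
]
print(mean_reciprocal_rank(rankings, ["color", "size"]))  # prints 0.75
```

The first vendor attribute is matched at rank 1 and the second at rank 2, giving (1 + 0.5) / 2 = 0.75.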
Citations: 0
Gaussian Model-Based Fully Convolutional Networks for Multivariate Time Series Classification
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00028
Changyang Tai, Ze Yang, Huicheng Zhang, Gongqing Wu, Junwei Lv, Xianyu Bao
Multivariate time series (MTS) classification has been regarded as one of the most challenging problems in data mining due to the difficulty of modeling the correlation of variables and samples. In addition, high-dimensional MTS modeling consumes a large amount of time and space. This paper proposes a novel method, Gaussian Model-based Fully Convolutional Networks (GM-FCN), to improve the performance of high-dimensional MTS classification. Each original MTS is converted into multivariate Gaussian model parameters, which serve as the input of the FCN. These parameters effectively capture the correlation between MTS variables and significantly reduce the data scale by aligning the size of an MTS to its dimension. The FCN is designed to learn deeper features of the MTS from these parameters in order to model the correlation between samples. Thus, GM-FCN can model not only the correlation between variables but also the correlation between samples. We compare GM-FCN with nine state-of-the-art MTS classification methods, 1NN-ED, 1NN-DTW-i, 1NN-DTW-D, KLD-GMC, MLP, ResNet, Encoder, MCNN, and MCDCNN, on four high-dimensional public datasets; experimental results show that the accuracy of GM-FCN is significantly superior to the others. Besides, training GM-FCN is dozens of times faster than training an FCN that uses the original equal-length MTS data as input.
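The conversion step described in this abstract can be sketched as follows, assuming the Gaussian model parameters are the sample mean vector and sample covariance matrix of each series. The stacking layout in `to_network_input` and both function names are hypothetical. The point of the sketch is only that the result depends on the number of variables d and not on the series length T.

```python
def gaussian_parameters(series):
    """Map a T x d series (list of T rows) to (mean vector, d x d sample covariance)."""
    t, d = len(series), len(series[0])
    mean = [sum(row[j] for row in series) / t for j in range(d)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in series) / (t - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def to_network_input(series):
    """Stack the mean on top of the covariance: a (d + 1) x d array for any T."""
    mean, cov = gaussian_parameters(series)
    return [mean] + cov

short = [[1, 2], [3, 4], [5, 6]]
longer = [[1, 2], [3, 4], [5, 6], [7, 9], [2, 0]]
# Series of different lengths map to inputs of the same shape.
print(len(to_network_input(short)), len(to_network_input(longer)))  # prints "3 3"
```

Because the covariance matrix encodes pairwise relationships between variables, a fixed-size input of this kind carries the between-variable correlation the abstract refers to.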
Citations: 1
An efficient framework for sentence similarity inspired by quantum computing
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00030
Yan Yu, Dong Qiu, Ruiteng Yan
Accurately extracting the semantic information and the syntactic structure of sentences is important in natural language processing. Existing methods mainly combine dependency trees with deep learning, at considerable computational cost, to capture sufficient semantic information. It is essential to obtain sufficient semantic information and syntactic structure without any prior knowledge except word2vec. This paper proposes a sentence representation model inspired by quantum entanglement, which uses the tensor product to entangle both pairs of consecutive notional words and pairs of words linked by dependencies. Inspired by quantum entanglement coefficients, we construct two different entanglement coefficients to weight the different semantic contributions of words with different relations. Finally, the proposed model is applied to SICK_train to verify its performance. The experimental results show that the proposed methods achieve strong results.
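A minimal sketch of the tensor-product idea in this abstract: word-pair vectors are formed as flattened outer products, weighted, summed into a sentence vector, and compared by cosine similarity. The weighting scheme, the choice of pairs, the cosine comparison, and all names here are assumptions for illustration. The abstract does not define the two entanglement coefficients, so the `coeff` values are placeholders.

```python
import math

def tensor_product(u, v):
    """Flattened tensor (outer) product of two word vectors."""
    return [ui * vj for ui in u for vj in v]

def sentence_vector(word_vectors, pairs):
    """Weighted sum of pairwise tensor products.

    pairs holds (i, j, coeff) triples, where coeff is a hypothetical stand-in
    for an entanglement coefficient of the word pair (i, j).
    """
    total = [0.0] * (len(word_vectors[0]) ** 2)
    for i, j, coeff in pairs:
        product = tensor_product(word_vectors[i], word_vectors[j])
        for k, value in enumerate(product):
            total[k] += coeff * value
    return total

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(tensor_product([1, 2], [3, 4]))  # prints [3, 4, 6, 8]
```

In practice the word vectors would come from word2vec and the pairs from adjacency and dependency relations, as the abstract indicates.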
Citations: 0
Multi-Round Parsing-based Multiword Rules for Scientific Knowledge Extraction
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00051
Joseph Kuebler, Lingbo Tong, Meng Jiang
Information extraction (IE) in scientific literature has facilitated many downstream knowledge-driven tasks. OpenIE, which does not require any relation schema but identifies a relational phrase to describe the relationship between a subject and an object, is a trending topic in scientific IE. The subjects, objects, and relations are often multiword expressions, which makes it challenging for methods to identify the boundaries of the expressions given very limited or even no training data. In this work, we present a set of rules for extracting structured information based on dependency parsing that can be applied to any scientific dataset without requiring expert annotation. Results on novel datasets show the effectiveness of the proposed method. We discuss negative results as well.
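To make the idea of dependency-based multiword rules concrete, the sketch below applies one simple rule to a hand-written parse: a root token with a subject and a direct object yields a triple, and each head word is expanded with its modifier children to recover the multiword expression. The rule, the label set (in the style of common English dependency parsers), the token format, and the example sentence are all assumptions. The paper's actual rules and its multi-round procedure are not described in the abstract.

```python
# Each token is (index, text, dependency label, head index); the parse is hand-written.
MODIFIER_LABELS = {"compound", "amod"}

def expand(tokens, head):
    """Grow a head word into a multiword expression using its modifier children."""
    indices = [head] + [i for i, _, dep, h in tokens
                        if h == head and dep in MODIFIER_LABELS]
    return " ".join(tokens[i][1] for i in sorted(indices))

def extract_triples(tokens):
    """Rule: a ROOT token with nsubj and dobj children yields (subject, relation, object)."""
    triples = []
    for idx, text, dep, _ in tokens:
        if dep != "ROOT":
            continue
        subjects = [i for i, _, d, h in tokens if h == idx and d == "nsubj"]
        objects = [i for i, _, d, h in tokens if h == idx and d == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((expand(tokens, s), text, expand(tokens, o)))
    return triples

parse = [
    (0, "Graph", "compound", 2),
    (1, "neural", "amod", 2),
    (2, "networks", "nsubj", 3),
    (3, "improve", "ROOT", 3),
    (4, "link", "compound", 5),
    (5, "prediction", "dobj", 3),
]
print(extract_triples(parse))
# prints [('Graph neural networks', 'improve', 'link prediction')]
```

Without the expansion step the rule would return only the head words "networks" and "prediction", which is the boundary problem the abstract highlights.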
Citations: 1
Recognizing Characters and Relationships from Videos via Spatial-Temporal and Multimodal Cues
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00032
Chenyu Cao, C. Yan, Fangtao Li, Zihe Liu, Z. Wang, Bin Wu
Video contains rich semantic knowledge of multiple modalities related to a person. Mining deep or latent semantic knowledge in video could help artificial intelligence better understand the behavior and emotions of the people in it. Research on deep and contextual semantic knowledge in video is currently scarce. Many studies on mining knowledge about characters and the visual relationships between people still focus on static images, paying little attention to temporal visual features and other important modalities. In order to better mine the semantic knowledge in video, we propose the novel Global-local VLAD (GL-VLAD) module, which uses convolutions of different scales to enlarge different receptive fields and extract the global and local information of features in the video. In addition, we propose a Multimodal Fusion Graph (MFG) to focus on the knowledge of different modalities, which can represent the general features in multimodal video scenes. We use this method to conduct a large number of experiments on social relation extraction and person recognition on the MovieGraphs and IQIYI-VID-2019 datasets. On IQIYI-VID-2019, accuracy and mAP reach 90.23% and 89.87%, respectively. On the fine-grained MovieGraphs dataset, accuracy reaches 56.13% for the relation extraction task, while person recognition reaches 89.31% accuracy and 85.24% mAP. The experimental results show that our proposed method performs better than the state-of-the-art methods.
Citations: 1
Personalized Recommendation Based On Entity Attributes and Graph Features
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00011
Yi Zhu, Bingbing Dong, Zhiqing Sha
With the rapid growth of website data, it has become increasingly difficult for users to find the information they are interested in. Personalized recommendation is an important bridge to the information that users really need on a website. Many recent studies have added attribute information about users and/or items to the rating matrix to alleviate data sparsity. To make full use of the attribute information and the scoring matrix, deep learning based recommendation methods have been proposed; the autoencoder model in particular has attracted much attention because of its strong ability to learn hidden features. However, most existing autoencoder-based models require the input layer and the output layer to have the same dimension, which may increase model complexity and cause some information loss when attribute information is used. In addition, as users' awareness of privacy protection increases, user attribute information is becoming difficult to obtain. To address these problems, we propose a hybrid personalized recommendation model (Co-Agpre for short), which uses a semi-autoencoder to jointly embed an item's score vector and internal graph features. Specifically, we regard the user-item historical interaction matrix as a bipartite graph and use the Laplacian of the user-item co-occurrence graph to obtain graph features of the items, which addresses the problem of sparse attributes. A semi-autoencoder is then introduced to learn the hidden features of the items and perform rating prediction. The proposed model can flexibly use information from different sources to reduce model complexity. Experiments on two real-world datasets demonstrate the effectiveness of the proposed Co-Agpre compared with state-of-the-art methods.
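The graph-feature step in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading in which two items are linked whenever the same user interacted with both, and it builds the unnormalized Laplacian L = D - C of that item co-occurrence graph. The function name `item_cooccurrence_laplacian` and the toy interaction matrix are hypothetical and exist only for illustration.

```python
def item_cooccurrence_laplacian(interactions):
    """Return (cooc, lap) for a binary user-item interaction matrix.

    interactions: list of rows, one per user; entry 1 means the user
    interacted with that item, 0 means no interaction.
    cooc[i][j]: number of users who interacted with both item i and item j.
    lap: unnormalized graph Laplacian D - cooc, where D is the diagonal
    matrix of row sums of cooc.
    """
    n_items = len(interactions[0])
    cooc = [[0] * n_items for _ in range(n_items)]
    for row in interactions:
        seen = [j for j, value in enumerate(row) if value]
        for i in seen:
            for j in seen:
                if i != j:  # no self-loops in the co-occurrence graph
                    cooc[i][j] += 1
    # Off-diagonal entries are -cooc[i][j]; the diagonal holds the degree.
    lap = [[-cooc[i][j] for j in range(n_items)] for i in range(n_items)]
    for i in range(n_items):
        lap[i][i] = sum(cooc[i])
    return cooc, lap


# Hypothetical toy data: 3 users x 4 items.
toy_interactions = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
cooc, lap = item_cooccurrence_laplacian(toy_interactions)
for row in lap:
    print(row)
```

In a fuller pipeline, low-order eigenvectors of such a Laplacian are a common choice of per-item graph features. Whether Co-Agpre uses this construction or a normalized variant is not stated in the abstract.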
Citations: 0
Surprisingness - A Novel Objective Interestingness Measure in Hypergraph Pattern Mining from Knowledge Graphs for Common Sense Learning
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00017
Shujing Ke, P. Spronck, B. Goertzel, Alex Van der Peet
Pattern mining usually yields a huge number of patterns, of which only a small percentage are interesting. In this paper, Surprisingness (comprising Surprisingness_I and Surprisingness_II) is proposed as a novel objective multivariate interestingness measure for automatically identifying interesting patterns among a large quantity of patterns. In contrast to existing measures, Surprisingness is applicable to unstructured or semi-structured, multi-domain or mixed-domain data. Using Surprisingness, an experiment was conducted on unsupervised learning of common sense, interesting patterns, and exceptions from a knowledge graph database built from data extracted from Wikipedia (represented as directed labeled hypergraphs).
Citations: 0