
2021 IEEE International Conference on Big Knowledge (ICBK): Latest Papers

Unsupervised Type Constraint Inference in Bilinear Knowledge Graph Completion Models
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00012
Yuxun Lu, R. Ichise
Knowledge graph completion (KGC) models aim to provide a feasible way of manipulating facts in knowledge graphs. Most KGC models do not consider type constraints in relations because type information is scarce in training data. We propose an unsupervised method for inferring type constraints based on existing bilinear KGC models. Our method introduces two type indicators into every relation and adjusts the location of entity embeddings in feature space to match the type indicators. Our approach eliminates the external feature space for entity types and type constraints in relations and keeps a single consistent feature space; therefore, it has fewer parameters than other methods. Experiments show that our method improves the performance of the base models and outperforms other methods on datasets about general knowledge.
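The abstract does not give the scoring function, so the following is only a minimal sketch of the general idea, assuming a DistMult-style bilinear base model. The functions `type_compatibility` and `typed_score`, and the way the two type indicators are combined with the base score, are hypothetical illustrations rather than the authors' formulation.

```python
import numpy as np

def distmult_score(head, rel, tail):
    """Bilinear (DistMult-style) score: sum_i head_i * rel_i * tail_i."""
    return float(np.sum(head * rel * tail))

def type_compatibility(entity, indicator):
    """Hypothetical type check: sigmoid of the dot product between an entity
    embedding and a relation's type indicator, both in the same feature space."""
    return float(1.0 / (1.0 + np.exp(-np.dot(entity, indicator))))

def typed_score(head, rel, tail, head_type, tail_type):
    """Base score scaled by how well the head and tail match the relation's
    two type indicators (an assumed combination, not taken from the paper)."""
    return (distmult_score(head, rel, tail)
            * type_compatibility(head, head_type)
            * type_compatibility(tail, tail_type))
```

Because the indicators live in the same space as the entity embeddings, this sketch adds only two extra vectors per relation and no separate type embedding table, which is in the spirit of the parameter saving the abstract describes.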
Citations: 0
Fuzzy c-Means Clustering with Discriminative Projection
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00062
Wenjun Wu, Lingling Zhang, Yiwei Chen, Xuan Luo, Bifan Wei, Jun Liu
Clustering plays an important role in data mining and machine learning. Clustering high-dimensional data, such as texts, images, and videos, remains challenging because of the many noisy features. Widely used methods address this issue by mining an effective pattern in high-dimensional data with a dimensionality reduction technique before clustering. This strategy slightly mitigates the effects of irrelevant and redundant features, but it cannot significantly improve clustering performance because the pattern captured by dimensionality reduction is not directly related to the clustering task. In this paper, we propose a unified framework that performs discriminative dimensionality reduction and fuzzy clustering for high-dimensional data simultaneously. The proposed framework not only uses the clustering results to directly guide or supervise the discriminative dimensionality reduction, but also controls the clustering fuzziness more easily through an $F$-norm regularization term. We develop an efficient optimization algorithm to solve the objective function of our method and prove theoretically that it converges to a local optimum. We evaluate the proposed method on three large-scale fine-grained image datasets (Birds, Flowers, and Cars) for two tasks, clustering and retrieval. Experimental results on the metrics ACC, NMI, ARI, and Recall@K indicate that our method achieves performance comparable to state-of-the-art methods.
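For orientation, the classical fuzzy c-means updates can be sketched as below. This is the textbook algorithm with a fuzzifier exponent `m`, not the paper's $F$-norm regularized objective, and the optional projection matrix `P` in the usage note is a placeholder assumption standing in for a learned discriminative projection.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Classical FCM membership update.
    X: (n, d) data, centers: (c, d). Returns U of shape (n, c), rows sum to 1."""
    # Squared Euclidean distance from every point to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    # u_ij is proportional to d_ij^(-2/(m-1)), i.e. d2^(-1/(m-1)).
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_centers(X, U, m=2.0):
    """Classical FCM center update: membership-weighted mean of the points."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

To cluster in a reduced space, one would call these functions on `X @ P` instead of `X`, where `P` is a `(d, k)` projection matrix; alternating the projection update with the two steps above is the kind of joint scheme the abstract describes.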
Citations: 0
Global Semantics with Boundary Constraint Knowledge Graph for Chinese Financial Event Detection
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00045
Yin Wang, Nan Xia, Xiangfeng Luo, Jinhui Li
Chinese financial event detection is of great significance for financial risk analysis, enterprise management, and decision-making. Existing approaches to Chinese event detection mainly treat the task as character-based or word-based classification, which suffers from the ambiguity of trigger words. These approaches concentrate only on local information (e.g., characters and words) and lose sight of global information such as sentence semantics. Furthermore, in the finance field, the boundaries between different event types are fuzzy. In this paper, we propose a global semantics with boundary constraint knowledge graph (BCKG) method for Chinese financial event detection, which considers both sentence semantics and boundary knowledge. First, we construct a Chinese financial dataset (CFD) that reflects the complexity of the financial domain. Then, we obtain sentence semantic embeddings by fine-tuning a pre-trained BERT model, which addresses the ambiguity of trigger words by considering both syntactic information and contextual sentence semantics. Finally, we construct the BCKG for financial events, which adds prior knowledge to solve the fuzzy boundary problem. The proposed method achieves outstanding performance on the standard ACE 2005 Chinese dataset and on the constructed CFD. The experimental results demonstrate the effectiveness of the proposed method.
Citations: 1
Answer-Centric Local and Global Information Fusion for Conversational Question Generation
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00067
Panpan Lei, Xiao Sun
Conversational Question Generation (CQG) is a new topic in Question Generation (QG) research. Seq2Seq neural network models have recently been widely used in QG, and CQG models are also based on them. We note a problem: the input to a CQG model is not a single sentence but a long passage together with the conversation history. Because Seq2Seq models cannot process long inputs effectively, they may generate questions that are unrelated to the answer. To solve this problem, we propose an answer-centric local and global information fusion model. We extract the evidence sentence that contains the answer from the passage and encode the evidence sentence and the passage separately. On the one hand, we add answer-centered position tags to the passage to strengthen attention on information related to the answer. On the other hand, we feed the key sentence into a question type prediction model, which combines it with the answer position embedding to predict the question type; the predicted question type is then added to the key sentence to guide question generation. Finally, we use a gate mechanism to merge the key sentence information and the passage information. Experimental results show that our model achieves better results.
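The gate mechanism mentioned at the end of the abstract is not specified further. A common form of gated fusion, shown here as a sketch under that assumption, computes a sigmoid gate from the concatenated vectors and takes an element-wise convex combination. The names `gated_fusion`, `W`, and `b` are illustrative and do not come from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(sentence_vec, passage_vec, W, b):
    """Merge local (key sentence) and global (passage) representations.
    sentence_vec, passage_vec: shape (d,); W: shape (d, 2d); b: shape (d,)."""
    concat = np.concatenate([sentence_vec, passage_vec])
    gate = sigmoid(W @ concat + b)  # values in (0, 1), one per dimension
    # Each dimension takes a gate-weighted mix of the two sources.
    return gate * sentence_vec + (1.0 - gate) * passage_vec
```

With zero weights and bias the gate is 0.5 everywhere and the output is the plain average of the two vectors; training `W` and `b` lets the model decide per dimension how much to trust the key sentence versus the full passage.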
Citations: 0
A Knowledge Enhanced Chinese GaoKao Reading Comprehension Method
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00053
Xiao Zhang, Heqi Zheng, Heyan Huang, Zewen Chi, Xian-Ling Mao
Chinese GaoKao Reading Comprehension is a challenging NLP task. It requires strong logical reasoning ability to capture deep semantic relations between questions and answers. However, most traditional models cannot learn sufficient inference ability because Chinese GaoKao reading comprehension data is scarce. Intuitively, there are two ways to improve reading comprehension ability on this task: (1) increase the scale of the data, and (2) introduce additional related knowledge. In this paper, we propose a novel method based on adversarial training and knowledge distillation, which can be trained on other knowledge-rich datasets and transferred to the Chinese GaoKao reading comprehension task. Extensive experiments show that our proposed model performs better than the state-of-the-art baselines. The code and the relevant dataset will be publicly available.
Citations: 0
UFreS: A New Technique for Discovering Frequent Subgraph Patterns in Uncertain Graph Databases
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00042
Riddho Ridwanul Haque, Chowdhury Farhan Ahmed, M. Samiullah, C. Leung
Large graph data repositories are becoming increasingly common. Identifying frequently appearing subgraph patterns in such databases can reveal useful information, and such patterns have been used in a variety of applications. Imperfections and stochasticity are often unavoidable in real-world graph data, and the existence of edges in the graphs within such databases is often uncertain. Taking this uncertainty into account while mining frequent patterns poses considerable computational challenges, yet doing so is crucial for accurately mining relevant patterns. Existing frequent subgraph mining approaches that consider uncertainty rely on approximation schemes and are both inefficient and inaccurate. In this paper, we present UFreS, an exact algorithm for mining frequent subgraph patterns from uncertain graph databases. We also introduce Edge-Embedding graphs, the first data structure designed to efficiently and exactly infer the expected support of a subgraph pattern in an uncertain graph. Experimental evaluations conducted on real-world datasets show that UFreS is efficient and scalable, and that it outperforms existing approaches in terms of runtime, memory usage, and accuracy.
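To make the notion of expected support concrete, here is a brute-force sketch under two assumptions that the abstract does not state: edges exist independently with known probabilities, and the embeddings of the pattern (each given as the set of graph edges it uses) are already enumerated. It sums the probability of every possible world in which at least one embedding survives. This enumeration is exponential in the number of edges and is not the Edge-Embedding graph structure the paper introduces; all function names are illustrative.

```python
from itertools import product

def occurrence_probability(edge_prob, embeddings):
    """Probability that at least one embedding has all of its edges present.
    edge_prob: dict mapping edge -> existence probability.
    embeddings: list of frozensets of edges (one set per embedding)."""
    edges = sorted({e for emb in embeddings for e in emb})
    total = 0.0
    # Enumerate every possible world over the edges that matter.
    for world in product([False, True], repeat=len(edges)):
        present = {e for e, keep in zip(edges, world) if keep}
        if any(emb <= present for emb in embeddings):
            p = 1.0
            for e, keep in zip(edges, world):
                p *= edge_prob[e] if keep else 1.0 - edge_prob[e]
            total += p
    return total

def expected_support(database):
    """Sum of occurrence probabilities over all uncertain graphs.
    database: list of (edge_prob, embeddings) pairs, one per graph."""
    return sum(occurrence_probability(ep, embs) for ep, embs in database)
```

For a graph with two independent edges of probability 0.5 and a single-edge pattern that can map to either edge, the occurrence probability is 1 - 0.5 * 0.5 = 0.75, which is what the enumeration returns.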
Citations: 3
ToFM: Topic-specific Facet Mining by Facet Propagation within Clusters
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00060
Hongxuan Li, Bifan Wei, Jun Liu, Zhaotong Guo, Jingchao Qi, Bei Wu, Yong Liu, Yuanyuan Shi
Mining the facets of topics is an essential task for information retrieval, information extraction, and knowledge base construction. For the topics in courses, there are three challenges: different topics have different facets, the labels of facets rarely appear in the topic description text, and not all topics have enough textual information from which to mine facets. In this paper, we propose a weakly supervised algorithm for topic-specific facet mining (ToFM for short) based on our finding that similar topics in a cluster have similar facet sets. For example, the topics Binary Search Tree, Suffix Tree, and AVL Tree in the Tree cluster share facets such as example, insertion, deletion, and traversal. ToFM first splits the topics in a domain into several topic clusters based on the topic description text. Then ToFM extracts initial facet sets for all topics from the corresponding Wikipedia article pages. Finally, ToFM performs a normalized facet propagation within each topic cluster to acquire the final facet set of every topic. We evaluate ToFM on six real-world datasets, and the experimental results show that it achieves better performance than existing facet mining algorithms.
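The abstract does not define the normalized propagation step, so the sketch below is a deliberately simple, hypothetical stand-in for it: each facet's score for a topic mixes whether the topic already has the facet with the fraction of topics in the cluster that have it. The parameters `alpha` and `threshold` and the function name are assumptions made for illustration.

```python
def propagate_facets(cluster, alpha=0.5, threshold=0.3):
    """cluster: dict mapping topic name -> set of initial facet labels.
    Returns a dict mapping topic name -> set of facets after propagation."""
    topics = list(cluster)
    all_facets = set().union(*cluster.values())
    # Normalized cluster-level evidence: share of topics that have each facet.
    freq = {f: sum(f in cluster[t] for t in topics) / len(topics)
            for f in all_facets}
    result = {}
    for t in topics:
        scores = {
            f: (1 - alpha) * (1.0 if f in cluster[t] else 0.0) + alpha * freq[f]
            for f in all_facets
        }
        # Keep facets whose combined score clears the threshold.
        result[t] = {f for f, s in scores.items() if s >= threshold}
    return result
```

In this toy version a topic with no initial facets can still inherit a facet that most of its cluster neighbors share, which mirrors the stated finding that similar topics in a cluster have similar facet sets.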
Citations: 0
Question-formed Query Suggestion
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00071
Y. He, Xian-Ling Mao, Wei Wei, Heyan Huang
Traditional Query Suggestion (TQS) aims to retrieve or generate completed queries given input keywords and query logs, and it plays a vital role in information retrieval. Nearly all existing TQS methods produce suggested queries in the form of keywords or phrases. However, such queries suffer from incomplete or ambiguous semantics. Ideally, question-formed queries are more intuitive and closer to the information needs of users, which can improve their satisfaction during a search. Motivated by this idea, this paper defines a novel question-formed query suggestion task that generates question-formed queries given input keywords and web page texts. We also propose a novel pipeline method for this task. Specifically, a query generation module first generates related question-formed queries given keywords and web page texts. Then, a selection module selects the most representative top queries among all generated queries as the final suggestions. Extensive experiments demonstrate that our method outperforms the state-of-the-art baselines in human evaluation.
Citations: 1
HSNP-Miner: High Utility Self-Adaptive Nonoverlapping Pattern Mining
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00019
Motaher Hossain, Youxi Wu, Philippe Fournier-Viger, Zhao Li, Lei Guo, Yan Li
Sequential pattern mining (SPM) under the nonoverlapping condition (or nonoverlapping SPM) is a type of data mining used to extract frequent gapped subsequences (known as patterns) from sequences, and it is more valuable and versatile than other related methods. In nonoverlapping SPM, two occurrences cannot reuse the same sequence character at the same position of the pattern. This method evaluates the frequency of the patterns in the sequence and ignores the impact of external utility (item price or profit). Therefore, some low-frequency but essential patterns are overlooked. To address this issue, this paper introduces High Utility Self-adaptive Nonoverlapping Pattern (HSNP) mining and proposes HSNP-Miner, which includes two steps: support calculation and candidate pattern generation. To calculate the support, we propose the NoSup algorithm, which calculates support effectively while avoiding the creation of redundant nodes. An advanced upper bound method is employed to generate the candidate patterns more efficiently. Compared with other competitive methods, the experimental results demonstrate the efficiency of the proposed algorithm and the uniqueness of nonoverlapping sequence patterns.
Citations: 1
Consistency-aware Multi-modal Network for Hierarchical Multi-label Classification in Online Education System
Pub Date : 2021-12-01 DOI: 10.1109/ICKG52313.2021.00063
Siqi Lei, Wei Huang, Shiwei Tong, Qi Liu, Zhenya Huang, Enhong Chen, Yu Su
In the online education system, predicting the knowledge of exercises is a fundamental task of many applications, such as cognitive diagnosis. Usually, experts treat this problem as Hierarchical Multi-label Classification (HMC), since the knowledge concepts exhibit a multi-level structure. However, existing methods sacrifice either knowledge consistency for classification accuracy or classification accuracy for knowledge consistency, and maintaining the balance is difficult. To resolve this dilemma, in this paper, we develop a novel framework called Consistency-Aware Multi-modal Network (CamNet). Specifically, we develop a multi-modal embedding module to learn the representation of the multi-modal exercise. Then, we adopt a hybrid prediction method consisting of the flat prediction module and the local prediction module. The local prediction module deals with the relation between the knowledge hierarchy and the input exercise. The flat prediction module focuses on maintaining knowledge consistency. Finally, to balance classification accuracy and knowledge consistency, we combine the outputs of the two modules to make a final prediction. Extensive experimental results on two real-world datasets demonstrate CamNet's high performance and its ability to reduce knowledge inconsistency.
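The idea of combining two prediction modules and checking hierarchy consistency can be sketched as follows. This is not CamNet's actual combination rule, which the abstract does not specify. The weighted average, the `weight` parameter, the 0.5 threshold, and the function names `combine_scores` and `inconsistent_labels` are all assumptions made for illustration. Consistency is read here in one common way: a predicted knowledge concept should have its parent concept predicted as well.

```python
from typing import Dict, List, Optional, Set


def combine_scores(
    flat: Dict[str, float], local: Dict[str, float], weight: float = 0.5
) -> Dict[str, float]:
    """Blend per-label scores from two modules (assumed: a weighted average)."""
    return {
        label: weight * flat[label] + (1.0 - weight) * local[label]
        for label in flat
    }


def inconsistent_labels(
    predicted: Set[str], parent: Dict[str, Optional[str]]
) -> List[str]:
    """List predicted labels whose parent label was not predicted."""
    return sorted(
        label
        for label in predicted
        if parent.get(label) is not None and parent[label] not in predicted
    )


if __name__ == "__main__":
    # Hypothetical two-level knowledge hierarchy and module scores.
    parent = {"Math": None, "Algebra": "Math", "Geometry": "Math"}
    flat = {"Math": 0.4, "Algebra": 0.8, "Geometry": 0.1}
    local = {"Math": 0.3, "Algebra": 0.9, "Geometry": 0.2}

    combined = combine_scores(flat, local)
    predicted = {label for label, score in combined.items() if score >= 0.5}
    print(predicted)
    # "Algebra" is predicted without its parent "Math", so it is flagged.
    print(inconsistent_labels(predicted, parent))
```

A count of such flagged labels gives a simple way to measure how often a classifier's output violates the knowledge hierarchy.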
Citations: 0