Combining offline and on-the-fly disambiguation to perform semantic-aware XML querying

IF 1.8 | Zone 4, Computer Science | Q4, COMPUTER SCIENCE, INFORMATION SYSTEMS | Computer Science and Information Systems | Pub Date: 2023-01-01 | DOI: 10.2298/csis220228063t

Joe Tekli, Gilbert Tekli, R. Chbeir

Computer Science and Information Systems, 2023, pp. 423-457. https://doi.org/10.2298/csis220228063t

Abstract

The IR community has devoted considerable effort to extending free-text query processing toward semi-structured XML search. Most methods rely on the concept of the Lowest Common Ancestor (LCA) of two or more structural nodes to identify the most specific XML elements containing the keywords of the user's query. Yet few existing approaches consider XML semantics, and those that do generally rely on computationally expensive word sense disambiguation (WSD) techniques, or apply semantic analysis in one stage only (performing query relaxation/refinement over the bag-of-words retrieval model) to reduce processing time. In this paper, we describe a new approach to XML keyword search that aims to address these limitations. Our solution first transforms the XML document collection (offline) and the keyword query (on-the-fly) into meaningful semantic representations using context-based and global disambiguation methods, specially designed to run in almost linear time. We use a semantic-aware inverted index to support semantic-aware search, result selection, and result ranking. The semantically augmented XML data tree is then processed for structural node clustering, based on semantic query concepts (i.e., key-concepts), in order to identify and rank candidate answer sub-trees containing related occurrences of query key-concepts. Dedicated weighting functions and search algorithms have been developed for that purpose and are presented here. Experimental results highlight the quality and potential of our approach.
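
To make the LCA idea mentioned in the abstract concrete, here is a minimal sketch of purely syntactic LCA-based keyword search over a small XML tree. It is not the authors' method: it performs plain token matching with no disambiguation, weighting, or clustering, and it keeps only the deepest LCA candidates as a simplification. The sample document and the names `build_index`, `lca_path`, `node_at`, and `lca_answers` are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample document, for illustration only.
XML = """
<library>
  <book>
    <title>Database Systems</title>
    <author>Smith</author>
  </book>
  <book>
    <title>XML Retrieval</title>
    <author>Jones</author>
  </book>
</library>
"""

def build_index(root):
    """Inverted index: lowercase token -> paths of the elements whose text holds it.

    A path is a tuple of child positions leading from the root to the element.
    """
    index = {}

    def visit(node, path):
        for token in (node.text or "").lower().split():
            index.setdefault(token, []).append(path)
        for i, child in enumerate(node):
            visit(child, path + (i,))

    visit(root, ())
    return index

def lca_path(paths):
    """The LCA of several nodes is the longest common prefix of their paths."""
    prefix = paths[0]
    for p in paths[1:]:
        n = 0
        while n < min(len(prefix), len(p)) and prefix[n] == p[n]:
            n += 1
        prefix = prefix[:n]
    return prefix

def node_at(root, path):
    """Follow a path of child positions down from the root."""
    node = root
    for i in path:
        node = node[i]
    return node

def lca_answers(root, keywords):
    """Return the deepest elements that contain every keyword somewhere below them."""
    index = build_index(root)
    postings = [index.get(k.lower(), []) for k in keywords]
    if not all(postings):
        return []  # at least one keyword does not occur in the document
    # One LCA per combination of keyword occurrences; keep only the deepest ones.
    candidates = {lca_path(list(combo)) for combo in product(*postings)}
    deepest = max(len(p) for p in candidates)
    return [node_at(root, p) for p in sorted(candidates) if len(p) == deepest]
```

Calling `lca_answers(ET.fromstring(XML), ["xml", "jones"])` returns the second `book` element, because both keywords occur under it, whereas `["database", "jones"]` only meet at the `library` root. A matcher of this kind cannot tell which sense of an ambiguous keyword or tag name is intended, which is the limitation the semantic-aware approach described above targets.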

Source journal

Computer Science and Information Systems (COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore: 2.30

Self-citation rate: 21.40%

Review time: 7.5 months

About the journal: Computer Science and Information Systems (ComSIS) is an international refereed journal, published in Serbia. The objective of ComSIS is to communicate important research and development results in the areas of computer science, software engineering, and information systems.