Authors: Sourish Dhar, S. Roy, Arnab Paul
Journal: Information Services and Use, 23(1), pp. 241-259
Published: 2022-03-29
DOI: https://doi.org/10.3233/isu-220155
Scientific document retrieval using structure encoded string with trie indexing
Retrieving mathematical expressions from scientific documents is challenging because mathematical expressions, or formulae, differ substantially from ordinary text. They are highly symbolic and complex, and the structure of a formula carries semantic meaning that cannot be overlooked. This paper proposes a scientific document retrieval system that takes a mathematical formula as the query. It explores the concept of the Structure Encoded String (SES), which is applied to mathematical expressions to capture the relations among formula structures. A pattern-based trie indexing scheme is proposed for faster retrieval, and Jaro-Winkler similarity is adopted for matching and ranking. Experiments are conducted, and the results are reported using standard evaluation measures and compared with similar existing systems.
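The abstract names a trie index and Jaro-Winkler similarity but gives no implementation detail, so the following is only a minimal sketch of how the two pieces could fit together. It is not the authors' method. The SES encoding itself is not described here, so the strings below are toy placeholders, and the names FormulaTrie, search, and prefix_len are hypothetical. The prefix-based candidate lookup is an assumption standing in for the paper's pattern-based scheme.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


class TrieNode:
    def __init__(self):
        self.children = {}
        self.doc_ids = set()  # documents whose encoded formula ends at this node


class FormulaTrie:
    """Hypothetical character trie over encoded formula strings."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, encoded: str, doc_id: str) -> None:
        node = self.root
        for ch in encoded:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def candidates(self, prefix: str):
        """Yield (encoded, doc_id) for every stored string starting with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return
        stack = [(node, prefix)]
        while stack:
            node, text = stack.pop()
            for doc_id in node.doc_ids:
                yield text, doc_id
            for ch, child in node.children.items():
                stack.append((child, text + ch))


def search(trie: FormulaTrie, query: str, prefix_len: int = 2, top_k: int = 5):
    """Narrow candidates by a shared prefix (assumption), then rank by Jaro-Winkler."""
    scored = [(jaro_winkler(query, enc), enc, doc)
              for enc, doc in trie.candidates(query[:prefix_len])]
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:top_k]


# Toy placeholder strings standing in for SES encodings (format assumed).
trie = FormulaTrie()
for encoded, doc_id in [("abc+d", "doc1"), ("abc+e", "doc2"),
                        ("abd+c", "doc3"), ("xyz", "doc4")]:
    trie.insert(encoded, doc_id)

results = search(trie, "abc+d")
for score, encoded, doc_id in results:
    print(f"{score:.3f}  {encoded}  {doc_id}")
```

In this sketch the trie only narrows the candidate set, and all ranking is done by the similarity score. A real system would need a candidate-selection rule that tolerates differences near the start of the encoded string, which a plain prefix lookup does not.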
About the journal:
Information Services & Use is an information and information technology oriented publication with a wide scope of subject matters. International in terms of both audience and authorship, the journal aims at leaders in information management and applications in an attempt to keep them fully informed of fast-moving developments in fields such as: online systems, offline systems, electronic publishing, library automation, education and training, word processing and telecommunications. These areas are treated not only in general, but also in specific contexts; applications to business and scientific fields are sought so that a balanced view is offered to the reader.