M. Lentschat, P. Buche, Juliette Dibie-Barthélemy, Mathieu Roche
{"title":"基于文本数据新表示的语义和词汇组合得分,从科学出版物中提取实验数据","authors":"M. Lentschat, P. Buche, Juliette Dibie-Barthélemy, Mathieu Roche","doi":"10.1504/ijiids.2021.10042229","DOIUrl":null,"url":null,"abstract":": This article presents an ontological and terminological resource guided process for targeted extraction of scientific experimental data. Our method relies on the scientific publication representation ( SciPuRe ) describing the extracted data through ontological, lexical and structural (using segments in the scientific documents) features. Relevance scores based on these features are computed to rank the results and filter out the numerous false positives. Linear and sequential combinations of these scores are presented and evaluated. Experiments were carried out on a corpus of 50 English language scientific papers in the food packaging field. They revealed that article segment are an effective criterion for filtering out a majority of the quantitative entity false positives using lexical scores. Moreover the best symbolic entity extraction results were obtained with a sequential combinations of semantic and lexical scores. These results enable the ranking of entities by relevance and the filtering of false positive results.","PeriodicalId":39658,"journal":{"name":"International Journal of Intelligent Information and Database Systems","volume":"6 1","pages":"78-103"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications\",\"authors\":\"M. Lentschat, P. Buche, Juliette Dibie-Barthélemy, Mathieu Roche\",\"doi\":\"10.1504/ijiids.2021.10042229\",\"DOIUrl\":null,\"url\":null,\"abstract\":\": This article presents an ontological and terminological resource guided process for targeted extraction of scientific experimental data. Our method relies on the scientific publication representation ( SciPuRe ) describing the extracted data through ontological, lexical and structural (using segments in the scientific documents) features. Relevance scores based on these features are computed to rank the results and filter out the numerous false positives. Linear and sequential combinations of these scores are presented and evaluated. Experiments were carried out on a corpus of 50 English language scientific papers in the food packaging field. They revealed that article segment are an effective criterion for filtering out a majority of the quantitative entity false positives using lexical scores. Moreover the best symbolic entity extraction results were obtained with a sequential combinations of semantic and lexical scores. These results enable the ranking of entities by relevance and the filtering of false positive results.\",\"PeriodicalId\":39658,\"journal\":{\"name\":\"International Journal of Intelligent Information and Database Systems\",\"volume\":\"6 1\",\"pages\":\"78-103\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Intelligent Information and Database Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1504/ijiids.2021.10042229\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Information and Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/ijiids.2021.10042229","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications
: This article presents an ontological and terminological resource guided process for targeted extraction of scientific experimental data. Our method relies on the scientific publication representation ( SciPuRe ) describing the extracted data through ontological, lexical and structural (using segments in the scientific documents) features. Relevance scores based on these features are computed to rank the results and filter out the numerous false positives. Linear and sequential combinations of these scores are presented and evaluated. Experiments were carried out on a corpus of 50 English language scientific papers in the food packaging field. They revealed that article segment are an effective criterion for filtering out a majority of the quantitative entity false positives using lexical scores. Moreover the best symbolic entity extraction results were obtained with a sequential combinations of semantic and lexical scores. These results enable the ranking of entities by relevance and the filtering of false positive results.
期刊介绍:
Intelligent information systems and intelligent database systems are a very dynamically developing field in computer sciences. IJIIDS provides a medium for exchanging scientific research and technological achievements accomplished by the international community. It focuses on research in applications of advanced intelligent technologies for data storing and processing in a wide-ranging context. The issues addressed by IJIIDS involve solutions of real-life problems, in which it is necessary to apply intelligent technologies for achieving effective results. The emphasis of the reported work is on new and original research and technological developments rather than reports on the application of existing technology to different sets of data.