Martin Lentschat, Patrice Buche, Juliette Dibie-Barthélemy, Mathieu Roche
SciPuRe
Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics, published 2020-06-30
DOI: 10.1145/3405962.3405978
Citations: 5
Abstract
Retrieving entities associated with experimental data in the textual content of scientific documents faces a number of challenges. One of them is assessing the extracted entities for further processing, especially identifying false positives. In this paper, we present SciPuRe (Scientific Publication Representation), a new representation of entities. The extraction process presented in this paper is driven by an Ontological and Terminological Resource (OTR). It is applied to the extraction of entities associated with food packaging permeabilities, which can be symbolic (e.g. the Packaging "low density polyethylene") or quantitative (e.g. the Temperature "25", "°C" or the H2O_Permeability "4.34 × 10⁻³", "cm³ μm⁻² d⁻¹ kPa"). A representation of each entity, composed of a set of features, is built during the extraction process. These features fall into three categories: Ontological, Lexical and Structural. The features of SciPuRe are used to compute Relevance scores that take into account the different information available for each extracted entity. These Relevance scores illustrate the usefulness of SciPuRe and can be used to rank the extraction results and discard false positives.
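The abstract does not specify the individual features or the formula behind the Relevance scores. The Python sketch below only illustrates the general idea under stated assumptions: each extracted entity carries feature scores in [0, 1] grouped into the three categories (Ontological, Lexical, Structural), a relevance score is taken as a weighted average of the per-category means, and entities scoring below a threshold are discarded as likely false positives. The `Entity` class, the functions `category_score`, `relevance` and `rank_and_filter`, the feature names (`unit_in_otr`, `near_concept_term`, `in_table`), the weights and the threshold are all hypothetical, not SciPuRe's actual definitions.

```python
from dataclasses import dataclass, field

# The three feature categories named in the abstract.
CATEGORIES = ("ontological", "lexical", "structural")


@dataclass
class Entity:
    """Hypothetical entity representation: OTR concept, extracted text, grouped features."""
    concept: str      # OTR concept, e.g. "Temperature" or "Packaging"
    value: str        # extracted value or symbolic term, e.g. "25"
    unit: str = ""    # extracted unit; empty for symbolic entities
    # Hypothetical feature scores in [0, 1], keyed by category, then by feature name.
    features: dict = field(default_factory=lambda: {c: {} for c in CATEGORIES})


def category_score(entity: Entity, category: str) -> float:
    """Mean of the feature scores in one category (0.0 if it has no features)."""
    scores = list(entity.features.get(category, {}).values())
    return sum(scores) / len(scores) if scores else 0.0


def relevance(entity: Entity, weights: dict) -> float:
    """Weighted average of per-category scores (an assumed aggregation, not the paper's)."""
    total = sum(weights.get(c, 0.0) for c in CATEGORIES)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights.get(c, 0.0) * category_score(entity, c) for c in CATEGORIES) / total


def rank_and_filter(entities, weights, threshold=0.5):
    """Sort entities by relevance (highest first) and drop those below the threshold."""
    scored = sorted(((relevance(e, weights), e) for e in entities),
                    key=lambda pair: pair[0], reverse=True)
    return [(score, e) for score, e in scored if score >= threshold]


if __name__ == "__main__":
    weights = {"ontological": 0.4, "lexical": 0.3, "structural": 0.3}  # hypothetical
    # A plausible Temperature entity: unit known to the OTR, near the concept term.
    temp = Entity("Temperature", "25", "°C",
                  {"ontological": {"unit_in_otr": 1.0},
                   "lexical": {"near_concept_term": 1.0},
                   "structural": {"in_table": 0.0}})
    # A likely false positive: a bare number (e.g. a year) with no supporting evidence.
    noise = Entity("Temperature", "2019", "",
                   {"ontological": {"unit_in_otr": 0.0},
                    "lexical": {"near_concept_term": 0.0},
                    "structural": {"in_table": 0.0}})
    for score, e in rank_and_filter([noise, temp], weights):
        print(e.concept, e.value, e.unit, round(score, 2))
```

With these illustrative weights, the Temperature entity "25", "°C" scores about 0.7 and is kept, while the unsupported "2019" scores 0.0 and is discarded. A weighted average was chosen only because it keeps every category's contribution visible; the paper may combine the features differently.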