Using lexicography to characterise relations between species mentions in the biodiversity literature

Sandra Young
Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, published 2019-05-08
DOI: 10.1145/3322905.3322918 (https://doi.org/10.1145/3322905.3322918)

Abstract

The biodiversity literature is one of the longest-standing examples of recording heritage in the world. Today there are many efforts to standardise and integrate the literature to ensure access to the information, both for heritage and research purposes. Ontologies are increasingly used as knowledge representation tools in these efforts. However, the validity of using ontological frameworks to represent biological taxonomies has been questioned. Biological taxonomies use scientific nomenclature to assign names to described species. While the nomenclature is a useful classification tool, it can also be a source of confusion because of its synonymous, homonymous and fluid nature. Despite this, no empirical evaluation of scientific nomenclature use in the literature has ever been performed. Corpus-based analysis is already used in automatic ontology extraction, and this study explores whether recently developed lexicography techniques can be applied to the problem, both to evaluate the empirical data in the literature and to provide a comparison with existing ontologies. This paper focuses on the workflow, parameters and preliminary findings of research into how to extract structures from the literature to perform these comparisons. It does so by manipulating corpus analysis techniques alongside visualisation and filtering methods, and it evaluates the potential classification and disambiguation qualities of the resulting graphs for future work. Preliminary results on the effects of frequency and salience when filtering the graphs indicate that these filter parameters could be used for different purposes in revealing relationships between organism mentions.
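
To make the distinction between frequency and salience filtering concrete, the following is a minimal sketch and not the paper's implementation. It assumes that graph edges are co-occurrences of organism mentions within a text unit, and that salience is measured with logDice, a common association score in corpus lexicography; the abstract does not name the measure used. The function names, thresholds and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence(units):
    """Count mention frequencies and pairwise co-occurrences per text unit."""
    mention_freq = Counter()
    pair_freq = Counter()
    for mentions in units:
        unique = sorted(set(mentions))  # count each mention once per unit
        mention_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return mention_freq, pair_freq


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def filter_edges(mention_freq, pair_freq, min_freq=1, min_salience=float("-inf")):
    """Keep edges that meet both the frequency and the salience threshold."""
    edges = {}
    for (a, b), f_ab in pair_freq.items():
        score = log_dice(f_ab, mention_freq[a], mention_freq[b])
        if f_ab >= min_freq and score >= min_salience:
            edges[(a, b)] = (f_ab, score)
    return edges


# Toy data (assumption): each inner list is the set of mentions in one text unit.
units = [
    ["Quercus robur", "Fagus sylvatica"],
    ["Quercus robur", "Fagus sylvatica"],
    ["Quercus robur", "Fagus sylvatica"],
    ["Quercus robur", "Pinus sylvestris"],
    ["Lynx lynx", "Capreolus capreolus"],
]
mention_freq, pair_freq = build_cooccurrence(units)

# Frequency filter: keeps only pairs that co-occur often.
by_frequency = filter_edges(mention_freq, pair_freq, min_freq=2)

# Salience filter: also keeps rare pairs that occur almost only together.
by_salience = filter_edges(mention_freq, pair_freq, min_salience=13.5)
```

In this toy data the frequency filter keeps only the oak and beech pair, while the salience filter also keeps the lynx and roe deer pair, which co-occurs once but never appears apart. This is one way the two parameters could serve different purposes, as the preliminary results suggest.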