Integrated use of KOS and deep learning for data set annotation in tourism domain

IF 1.7 · CAS Zone 3 (Management) · JCR Q2 INFORMATION SCIENCE & LIBRARY SCIENCE · Journal of Documentation · Pub Date: 2023-05-02 · DOI: 10.1108/jd-02-2023-0019
G. Aracri, A. Folino, Stefano Silvestri
Citations: 0

Abstract

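Purpose: The purpose of this paper is to propose a methodology for enriching and tailoring a knowledge organization system (KOS) to support the information extraction (IE) task for the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.
Design/methodology/approach: The paper first presents a method to improve and customize an available thesaurus by leveraging documents related to tourism in Italy. The resulting thesaurus is then used to create an annotated NER corpus, exploiting distant supervision, deep learning and light human supervision.
Findings: The study shows that a customized KOS can effectively support IE tasks when applied to documents of the same domains and types used for its construction. Moreover, the proposed methodology substantially eases the annotation task, making it possible to annotate a corpus with a fraction of the effort required for manual annotation.
Originality/value: The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.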
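The abstract states that the customized thesaurus drives a distant-supervision step that pre-annotates the NER corpus. The following is a minimal sketch of what thesaurus-based distant labeling can look like, under stated assumptions: it is not the authors' pipeline. Whitespace tokenization, greedy longest-match lookup, the BIO tagging scheme, and the toy `THESAURUS` entries and entity types (`LANDMARK`, `LOCATION`, `ACCOMMODATION`) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus: lowercased (multi-word) term -> entity type.
# Entries and type labels are illustrative only, not taken from the paper's KOS.
THESAURUS: Dict[Tuple[str, ...], str] = {
    ("colosseum",): "LANDMARK",
    ("roman", "forum"): "LANDMARK",
    ("rome",): "LOCATION",
    ("bed", "and", "breakfast"): "ACCOMMODATION",
}


def distant_bio_tags(tokens: List[str], thesaurus: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest match of token spans against thesaurus terms."""
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in thesaurus), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len, match_type = 0, None
        # Try the longest candidate span first so multi-word terms win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = thesaurus.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                match_len, match_type = n, entity_type
                break
        if match_type is None:
            i += 1  # no thesaurus term starts here; leave the token as "O"
            continue
        tags[i] = f"B-{match_type}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{match_type}"
        i += match_len  # skip past the matched span
    return tags


if __name__ == "__main__":
    sentence = "We stayed at a bed and breakfast near the Roman Forum in Rome".split()
    for token, tag in zip(sentence, distant_bio_tags(sentence, THESAURUS)):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy: synonyms missing from the thesaurus go untagged, and ambiguous terms get a single fixed type. The abstract pairs distant supervision with a deep learning model and light human supervision, and this sketch covers neither of those steps.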
Source journal
Journal of Documentation (INFORMATION SCIENCE & LIBRARY SCIENCE)
CiteScore: 4.20
Self-citation rate: 14.30%
Articles published: 72
Journal description: The scope of the Journal of Documentation is broadly information sciences, encompassing all of the academic and professional disciplines that deal with recorded information. These include, but are certainly not limited to:
■ Information science, librarianship and related disciplines
■ Information and knowledge management
■ Information and knowledge organisation
■ Information seeking and retrieval, and human information behaviour
■ Information and digital literacies
Latest articles in this journal
Dancing with the devil: the use and perceptions of academic journal ranking lists in the management field
From amused to : enriching mood metadata by mapping textual descriptors to emojis for fiction reading
The in-between: information experience within human-companion animal living environments
Influence of Dervin’s sensemaking methodology determined through citation context analysis, content analysis and bibliometrics
Toward an extended metadata standard for digital art