Corpus and Baseline Model for Domain-Specific Entity Recognition in German

Sunna Torge, Waldemar Hahn, R. Jäkel, W. Nagel
{"title":"Corpus and Baseline Model for Domain-Specific Entity Recognition in German","authors":"Sunna Torge, Waldemar Hahn, R. Jäkel, W. Nagel","doi":"10.1109/CiSt49399.2021.9357189","DOIUrl":null,"url":null,"abstract":"Transfer Learning approaches are a promising means to analyze low-resource domain specific texts. The German SmartData corpus is the first German corpus, annotated with entities from different domains, and thus allows to investigate transfer learning approaches for Named Entity Recognition (NER) on different domains. In order to prepare such investigations, this work includes a thorough analysis of the SmartData corpus, and a revision w.r.t. annotations and the split into training and test data, considering the distribution of document and entity types. Based on that a baseline model for NER using BiLSTM-CRF neural networks including hyperparameter optimization is presented.","PeriodicalId":253233,"journal":{"name":"2020 6th IEEE Congress on Information Science and Technology (CiSt)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 6th IEEE Congress on Information Science and Technology (CiSt)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CiSt49399.2021.9357189","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Transfer Learning approaches are a promising means to analyze low-resource domain specific texts. The German SmartData corpus is the first German corpus, annotated with entities from different domains, and thus allows to investigate transfer learning approaches for Named Entity Recognition (NER) on different domains. In order to prepare such investigations, this work includes a thorough analysis of the SmartData corpus, and a revision w.r.t. annotations and the split into training and test data, considering the distribution of document and entity types. Based on that a baseline model for NER using BiLSTM-CRF neural networks including hyperparameter optimization is presented.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
德语特定领域实体识别的语料库和基线模型
迁移学习方法是分析低资源领域特定文本的一种很有前途的方法。德语SmartData语料库是第一个德语语料库,标注了来自不同领域的实体,从而允许在不同领域研究命名实体识别(NER)的迁移学习方法。为了准备这样的调查,这项工作包括对SmartData语料库进行彻底的分析,并考虑到文档和实体类型的分布,对w.r.t.注释进行修订,并将训练数据和测试数据分开。在此基础上,提出了一种包含超参数优化的基于BiLSTM-CRF神经网络的NER基线模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Hybrid CNN-CRF Inference Models for 3D Mesh Segmentation An Effective Packet Loss Recovery Scheme Using a Cache Server in IPTV Multicast Service TermInteract: An Online Tool for Terminologists Aimed at Providing Terminology Quality Metrics Proposing solutions with an application server implementing telephony services in the IMS network Corpus and Baseline Model for Domain-Specific Entity Recognition in German
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1