Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing

VS@HLT-NAACL Pub Date : 2015-06-01 DOI:10.3115/v1/W15-1517
Melanie Tosik, C. Hansen, Gerard Goossen, M. Rotaru
Citations: 13

Abstract

We explore new methods of improving Curriculum Vitae (CV) parsing for German documents by applying recent research on word embeddings in Natural Language Processing (NLP). Our approach integrates word embeddings as input features for a probabilistic sequence labeling model based on the Conditional Random Field (CRF) framework. The best-performing word embeddings are generated from a large sample of German CVs. The best results on the extraction task are obtained by the model that combines the word embeddings with a number of hand-crafted features. The improvements are consistent across different sections of the target documents. The effect of the word embeddings is strongest on semi-structured, out-of-sample data.
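To make the idea of "word embeddings as input features" for a CRF concrete, here is a minimal sketch in Python. It is not the authors' feature set: the function names, the toy 3-dimensional vectors, and the particular hand-crafted features (casing, digit check, sentence boundaries) are all assumptions chosen for illustration. The sketch only builds per-token feature dictionaries with real-valued entries, a format that CRF toolkits commonly accept; training the CRF itself is left out.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional embeddings (hypothetical values). In the setting the
# abstract describes, vectors would be trained on a large sample of German CVs.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "berlin": [0.9, 0.1, -0.2],
    "universität": [0.2, 0.8, 0.1],
}


def token_features(
    tokens: Sequence[str],
    i: int,
    embeddings: Dict[str, List[float]],
    dim: int,
) -> Dict[str, float]:
    """Build one feature dict for tokens[i]: hand-crafted features plus
    one real-valued feature per embedding dimension."""
    token = tokens[i]
    feats: Dict[str, float] = {
        "bias": 1.0,
        # Word-type (lexical identity) feature.
        "lower=" + token.lower(): 1.0,
        # Simple hand-crafted orthographic features (illustrative only).
        "is_title": float(token.istitle()),
        "is_digit": float(token.isdigit()),
    }
    # Embedding features: unknown words fall back to a zero vector.
    vector = embeddings.get(token.lower(), [0.0] * dim)
    for k, value in enumerate(vector):
        feats[f"emb_{k}"] = value
    # Sequence boundary markers.
    if i == 0:
        feats["BOS"] = 1.0
    if i == len(tokens) - 1:
        feats["EOS"] = 1.0
    return feats


def sentence_features(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[Dict[str, float]]:
    """Feature dicts for every token in a sequence, in order."""
    return [token_features(tokens, i, embeddings, dim) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Universität", "Berlin", "2015"], TOY_EMBEDDINGS, 3):
        print(feats)
```

Dropping the `emb_*` entries gives a hand-crafted-only baseline, and dropping everything except them gives an embeddings-only variant, which is the kind of comparison the abstract refers to.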