Job description parsing with explainable transformer based ensemble models to extract the technical and non-technical skills

Abbas Akkasi
Journal: Natural Language Processing Journal, volume 9, Article 100102
DOI: 10.1016/j.nlp.2024.100102
Publication date: 2024-09-03
Publication type: Journal Article
Article page: https://www.sciencedirect.com/science/article/pii/S2949719124000505
PDF: https://www.sciencedirect.com/science/article/pii/S2949719124000505/pdfft?md5=a597d9732dfab2f3ac80c6409cc94264&pid=1-s2.0-S2949719124000505-main.pdf
Citation count: 0

Abstract

The rapid digitization of the economy is transforming the job market, creating new roles and reshaping existing ones. As skill requirements evolve, identifying essential competencies becomes increasingly critical. This paper introduces a novel ensemble model that combines traditional and transformer-based neural networks to extract both technical and non-technical skills from job descriptions. A substantial dataset of job descriptions from reputable platforms was meticulously annotated for 22 IT roles. The model demonstrated superior performance in extracting both non-technical (67% F-score) and technical skills (72% F-score) compared to conventional CRF and hybrid deep learning models. Specifically, the proposed model outperformed these baselines by an average margin of 10% and 6%, respectively, for non-technical skills, and 29% and 6.8% for technical skills. A 5 × 2cv paired t-test confirmed the statistical significance of these improvements. In addition, to enhance model interpretability, Local Interpretable Model-Agnostic Explanations (LIME) were employed in the experiments.
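
For readers unfamiliar with the 5 × 2cv paired t-test mentioned above, the following is a minimal sketch of how its statistic is usually computed (Dietterich's formulation), not the paper's own code. The function name `five_by_two_cv_t` and the score differences are hypothetical and purely illustrative; they are not results from the paper. Each pair holds the difference in score between two models on the two folds of one 2-fold replication.

```python
import math


def five_by_two_cv_t(diffs):
    """Return the 5x2cv paired t statistic.

    diffs: five (d1, d2) pairs, where d1 and d2 are the score differences
    between model A and model B on fold 1 and fold 2 of one replication.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 replications of 2-fold CV")

    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2.0
        # Variance estimate for this replication
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)

    pooled = sum(variances) / 5.0
    if pooled == 0.0:
        raise ValueError("all replications have zero variance")

    # Numerator is the difference from the first fold of the first replication
    return diffs[0][0] / math.sqrt(pooled)


# Hypothetical F-score differences (illustrative only, not from the paper)
example_diffs = [(0.06, 0.04), (0.05, 0.07), (0.08, 0.05), (0.04, 0.06), (0.07, 0.05)]
t_stat = five_by_two_cv_t(example_diffs)

# Under the null hypothesis the statistic follows a t distribution with
# 5 degrees of freedom; 2.571 is the two-sided critical value at alpha = 0.05.
print(f"t = {t_stat:.3f}, significant at 0.05: {abs(t_stat) > 2.571}")
```

The statistic is compared against a t distribution with 5 degrees of freedom, which is why the sketch uses the fixed critical value 2.571 rather than computing a p-value.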
