Job description parsing with explainable transformer based ensemble models to extract the technical and non-technical skills

Natural Language Processing Journal. Pub Date: 2024-12-01. Epub Date: 2024-09-03. DOI: 10.1016/j.nlp.2024.100102

Abbas Akkasi

Natural Language Processing Journal, Volume 9, Article 100102. Full text: https://www.sciencedirect.com/science/article/pii/S2949719124000505


Abstract

The rapid digitization of the economy is transforming the job market, creating new roles and reshaping existing ones. As skill requirements evolve, identifying essential competencies becomes increasingly critical. This paper introduces a novel ensemble model that combines traditional and transformer-based neural networks to extract both technical and non-technical skills from job descriptions. A substantial dataset of job descriptions from reputable platforms was meticulously annotated for 22 IT roles. The model demonstrated superior performance in extracting both non-technical (67% F-score) and technical skills (72% F-score) compared to conventional CRF and hybrid deep learning models. Specifically, the proposed model outperformed these baselines by an average margin of 10% and 6%, respectively, for non-technical skills, and 29% and 6.8% for technical skills. A 5 × 2cv paired t-test confirmed the statistical significance of these improvements. In addition, to enhance model interpretability, Local Interpretable Model-Agnostic Explanations (LIME) were employed in the experiments.
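The abstract names a 5 × 2cv paired t-test but does not spell out the computation. The following is a minimal sketch of the statistic as it is commonly defined (five repetitions of 2-fold cross-validation, with the per-fold score differences between two models as input). The function name, the constant, and the example differences are hypothetical and are not taken from the paper; the paper's own per-fold scores are not given here.

```python
import math
from typing import Sequence, Tuple

# Two-sided critical value of Student's t with 5 degrees of freedom
# at alpha = 0.05 (standard table value).
T_CRIT_DF5_ALPHA05 = 2.571


def five_by_two_cv_t(diffs: Sequence[Tuple[float, float]]) -> float:
    """Return the 5x2cv paired t statistic.

    `diffs` holds five pairs (d1, d2): the score difference between
    model A and model B on each of the two folds of one repetition.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 repetitions of 2-fold CV")

    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2.0
        # Variance estimate for this repetition.
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)

    denom = math.sqrt(sum(variances) / 5.0)
    if denom == 0.0:
        raise ValueError("all fold differences are identical; t is undefined")

    # The numerator is the difference from the first fold of the first repetition.
    return diffs[0][0] / denom


# Made-up F-score differences (proposed model minus baseline), for illustration only.
example_diffs = [(0.06, 0.04), (0.05, 0.07), (0.08, 0.06), (0.05, 0.03), (0.07, 0.05)]
t_stat = five_by_two_cv_t(example_diffs)
significant = abs(t_stat) > T_CRIT_DF5_ALPHA05
```

Under this definition the statistic is compared against a t distribution with 5 degrees of freedom, so an absolute value above roughly 2.571 would indicate a difference that is significant at the 5% level.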

