Machine Learning on Wikipedia Text for the Automatic Identification of Vocational Domains of Significance for Displaced Communities

Maria Nefeli Nikiforos, Konstantina Deliveri, Katia Lida Kermanidis, Adamantia G. Pateli
Published in: 2022 17th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP)
Publication date: 2022-11-03
DOI: 10.1109/SMAP56125.2022.9941803 (https://doi.org/10.1109/SMAP56125.2022.9941803)
Citation count: 1

Abstract

Despite their educational level and professional qualifications, a significant percentage of highly skilled migrants and refugees find employment in low-skill vocations throughout the world. Typical vocational domains include agriculture, cooking, crafting, construction, and hospitality. As a first step towards developing an educational tool that helps such underprivileged communities become acquainted with the sublanguage of their vocational domain in their host country, this paper attempts automatic identification of these five domains using domain-specific textual data. Wikis and social networks provide a valuable data source for data mining, Natural Language Processing, and machine learning tasks. Wikipedia articles related to these domains were collected and processed to create a novel text data set. Linguistic features extracted from it were used in experiments with Random Forest combined with AdaBoost, and with Gradient Boosted Trees. The machine learning models achieved high performance in vocational domain identification (up to 99.93% accuracy).
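
The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes scikit-learn is available, uses TF-IDF word weights as a stand-in for the paper's linguistic features (which the abstract does not enumerate), and trains on a handful of made-up placeholder sentences instead of the Wikipedia data set. The names CORPUS, build_dataset, make_models, and evaluate are hypothetical and introduced only for illustration.

```python
# Minimal sketch, not the authors' pipeline. Assumptions: TF-IDF word features
# stand in for the paper's linguistic features, and the tiny corpus below is
# made-up placeholder text rather than real Wikipedia data.
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Four placeholder sentences for each vocational domain named in the abstract.
CORPUS = {
    "agriculture": [
        "Farmers plant crops and irrigate the fields before harvest.",
        "The tractor ploughs the soil ahead of the sowing season.",
        "Livestock graze on pasture near the farm.",
        "Fertilizer improves the yield of wheat and maize crops.",
    ],
    "cooking": [
        "Simmer the sauce and season it with salt and pepper.",
        "The chef chops onions and fries them in olive oil.",
        "Bake the bread in a hot oven until golden.",
        "Whisk the eggs and fold them into the batter.",
    ],
    "crafting": [
        "The potter shapes wet clay on a spinning wheel.",
        "Weavers thread yarn through the loom to make cloth.",
        "She carves the wooden figure with a small chisel.",
        "Knitting needles and wool are used to make a scarf.",
    ],
    "construction": [
        "Workers pour concrete for the foundation of the building.",
        "The crane lifts steel beams onto the scaffold.",
        "Bricklayers set bricks with mortar to raise the wall.",
        "The site engineer checks the roof trusses and the drainage.",
    ],
    "hospitality": [
        "The receptionist checks guests into their hotel rooms.",
        "Housekeeping changes the linen and cleans each suite.",
        "The concierge books tours and restaurant tables for guests.",
        "Hotel staff welcome visitors in the lobby and carry luggage.",
    ],
}


def build_dataset(corpus):
    """Flatten {domain: [texts]} into parallel lists of texts and labels."""
    texts, labels = [], []
    for domain, sentences in corpus.items():
        texts.extend(sentences)
        labels.extend([domain] * len(sentences))
    return texts, labels


def make_models(seed=0):
    """Return the two ensemble types named in the abstract, as text pipelines."""
    # The forest is passed positionally as AdaBoost's base learner.
    forest = RandomForestClassifier(n_estimators=25, random_state=seed)
    boosted_forest = AdaBoostClassifier(forest, n_estimators=5, random_state=seed)
    gbt = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    return {
        "AdaBoost(Random Forest)": make_pipeline(TfidfVectorizer(), boosted_forest),
        "Gradient Boosted Trees": make_pipeline(TfidfVectorizer(), gbt),
    }


def evaluate(corpus, seed=0):
    """Train each model on a stratified split and return held-out accuracy."""
    texts, labels = build_dataset(corpus)
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    scores = {}
    for name, model in make_models(seed).items():
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores


if __name__ == "__main__":
    for name, acc in evaluate(CORPUS).items():
        print(f"{name}: held-out accuracy = {acc:.2f}")
```

With so little placeholder text the reported accuracies carry no meaning and are not comparable to the 99.93% figure above. The sketch only shows how the two ensemble types can be evaluated on the same five-way domain labels.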