Maria Nefeli Nikiforos, Konstantina Deliveri, Katia Lida Kermanidis, Adamantia G. Pateli
2022 17th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP), published 2022-11-03. DOI: 10.1109/SMAP56125.2022.9941803 (https://doi.org/10.1109/SMAP56125.2022.9941803)
Machine Learning on Wikipedia Text for the Automatic Identification of Vocational Domains of Significance for Displaced Communities
Despite their educational level and professional qualifications, a significant percentage of highly skilled migrants and refugees find employment in low-skill vocations throughout the world. Typical vocational domains include agriculture, cooking, crafting, construction, and hospitality. As a first step towards developing an educational tool that helps such underprivileged communities become acquainted with the sublanguage of their vocational domain in their host country, this paper attempts automatic domain identification among these five domains using domain-specific textual data. Wikis and social networks are a valuable data source for data mining, Natural Language Processing, and machine learning tasks. Wikipedia articles on these domains were collected and processed to create a novel text data set. Linguistic features extracted from this data set were used in experiments with Random Forest combined with AdaBoost, and with Gradient Boosted Trees. The machine learning models achieved high performance in vocational domain identification (up to 99.93% accuracy).
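The classification setup named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: scikit-learn is used as the toolkit, TF-IDF word weights stand in for the unspecified linguistic features, "Random Forest combined with AdaBoost" is read as AdaBoost boosting a Random Forest base learner, and the toy sentences are invented for illustration rather than taken from the Wikipedia data set. The names `TOY_DOCS`, `build_models`, and `fit_and_predict` are hypothetical.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# The five vocational domains listed in the abstract.
DOMAINS = ["agriculture", "construction", "cooking", "crafting", "hospitality"]

# Hypothetical toy corpus: invented sentences, NOT the paper's data set.
TOY_DOCS = [
    ("agriculture", "Farmers plough the field and sow wheat seeds before harvest."),
    ("agriculture", "Crop rotation and irrigation keep the farm soil fertile."),
    ("agriculture", "The tractor pulls a plough across the wheat field."),
    ("construction", "Workers pour concrete for the foundation of the building."),
    ("construction", "The scaffold supports bricklayers raising the brick wall."),
    ("construction", "Steel beams and concrete slabs form the building frame."),
    ("cooking", "Simmer the sauce and season the soup with salt and pepper."),
    ("cooking", "Chop the onions, fry them in oil, and bake the bread."),
    ("cooking", "The recipe needs flour, butter, and a hot oven to bake."),
    ("crafting", "Knit the wool yarn into a scarf with two needles."),
    ("crafting", "The potter shapes wet clay on the wheel into a vase."),
    ("crafting", "Carve the wood and sew the leather by hand."),
    ("hospitality", "The hotel receptionist welcomes guests at the front desk."),
    ("hospitality", "Housekeeping staff prepare each guest room before check-in."),
    ("hospitality", "The concierge books tours and taxis for hotel guests."),
]


def build_models(seed=0):
    """Return the two model families from the abstract as text pipelines."""
    # Assumption: AdaBoost wraps a small Random Forest as its base learner.
    # The base learner is passed positionally so the call works whether the
    # installed scikit-learn names that parameter `estimator` or `base_estimator`.
    forest = RandomForestClassifier(n_estimators=10, random_state=seed)
    boosted_forest = AdaBoostClassifier(forest, n_estimators=5, random_state=seed)
    boosted_trees = GradientBoostingClassifier(n_estimators=20, random_state=seed)
    return {
        # TF-IDF is a stand-in for the paper's extracted linguistic features.
        "adaboost_random_forest": make_pipeline(TfidfVectorizer(), boosted_forest),
        "gradient_boosted_trees": make_pipeline(TfidfVectorizer(), boosted_trees),
    }


def fit_and_predict(texts, labels, new_texts, seed=0):
    """Fit both pipelines on (texts, labels) and predict domains for new_texts."""
    predictions = {}
    for name, model in build_models(seed).items():
        model.fit(texts, labels)
        predictions[name] = [str(label) for label in model.predict(new_texts)]
    return predictions


if __name__ == "__main__":
    train_texts = [text for _, text in TOY_DOCS]
    train_labels = [domain for domain, _ in TOY_DOCS]
    unseen = ["Guests check in at the hotel desk.", "Bake the bread in the oven."]
    for name, labels in fit_and_predict(train_texts, train_labels, unseen).items():
        print(name, labels)
```

With a corpus this small the predictions are not meaningful; the sketch only shows how the two ensemble types plug into the same feature pipeline. Reproducing the reported accuracy would require the authors' data set, features, and evaluation protocol.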