Multi-Task Romanian Email Classification in a Business Context

Inf. Comput. | Pub Date: 2023-06-03 | DOI: 10.3390/info14060321

A. Dima, Stefan Ruseti, Denis Iorga, C. Banica, Mihai Dascalu


Citations: 0

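Abstract

Email classification systems are essential for handling and organizing the massive flow of communication, especially in a business context. Although many solutions exist, the lack of standardized classification categories limits their applicability. Furthermore, the lack of public, business-oriented datasets in Romanian makes such solutions difficult to develop. To this end, we introduce a versatile automated email classification system based on a novel public dataset of 1447 manually annotated Romanian business-oriented emails. Our corpus is annotated with 5 token-related labels, as well as 5 sequence-related classes. We establish a strong baseline using pre-trained Transformer models for token classification and multi-task classification, achieving F1-scores of 0.752 and 0.764, respectively. We publicly release our code together with the dataset of labeled emails.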
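The abstract reports F1-scores for both token-level and sequence-level predictions. As a minimal sketch of how such a score can be computed, the snippet below evaluates per-class and macro-averaged F1 over two label sequences. The abstract does not name the label sets or state which averaging scheme was used, so the label strings (`NAME`, `DATE`, `request`, and so on) and the choice of macro averaging are assumptions made purely for illustration.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """Return {label: F1} for two equal-length label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was expected but missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical token-level labels (one label per token).
token_gold = ["O", "NAME", "O", "DATE", "O"]
token_pred = ["O", "NAME", "O", "O", "O"]

# Hypothetical sequence-level classes (one class per email).
seq_gold = ["request", "complaint", "request", "offer"]
seq_pred = ["request", "request", "request", "offer"]

print("token macro-F1:", round(macro_f1(token_gold, token_pred), 3))
print("sequence macro-F1:", round(macro_f1(seq_gold, seq_pred), 3))
```

The same function serves both tasks because each reduces to comparing two aligned label sequences: one label per token in the first case, one class per email in the second.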


