TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

AACL Bioflux (Q3, Environmental Science) | Pub Date: 2022-08-16 | DOI: 10.48550/arXiv.2208.07846

Lorenz Stangier, Ji-Ung Lee, Yuxi Wang, Marvin Müller, Nicholas R. J. Frick, J. Metternich, Iryna Gurevych


Citations: 2

Abstract




Collecting and annotating task-oriented dialogue data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly used at work. As a result, a large amount of work-relevant information is disseminated through those channels and must be post-processed manually by employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate _problems_, _causes_, and _solutions_ that occur in work-related chats. TexPrax uses a chatbot to engage employees directly, asking them to provide lightweight annotations on their conversations and easing their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain code-switching, varying abbreviations for the same entity, and dialects, all of which NLP systems should be able to handle.
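To make the sentence-level annotation scheme concrete, the following is a minimal sketch of how a dialogue labeled with the three categories named in the abstract might be represented. It is not TexPrax's actual data schema: the names `Label`, `AnnotatedSentence`, and `Dialogue`, the extra `OTHER` label, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    # The three categories named in the abstract, plus an assumed fallback.
    PROBLEM = "problem"
    CAUSE = "cause"
    SOLUTION = "solution"
    OTHER = "other"  # assumption: sentences that fit none of the three


@dataclass
class AnnotatedSentence:
    """One chat sentence with its (hypothetical) sentence-level label."""
    text: str
    label: Label


@dataclass
class Dialogue:
    """A task-oriented dialogue as an ordered list of annotated sentences."""
    sentences: list[AnnotatedSentence] = field(default_factory=list)

    def label_counts(self) -> Counter:
        """Count how often each label occurs in this dialogue."""
        return Counter(s.label for s in self.sentences)


# Invented example dialogue (not taken from the collected data).
dialogue = Dialogue([
    AnnotatedSentence("The conveyor belt stopped again.", Label.PROBLEM),
    AnnotatedSentence("The sensor is covered in dust.", Label.CAUSE),
    AnnotatedSentence("Clean the sensor and restart.", Label.SOLUTION),
    AnnotatedSentence("Thanks!", Label.OTHER),
])
counts = dialogue.label_counts()
```

Keeping the label on each sentence rather than on the whole dialogue mirrors the sentence-level annotations the abstract reports; aggregate statistics such as per-label counts can then be derived on demand.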


Source journal

AACL Bioflux Environmental Science-Management, Monitoring, Policy and Law

CiteScore

1.40

Self-citation rate

0.00%

Publication volume