TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

AACL Bioflux (Q3, Environmental Science) | Pub Date: 2022-08-16 | DOI: 10.48550/arXiv.2208.07846
Lorenz Stangier, Ji-Ung Lee, Yuxi Wang, Marvin Müller, Nicholas R. J. Frick, J. Metternich, Iryna Gurevych
Citations: 2

Abstract

Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly used at work. As a result, a large amount of work-relevant information is disseminated through those channels and has to be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate _problems_, _causes_, and _solutions_ that occur in work-related chats. TexPrax uses a chatbot that directly engages the employees to provide lightweight annotations on their conversations and eases their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain code-switching, varying abbreviations for the same entity, and dialects, all of which NLP systems should be able to handle.
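To make the annotation scheme concrete, the following is a minimal sketch of how sentence-level labels for problems, causes, and solutions could be represented and counted. It is not the authors' implementation: the names `Label`, `Sentence`, `Dialogue`, and `label_counts`, the assumption that each sentence carries at most one label, and the English example sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Label(Enum):
    """The three annotation categories named in the abstract."""
    PROBLEM = "problem"
    CAUSE = "cause"
    SOLUTION = "solution"


@dataclass
class Sentence:
    text: str
    # Assumption: at most one label per sentence; None means unlabeled.
    label: Optional[Label] = None


@dataclass
class Dialogue:
    dialogue_id: str
    sentences: List[Sentence] = field(default_factory=list)


def label_counts(dialogues: List[Dialogue]) -> Counter:
    """Count how often each label occurs across all labeled sentences."""
    counts: Counter = Counter()
    for dialogue in dialogues:
        for sentence in dialogue.sentences:
            if sentence.label is not None:
                counts[sentence.label] += 1
    return counts


# Invented example dialogue (not taken from the collected data).
example = Dialogue(
    dialogue_id="demo-1",
    sentences=[
        Sentence("The conveyor belt stopped again.", Label.PROBLEM),
        Sentence("The sensor was dirty.", Label.CAUSE),
        Sentence("Cleaning it fixed the issue.", Label.SOLUTION),
        Sentence("Thanks!"),  # no label
    ],
)

if __name__ == "__main__":
    for label, count in label_counts([example]).items():
        print(label.value, count)
```

Keeping the label optional lets greetings and other off-topic sentences remain in the dialogue without forcing them into one of the three categories.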