In general, supervised Machine Learning approaches using labeled training data currently promise the best results with respect to classification accuracy. Data annotation is therefore a key component of most Machine Learning projects. However, creating labels for a training data set is often an elaborate undertaking involving arduous and repetitive work, which is why data scientists frequently try to minimize the annotation effort by automating the annotation process itself. In this paper, we present a case study of two data annotation projects on the same data set of support tickets and compare them: one using human annotators and the other using algorithmic Learning Functions in a combination of Active Learning and Weak Supervision. We achieved a weighted confidence score of >94 % for the human-created labels, while also achieving up to 92 % agreement between the labels of our automated project and the human-created labels, requiring only 10 % human annotation as the starting input for the automated approach. Additionally, we were able to reproduce the value of 85 % for initial human classification accuracy in support ticket distribution reported in previous papers. We close with a reflection on the value of business understanding in data annotation projects and on the problem of ticket ambiguity together with proposed solutions.
