Multi-label Connectionist Temporal Classification

Curtis Wigington, Brian L. Price, Scott D. Cohen
{"title":"Multi-label Connectionist Temporal Classification","authors":"Curtis Wigington, Brian L. Price, Scott D. Cohen","doi":"10.1109/ICDAR.2019.00161","DOIUrl":null,"url":null,"abstract":"The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without the need for prior alignments between the input and output. CTC is traditionally used for training sequential, single-label problems; each element in the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on Joint Handwritten Text Recognition and Name Entity Recognition, Asian Character Recognition, and OMR.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without the need for prior alignments between the input and output. CTC is traditionally used for training sequential, single-label problems; each element in the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on Joint Handwritten Text Recognition and Named Entity Recognition, Asian Character Recognition, and OMR.
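For reference, the sketch below shows the standard single-label CTC setting that the abstract describes, using PyTorch's torch.nn.CTCLoss: every element of the target sequence carries exactly one class, and the loss marginalizes over all alignments so no frame-level alignment is needed. This is a minimal illustration of the baseline that MCTC generalizes, not an implementation of the MCTC loss; the tensor sizes, class count, and random targets are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not taken from the paper):
T, N, C = 50, 4, 20   # input time steps, batch size, classes (index 0 = blank)
S = 10                # target sequence length

# Stand-in for an encoder's per-frame class scores (e.g. a CNN/RNN over the input).
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=2)

# Single-label targets: every sequence element has exactly one class label (blank excluded).
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# CTC sums over all valid alignments, so no prior input-output alignment is required.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```

In the multi-label setting addressed by the paper, each output element would instead carry several attributes at once (e.g. a note's duration and pitch in OMR), which is the case the proposed MCTC loss is designed to handle.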