Interactive Definition and Tuning of One-Class Classifiers for Document Image Classification

2016 12th IAPR Workshop on Document Analysis Systems (DAS). Pub Date: 2016-04-11. DOI: 10.1109/DAS.2016.46

Nathalie Girard, Roger Trullo, Sabine Barrat, N. Ragot, Jean-Yves Ramel

Citations: 2

Abstract

With the growing mass of data, document image classification systems must cope with new requirements, such as processing heterogeneous data streams efficiently. When processing such streams, little is generally known in advance about their content. Furthermore, because labelled data is costly to obtain, the classification model has to be learned from the few labelled examples available. In this context, we argue that combining one-class classifiers is an attractive way to quickly define and tune classification systems dedicated to different document streams. The main advantage of one-class classifiers is that the classifier models are independent of one another, so document classes can easily be removed, added or modified without any impact on the other classifiers. In addition, each classifier can use its own set of features, whether it handles the same class as another classifier or a different one. In return, because only one class is well specified during learning, one-class classifiers have to be defined carefully to perform well: with only positive examples, it is harder to select representative training examples and discriminative features. To overcome these difficulties, we have defined a complete framework offering several methods that help a system designer define and tune one-class classifier models. The aim is to make it easier to select good training examples and suitable features for the class to be recognized in the document stream. To that end, the proposed methods compute several measures that evaluate the relevance of the available features and training examples. Moreover, a visualization of the decision space induced by the selected examples and features is proposed to support this choice, and the model parameters can be tuned automatically for the class to be recognized when a validation stream is available. The relevance of the proposed framework is illustrated on two use cases (a real data stream and a public data set).
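
The abstract does not say which one-class model, features or tuning criterion the framework uses, so the following is only a minimal sketch of the general idea: independent one-class models, each with its own feature subset, combined with a reject option and tuned on a labelled validation stream. The names `CentroidOneClass`, `OneClassEnsemble` and `tune_k`, the centroid-distance model, the accuracy criterion and all data values are assumptions made for illustration, not the authors' method.

```python
import math
from statistics import mean, pstdev


class CentroidOneClass:
    """Toy one-class model (a hypothetical stand-in): a sample is accepted
    when it lies within mean + k * std of the training distances to the centroid."""

    def __init__(self, feature_idx, k=2.0):
        self.feature_idx = feature_idx  # feature subset used by this class only
        self.k = k                      # tolerance, in standard deviations

    def _project(self, x):
        return [x[i] for i in self.feature_idx]

    def fit(self, positives):
        # Learned from positive examples only.
        pts = [self._project(x) for x in positives]
        self.centroid = [mean(col) for col in zip(*pts)]
        dists = [math.dist(p, self.centroid) for p in pts]
        self.mu, self.sigma = mean(dists), pstdev(dists)
        return self

    def score(self, x):
        # Distance to the centroid minus the acceptance radius; <= 0 means accepted.
        radius = self.mu + self.k * self.sigma
        return math.dist(self._project(x), self.centroid) - radius

    def accepts(self, x):
        return self.score(x) <= 0.0


class OneClassEnsemble:
    """One independent model per document class; adding or removing a class
    never touches the other models."""

    def __init__(self):
        self.models = {}

    def add_class(self, label, model):
        self.models[label] = model

    def remove_class(self, label):
        del self.models[label]

    def predict(self, x):
        accepted = {lbl: m.score(x) for lbl, m in self.models.items() if m.accepts(x)}
        if not accepted:
            return None  # rejected by every model: unknown document class
        # Simplification: scores from different feature spaces are compared directly.
        return min(accepted, key=accepted.get)


def tune_k(model, validation, candidates=(1.0, 2.0, 4.0, 8.0)):
    """Keep the k with the best accuracy on (sample, is_member) validation pairs."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        model.k = k
        acc = mean([1.0 if model.accepts(x) == y else 0.0 for x, y in validation])
        if acc > best_acc:
            best_k, best_acc = k, acc
    model.k = best_k
    return best_k


# Invented 3-feature vectors, for illustration only.
invoices = [[1.0, 0.0, 9.0], [1.2, 0.1, 0.0], [0.9, -0.1, 3.0], [1.1, 0.0, 7.0]]
letters = [[4.0, 2.0, 5.0], [-3.0, 1.0, 5.5], [0.0, -2.0, 4.5], [2.0, 3.0, 5.0]]

ensemble = OneClassEnsemble()
ensemble.add_class("invoice", CentroidOneClass(feature_idx=(0, 1)).fit(invoices))
ensemble.add_class("letter", CentroidOneClass(feature_idx=(2,)).fit(letters))
```

Calling `ensemble.predict(sample)` returns a class label or `None` when no model accepts the sample. Because each model is fitted on its own positives and features, `ensemble.remove_class("letter")` leaves the invoice model unchanged, which is the property the abstract highlights.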
