{"title":"卷积神经网络应用于可视化文档分析的最佳实践","authors":"P. Simard, David Steinkraus, John C. Platt","doi":"10.1109/ICDAR.2003.1227801","DOIUrl":null,"url":null,"abstract":"Neural networks are a powerful technology forclassification of visual inputs arising from documents.However, there is a confusing plethora of different neuralnetwork methods that are used in the literature and inindustry. This paper describes a set of concrete bestpractices that document analysis researchers can use toget good results with neural networks. The mostimportant practice is getting a training set as large aspossible: we expand the training set by adding a newform of distorted data. The next most important practiceis that convolutional neural networks are better suited forvisual document tasks than fully connected networks. Wepropose that a simple \"do-it-yourself\" implementation ofconvolution with a flexible architecture is suitable formany visual document problems. This simpleconvolutional neural network does not require complexmethods, such as momentum, weight decay, structure-dependentlearning rates, averaging layers, tangent prop,or even finely-tuning the architecture. The end result is avery simple yet general architecture which can yieldstate-of-the-art performance for document analysis. Weillustrate our claims on the MNIST set of English digitimages.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"4 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2755","resultStr":"{\"title\":\"Best practices for convolutional neural networks applied to visual document analysis\",\"authors\":\"P. Simard, David Steinkraus, John C. Platt\",\"doi\":\"10.1109/ICDAR.2003.1227801\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural networks are a powerful technology forclassification of visual inputs arising from documents.However, there is a confusing plethora of different neuralnetwork methods that are used in the literature and inindustry. This paper describes a set of concrete bestpractices that document analysis researchers can use toget good results with neural networks. The mostimportant practice is getting a training set as large aspossible: we expand the training set by adding a newform of distorted data. The next most important practiceis that convolutional neural networks are better suited forvisual document tasks than fully connected networks. Wepropose that a simple \\\"do-it-yourself\\\" implementation ofconvolution with a flexible architecture is suitable formany visual document problems. This simpleconvolutional neural network does not require complexmethods, such as momentum, weight decay, structure-dependentlearning rates, averaging layers, tangent prop,or even finely-tuning the architecture. The end result is avery simple yet general architecture which can yieldstate-of-the-art performance for document analysis. Weillustrate our claims on the MNIST set of English digitimages.\",\"PeriodicalId\":249193,\"journal\":{\"name\":\"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.\",\"volume\":\"4 6\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-08-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2755\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2003.1227801\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2003.1227801","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Best practices for convolutional neural networks applied to visual document analysis
Neural networks are a powerful technology forclassification of visual inputs arising from documents.However, there is a confusing plethora of different neuralnetwork methods that are used in the literature and inindustry. This paper describes a set of concrete bestpractices that document analysis researchers can use toget good results with neural networks. The mostimportant practice is getting a training set as large aspossible: we expand the training set by adding a newform of distorted data. The next most important practiceis that convolutional neural networks are better suited forvisual document tasks than fully connected networks. Wepropose that a simple "do-it-yourself" implementation ofconvolution with a flexible architecture is suitable formany visual document problems. This simpleconvolutional neural network does not require complexmethods, such as momentum, weight decay, structure-dependentlearning rates, averaging layers, tangent prop,or even finely-tuning the architecture. The end result is avery simple yet general architecture which can yieldstate-of-the-art performance for document analysis. Weillustrate our claims on the MNIST set of English digitimages.