Segmentation of complex documents multilevel images: a robust and fast text bodies-headers detection and extraction scheme
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.602016
D. Olivier, B. Dominique
We present a method for segmenting multilevel images of documents. The documents are considered difficult in the sense that they may contain text paragraphs with different orientations and shapes, mixed with graphics and photographs. The proposed method extracts and separates blocks of text lines (printed or handwritten characters) and headers, as well as stroke structures. The generic approach is based first on a multiscale analysis using a pyramid representation of the image. At each level, text location is performed by a line-border detection scheme. Then, an efficient bottom-up procedure generates bodies (text paragraphs) as the output of algebraic transformations upon a set of four directed graphs associated with the topological relationships of physical components.
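As an illustration of the pyramid representation the abstract refers to, here is a minimal sketch assuming a grayscale page image stored as a NumPy array; each level halves the resolution by 2x2 block averaging, and a line-border detector would then run at every level. Names are illustrative, not taken from the paper.

```python
import numpy as np

def build_pyramid(image: np.ndarray, levels: int = 4):
    """Return [full-res, half-res, quarter-res, ...] by 2x2 block averaging."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Trim to even dimensions so 2x2 blocks tile the image exactly.
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]
        coarser = (img[0::2, 0::2] + img[0::2, 1::2]
                   + img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
        pyramid.append(coarser)
    return pyramid

page = np.random.randint(0, 256, (512, 384))
for level, img in enumerate(build_pyramid(page)):
    print(level, img.shape)   # (512, 384), (256, 192), (128, 96), (64, 48)
```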
{"title":"Segmentation of complex documents multilevel images: a robust and fast text bodies-headers detection and extraction scheme","authors":"D. Olivier, B. Dominique","doi":"10.1109/ICDAR.1995.602016","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.602016","url":null,"abstract":"We present a method for segmenting multilevels images of documents. The documents are considered difficult ones in the sense they may contain text paragraphs with different orientations and shapes, mixed with graphics and photographs. The proposed method extracts and separates blocks of text lines (printed or handwritten characters) and headers as well as stroke structures. The generic approach is first based on a multiscale analysis with the use of a pyramid representation of the image. At each level, text location is performed by a line borders detection scheme. Then, an efficient bottom-up procedure generates bodies (text paragraphs) as the output of algebric transformations upon a set of four directed graphs associated with the topological relationships of physical components.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"28 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133238085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new approach for Latin/Arabic character segmentation
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.602040
K. Romeo-Pakker, H. Miled, Y. Lecourtier
In this paper, we propose two methods of character segmentation for Arabic handwritten characters and cursive Latin characters. Classical horizontal and vertical projections detect the lowercase writing area in lines. The problem of overlapping lower or upper strokes is resolved with a contour-following algorithm that starts in the lowercase writing area and labels the detected contours. In the first method, the junction segments connecting the characters to each other are detected by taking the writing-line thickness into account. The second method detects the upper contour of each word. The strokes are detected in order to find primary segmentation points (PSP). These points are analysed with an automaton that considers the shape of the word to determine definitive segmentation points (DSP). The two methods are compared and the results are discussed.
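A minimal sketch of the classical projection-profile step that locates the lowercase writing area (the horizontal band of densest ink) in a line, assuming a binary image with True for ink. The band threshold is an assumed heuristic, not the paper's exact criterion.

```python
import numpy as np

def lowercase_band(line_image: np.ndarray, ratio: float = 0.5):
    """Return (top, bottom) rows of the densest horizontal ink band."""
    profile = line_image.sum(axis=1)            # ink pixels per row
    threshold = ratio * profile.max()
    dense_rows = np.flatnonzero(profile >= threshold)
    return int(dense_rows[0]), int(dense_rows[-1])

line = np.zeros((40, 200), dtype=bool)
line[12:28, :] = np.random.rand(16, 200) > 0.4   # synthetic dense core
line[5:12, 30:40] = True                          # an ascender above it
print(lowercase_band(line))                       # roughly (12, 27)
```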
{"title":"A new approach for Latin/Arabic character segmentation","authors":"K. Romeo-Pakker, H. Miled, Y. Lecourtier","doi":"10.1109/ICDAR.1995.602040","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.602040","url":null,"abstract":"In this paper, we propose two methods of character segmentation for Arabic handwritten characters and cursive Latin characters. Classical horizontal and vertical projections detect the lowercase writing area in lines. The problem of overlapping lower or upper strokes is resolved with a contour-following algorithm which starts in the lowercase writing area and labels the detected contours. In the first method, the junction segments connecting the characters to each other are detected by taking into account the writing line thickness. The second method detects the upper contour of each word. The strokes are detected in order to find primary segmentation points (PSP). These points are analysed with an automaton that considers the shape of the word for the determination of definitive segmentation points (DSP). The two methods are compared and the results are discussed.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133748559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experiments on extracting structural information from paper documents using syntactic pattern analysis
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.599039
T. Bayer, H. Walischewski
Extracting structural information from paper documents supports daily document processing by, for example, automatically finding index terms, document topics, etc. Knowledge about such components is modeled in a semantic net, which describes geometric properties, spatial relationships, lexical entities, and lexical relationships. The document model is used to extract the sender, date, recipient, and opening and closing formulas from a business letter. 181 business letters were processed, divided into a training set of 20 letters and a test set of the remaining 161. Error rates on the test set range from 0.022 to 0.049 at an average rejection rate of 0.4. Results show that the computational effort can be limited to O(n²) given n primitive objects for matching.
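To make the O(n²) matching effort concrete, here is a hedged sketch in which every primitive page object is scored against every model node and the best pairs are assigned greedily. The similarity function is a hypothetical placeholder for the model's geometric and lexical constraints, not the paper's actual scoring.

```python
from itertools import product

def similarity(obj: dict, node: dict) -> float:
    # Placeholder: agreement on coarse position and object type.
    score = 1.0 if obj["region"] == node["region"] else 0.0
    score += 1.0 if obj["kind"] == node["kind"] else 0.0
    return score

def match(objects: list[dict], nodes: list[dict]):
    """Greedy best assignment over all O(n^2) object/node pairs."""
    pairs = sorted(product(objects, nodes),
                   key=lambda p: similarity(*p), reverse=True)
    used_obj, used_node, assignment = set(), set(), []
    for obj, node in pairs:
        if id(obj) not in used_obj and id(node) not in used_node:
            assignment.append((obj["text"], node["label"]))
            used_obj.add(id(obj))
            used_node.add(id(node))
    return assignment

objects = [{"text": "ACME Corp", "region": "top-left", "kind": "text"},
           {"text": "1995-08-14", "region": "top-right", "kind": "date"}]
nodes = [{"label": "sender", "region": "top-left", "kind": "text"},
         {"label": "date", "region": "top-right", "kind": "date"}]
print(match(objects, nodes))   # [('ACME Corp', 'sender'), ('1995-08-14', 'date')]
```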
{"title":"Experiments on extracting structural information from paper documents using syntactic pattern analysis","authors":"T. Bayer, H. Walischewski","doi":"10.1109/ICDAR.1995.599039","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.599039","url":null,"abstract":"Extracting structural information from paper documents supports the daily document processing by, for example, automatically finding index terms, document topics, etc. Knowledge about such components are modeled in a semantic net, which describes geometric properties, spatial relationships, lexical entities as well as lexical relationships. The document model is used to extract the sender, date, recipient, opening and closing formula from a business letter. 181 business letters have been processed, divided into a training set of 20 and the remaining ones for testing. The error rates for the test set range from 0.022 to 0.049 by an average rejection rate of 0.4. Results show that the computational effort can be limited to O(n/sup 2/) given n primitive objects for matching.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134561407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computer processing on the identification of a Chinese seal image
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.599027
Yung-Sheng Chen
Seal identification is usually performed by hard matching through human visual inspection. A computerized version of this hard matching is presented to identify a Chinese seal image. The author's own seals are used for the experiments. Results show that the proposed approach is feasible.
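A minimal sketch of hard matching between a registered seal imprint and a query imprint, assuming both are binarized and roughly centered. The score is the pixel-overlap ratio over a few trial rotations; the rotation range and acceptance threshold are assumed values, not taken from the paper.

```python
import numpy as np
from scipy import ndimage

def match_score(reference: np.ndarray, query: np.ndarray) -> float:
    """Best intersection-over-union over small trial rotations of the query."""
    best = 0.0
    for angle in range(-10, 11, 2):
        rotated = ndimage.rotate(query.astype(float), angle,
                                 reshape=False, order=0) > 0.5
        overlap = np.logical_and(reference, rotated).sum()
        union = np.logical_or(reference, rotated).sum()
        best = max(best, overlap / max(union, 1))
    return best

seal = np.random.rand(64, 64) > 0.6
print(match_score(seal, seal) > 0.9)   # a seal matches itself: True
```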
{"title":"Computer processing on the identification of a Chinese seal image","authors":"Yung-Sheng Chen","doi":"10.1109/ICDAR.1995.599027","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.599027","url":null,"abstract":"Seal identification is usually performed by hard-matching of human visual inspection. A computer processing of the hard-matching is presented to identify a Chinese seal image. Seals of the author's own are used for experiments. Results show that the proposed approach is feasible.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133121233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An intelligent Chinese official document processing system
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.602064
Tun-Wen Pai, Tieh-Ming Wu, Gan-How Chang, Pei-Yih Ting
Automated Chinese official document processing techniques and a public-key based cryptographic methodology are used to improve the efficiency of existing Chinese executive information systems, giving executive officials more room for sound decision-making. One of the best solutions for document automation and message transmission is to combine low-cost computing power with stable communication capability. In this paper, a computer-based architecture for intelligent Chinese official document processing is proposed. The automation of form analysis, document processing, data filing, security control, and information retrieval is discussed in detail. A digital multisignature technology is employed in this system to meet the basic requirements of data security and trust handling. With such a technology, users can rely on the information transmission system without security concerns.
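As a rough illustration of sequential multisignature chaining, here is a sketch in which each official signs the hash of the document chained with the previous signature. HMAC-SHA256 stands in for a real public-key signature primitive (the abstract does not specify the algorithm), and the signer keys are hypothetical.

```python
import hashlib
import hmac

def sign(secret_key: bytes, message: bytes) -> bytes:
    # Stand-in for a public-key signature; a deployed system would use RSA/DSA.
    return hmac.new(secret_key, message, hashlib.sha256).digest()

def multisign(document: bytes, signer_keys: list[bytes]) -> list[bytes]:
    """Each signer signs the document hash chained with the prior signature."""
    signatures, previous = [], b""
    for key in signer_keys:
        digest = hashlib.sha256(document + previous).digest()
        previous = sign(key, digest)
        signatures.append(previous)
    return signatures

def verify(document: bytes, signer_keys: list[bytes], signatures: list[bytes]) -> bool:
    # Recompute the chain; valid only if every link matches.
    return multisign(document, signer_keys) == signatures

keys = [b"director-key", b"deputy-key", b"clerk-key"]   # hypothetical signers
sigs = multisign(b"official memo", keys)
assert verify(b"official memo", keys, sigs)
```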
{"title":"An intelligent Chinese official document processing system","authors":"Tun-Wen Pai, Tieh-Ming Wu, Gan-How Chang, Pei-Yih Ting","doi":"10.1109/ICDAR.1995.602064","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.602064","url":null,"abstract":"Automated Chinese official document processing techniques and the public-key based cryptographic methodology are used to improve the efficiency of the existing Chinese executive information systems. They provide executive officials more room to achieve the best decision-making. One of the best solutions in document automation and message transmission is to combine low-cost computing power with stable communication capability. In this paper a computer-based architecture for intelligent Chinese official document processing is proposed. The automation of form analysis, document processing, data filing, security control, and information retrieval are discussed in detail. A digital multisignature technology is employed in this system to meet the basic requirements of data security and trust handling. By using such a technology, users will have confidence in utilizing information transmission systems without security suspicion.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133823570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Character detection based on multi-scale measurement
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.601978
H. Hontani, S. Shimotsuji
The paper presents a new method for character string extraction that works efficiently even on a complicated geographical map. The concept of multi-scale measurement is introduced to develop an efficient string detection technique. In this paper, scale means the size of the area within which character candidates may exist. The proposed method first merges small black regions within a certain area into a mass. When the size of the area changes, the mass changes as well. The method observes how a mass changes as the size of the area changes, and searches for a stable mass as a character string. Multi-scale measurement enables the detection process to find the adequate area size for detecting a string. Because a stable mass may include small figures, a test of the shape of a detected mass and a character recognition process follow, to judge whether a mass forms a character string. If a mass is rejected, it is split into smaller masses according to the results of the multi-scale measurement. These judgment and splitting processes are repeated to detect character strings in a pattern where several strings are written close together.
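A minimal sketch of the multi-scale merging idea, assuming a binary image (True = black ink). Small regions are merged by morphological closing with a growing window; a range of scales over which the mass count stays unchanged indicates stable masses. Function and variable names are illustrative, not the paper's.

```python
import numpy as np
from scipy import ndimage

def masses_at_scale(ink: np.ndarray, scale: int) -> np.ndarray:
    """Merge black regions closer than roughly `scale` pixels into one mass."""
    merged = ndimage.binary_closing(ink, structure=np.ones((scale, scale)))
    labels, _ = ndimage.label(merged)
    return labels

def mass_counts(ink: np.ndarray, scales=range(3, 31, 2)):
    """Mass count per scale; a plateau marks a stable grouping."""
    return {s: int(masses_at_scale(ink, s).max()) for s in scales}

ink = np.zeros((60, 200), dtype=bool)
ink[20:40, 10:90:6] = True        # synthetic column of character-like blobs
print(mass_counts(ink))           # many masses at small scales, one when merged
```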
{"title":"Character detection based on multi-scale measurement","authors":"H. Hontani, S. Shimotsuji","doi":"10.1109/ICDAR.1995.601978","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.601978","url":null,"abstract":"The paper presents a new method for character string extraction that works efficiently even for a complicated geographical map. The concept of multi-scale measurement is introduced to achieve development of an efficient string detection technique. In this paper, scale means the size of the area where character candidates may exist. The proposed method first merges small black regions within a certain area into a mass. When the size of the area changes, the mass will change. The proposed method observes the change of a mass corresponding to the change of the size of the area, and searches for a stable mass as a character string. Multi-scale measurement enables the detection process to find the adequate size of an area to detect a string. Because a stable mass may include small figures, a test of the shape of a detected mass and a character recognition process follow to judge whether a mass forms a character string. If a mass is rejected, it is split into smaller masses according to the results of multi-scale measurement. These judgment and split processes are repeated to detect character strings from a pattern where several strings are written closely.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124251932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A rule learning method for academic document image processing
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.598985
A. Takasu, S. Satoh, E. Katsura
A syntactic rule learning method is presented for analyzing document images and constructing a database from them. This method is used in a digital library system named CyberMagazine, where document images are sequentially converted into database tuples by block segmentation, rough classification, and syntactic analysis. The syntactic rules can analyze symbols located in a two-dimensional plane, and have a syntax similar to an ordinary context-free grammar except for the concatenation of symbols. In the presented learning method, the syntactic rules are generated from a set of parse trees by decomposing the trees according to nonterminal symbols, generalizing the decomposed trees to syntactic rules, and merging them.
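A minimal sketch of the decompose-and-merge step, assuming a parse tree is represented as (nonterminal, [children]) with plain strings as terminals. Each internal node yields one production, and identical productions merge automatically. The tree layout is illustrative, not the paper's actual data structure.

```python
def decompose(tree, rules=None):
    """Collect one production per internal node of a parse tree."""
    if rules is None:
        rules = set()
    symbol, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules.add((symbol, rhs))               # merging happens via set semantics
    for c in children:
        if not isinstance(c, str):
            decompose(c, rules)
    return rules

page = ("page", [("title", ["TEXT"]),
                 ("body", [("para", ["TEXT"]), ("para", ["TEXT"])])])
for lhs, rhs in sorted(decompose(page)):
    print(lhs, "->", " ".join(rhs))        # the two identical para rules merge
```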
{"title":"A rule learning method for academic document image processing","authors":"A. Takasu, S. Satoh, E. Katsura","doi":"10.1109/ICDAR.1995.598985","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.598985","url":null,"abstract":"A syntactic rule learning method is presented for analyzing document images and constructing a database from them. This method is used in a digital library system named CyberMagazine, where document images are sequentially converted into database tuples by block segmentation, rough classification, and syntactic analysis. The syntactic rule has an ability to analyze symbols located in two dimensional plane, and has a syntax similar to an ordinal context free grammar except for the concatenation of symbols. In the presented learning method, the syntactic rules are generated from a set of parse trees by decomposing the trees according to non terminal symbols, generalizing the decomposed trees to a syntactic rule, and merging them.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134294853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovering decorative patterns of ceramic objects from a monocular image using a genetic algorithm
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.599008
H. Tanahashi, K. Sakaue, Kazuhiko Yamamoto
In order to develop a shape and decorative pattern database of old pottery and ceramic objects, it is necessary to obtain the surface pattern on the 3-dimensional shape of the object. This paper describes the recovery of the surface of revolution from a monocular image with unknown camera parameters and the retrieval of the 2-dimensional pattern. The camera parameters are obtained by using a genetic algorithm (GA). After the surfaces of revolution are reconstructed, they are developed onto a 2-dimensional plane. We show that a scanner-digitized image of an old ceramic object can be analyzed by the GA to reconstruct the surface of revolution and to develop the 2-dimensional image on the 3-dimensional object.
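A minimal sketch of a GA for fitting a small camera-parameter vector. The fitness function here is a placeholder peaking at an assumed "true" parameter vector; the paper's real objective would score how well the projected revolution-surface silhouette matches the image.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params: np.ndarray) -> float:
    # Placeholder objective; assume a hypothetical true parameter vector.
    target = np.array([1.0, 0.5, -0.2, 2.0])
    return -np.sum((params - target) ** 2)

def ga(pop_size=50, n_params=4, generations=200, mutation=0.1):
    pop = rng.normal(0.0, 2.0, (pop_size, n_params))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]   # truncation selection
        pairs = rng.integers(0, len(parents), (pop_size, 2))
        mask = rng.random((pop_size, n_params)) < 0.5        # uniform crossover
        pop = np.where(mask, parents[pairs[:, 0]], parents[pairs[:, 1]])
        pop += rng.normal(0.0, mutation, pop.shape)          # Gaussian mutation
    return pop[np.argmax([fitness(p) for p in pop])]

print(ga())   # converges near the assumed target vector
```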
{"title":"Recovering decorative patterns of ceramic objects from a monocular image using a genetic algorithm","authors":"H. Tanahashi, K. Sakaue, Kazuhiko Yamamoto","doi":"10.1109/ICDAR.1995.599008","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.599008","url":null,"abstract":"In order to develop a shape and decorative pattern database of old pottery and ceramics objects, it is necessary to obtain the surface pattern on the 3-dimensional shape of the object. This paper describes the recovery of the revolution surface from a monocular image of unknown camera parameters and the retrieval of the 2-dimensional pattern. The camera parameters are obtained by using a genetic algorithm (GA). After the revolution surfaces are reconstructed, these surfaces are developed into a 2-dimensional plane. We show that a scanner-digitized image of an old ceramic object can be analyzed by GA to reconstruct the revolution surface and to develop the 2-dimensional image on the 3-dimensional object.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131565229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stroke-based time warping for signature verification
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.598971
B. Wirtz
This paper presents a new technique for dynamic signature verification. A dynamic programming (DP) approach is used for function-based signature verification. Dynamic data such as pressure is treated as a function of positional data and therefore evaluated locally. Verification is based on strokes as the structural units of the signature, and this global knowledge is fed into the verification procedure. A 3D nonlinear correlation of the signature signals uses the stroke index as the third DP index. In conjunction with a finite state automaton defined on the set of reference strokes, the system correctly handles differing stroke counts and missing or additional strokes. The correct alignment of matching strokes is determined simultaneously with the signature verification process, making a separate alignment stage before the actual nonlinear correlation unnecessary.
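For context, here is a minimal sketch of plain dynamic-time-warping alignment between two signature signals, assuming each signal is a sequence of per-sample feature vectors (e.g. x, y, pressure). This is ordinary 2D DTW; the paper's method adds the stroke index as a third DP dimension, which is omitted here for brevity.

```python
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Minimal cumulative distance aligning signals a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a sample of a
                                 cost[i, j - 1],      # skip a sample of b
                                 cost[i - 1, j - 1])  # match both
    return cost[n, m]

reference = np.random.rand(120, 3)   # synthetic x, y, pressure samples
probe = np.random.rand(100, 3)
print(dtw(reference, probe))
```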
{"title":"Stroke-based time warping for signature verification","authors":"B. Wirtz","doi":"10.1109/ICDAR.1995.598971","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.598971","url":null,"abstract":"This paper presents a new technique for dynamic signature verification. A dynamic programming (DP) approach is used for function-based signature verification. Dynamic data such as pressure is treated as a function of positional data and therefore evaluated locally. Verification is based on strokes as the structural units of the signature. This global knowledge is fed into the verification procedure. The application of a 3D non-linear correlation of the signature signals uses the stroke index as the third DP index. In conjunction with the definition of a finite state automaton on the set of reference strokes the system can handle different stroke numbers, missing or additional strokes correctly. The correct alignment of matching strokes is determined simultaneously to the signature verification process; an additional alignment stage before the actual nonlinear correlation is obsolete.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131610022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
(Chem)DeTeX: automatic generation of a markup language description of (chemical) documents from bitmap images
Pub Date : 1995-08-14  DOI: 10.1109/ICDAR.1995.599035
A. Simon, Jean-Christophe Pret, A.P. Johnson
This paper presents a novel view of document processing as the reverse of the TeX process. This concept simplifies the analysis of the physical structure of documents, and also suggests the use of a style file for layout recognition. An algorithm is given for both phases, layout analysis and layout recognition. The bottom-up layout analysis method employed is based on Kruskal's algorithm and uses the distances between components to construct the physical page structure. The algorithm is linear in the number of connected components. For layout recognition, a document style description language (DSDL) is introduced, which helps a fault-tolerant, recursive parsing algorithm label the blocks of the document. The presented methods were designed for scientific publications (papers, reports, books), but could be applied to a broader range of documents.
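A minimal sketch of Kruskal-style bottom-up grouping, assuming each connected component is reduced to its centroid. Edges shorter than a cutoff merge components into blocks via union-find; the cutoff is a hypothetical stand-in for the paper's distance criterion.

```python
import itertools
import math

def kruskal_blocks(centroids, cutoff=25.0):
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path compression
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(centroids[i], centroids[j]), i, j)
        for i, j in itertools.combinations(range(len(centroids)), 2)
    )
    for d, i, j in edges:
        if d > cutoff:
            break                            # remaining edges are even longer
        parent[find(i)] = find(j)            # union: merge the two blocks
    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two nearby components form one block; two distant ones form another.
print(kruskal_blocks([(0, 0), (10, 2), (200, 5), (205, 8)]))   # [[0, 1], [2, 3]]
```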
{"title":"(Chem)DeT/sub E/X automatic generation of a markup language description of (chemical) documents from bitmap images","authors":"A. Simon, Jean-Christophe Pret, A.P. Johnson","doi":"10.1109/ICDAR.1995.599035","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.599035","url":null,"abstract":"This paper presents a novel view of document processing, as being the reverse process to T/sub E/X. This concept simplifies the analysis of the physical structure of documents, and also suggests the use of a style file for layout recognition. An algorithm is given for both phases, layout analysis and layout recognition. The bottom-up layout analysis method employed is based on the Kruskal's algorithm and uses the distances between the components to construct the physical page structure. The algorithm is linear with respect to the number of the connected components. For layout recognition, a document style description language (DSDL) is introduced. This helps a fault-tolerant, recursive parsing algorithm to label the blocks of the document. The presented methods were designed to be used for scientific publications (papers, reports, books), but could be applied to a broader range of documents.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133892353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}