Proceedings of the Fourth International Conference on Document Analysis and Recognition: Latest Articles

Indexing and classification of TV news articles based on telop recognition

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.619882

Y. Ariki, T. Teranishi

In accumulating and retrieving multimedia information such as images, speech and text, it is necessary to compress and retrieve the information efficiently and accurately. The purpose of this paper is to construct a multimedia database of TV news images based on telop character recognition. The first step is to detect telop frames and to segment the characters by differentiating the telop frames, based on the fact that character regions have high brightness and character edges are clear. The second step is telop character recognition, performed by a subspace method using direction histogram features. The third step is indexing: noun words are extracted after morphological analysis of the recognized telop characters. These noun words correspond to keywords and are assigned to TV news articles as their indices. Finally, TV news articles are classified into 10 topics (politics, economics, culture, amusements, sports and so on) based on the extracted indices, using an index-topic table. The telop character recognition rate was 65.7% and the article classification rate was 67.3%.
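
The final classification step can be illustrated with a minimal sketch. The abstract does not say how the index-topic table is built or how competing topics are resolved, so the table entries, the function name `classify_article`, and the majority-vote rule below are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical index-topic table: each index (noun keyword) maps to one topic.
# The entries are illustrative and are not taken from the paper.
INDEX_TOPIC_TABLE = {
    "election": "politics",
    "parliament": "politics",
    "budget": "economics",
    "stock": "economics",
    "tournament": "sports",
    "baseball": "sports",
}


def classify_article(indices, table=INDEX_TOPIC_TABLE):
    """Return the topic with the most votes, or None if no index is in the table.

    Majority voting is an assumed rule; the abstract only states that an
    index-topic table is used.
    """
    votes = Counter(table[word] for word in indices if word in table)
    if not votes:
        return None
    # most_common(1) yields the (topic, count) pair with the highest count.
    return votes.most_common(1)[0][0]


print(classify_article(["baseball", "tournament", "budget"]))  # sports
```

Indices missing from the table are ignored here, so an article whose recognized keywords are all unknown stays unclassified rather than being forced into a topic.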

Citations: 20

Speeding-up Chinese character recognition in an automatic document reading system

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.620581

Yi-Hong Tseng, Chi-Chang Kuo, Hsi-Jian Lee

We present two techniques for speeding up character recognition. Our character recognition system, including the candidate cluster selection and detail matching modules, is implemented using two statistical features: crossing counts and contour direction counts. In the training stage, we divide characters into different clusters. To maintain a very high recognition rate, the candidate cluster selection module selects the 60 clusters with the smallest distances from among 300 predefined clusters. To further speed up recognition, we use a modified branch and bound algorithm in the detail matching module. In the automatic document reading system, characters and punctuation marks are first extracted from printed document images and sorted according to their positions and the document orientation. The system then recognizes all printed Chinese characters between pairs of punctuation marks. The results are then spoken aloud by a speech synthesis system.
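
The two-stage search described above can be sketched as follows. This is not the authors' algorithm: the abstract does not specify the modified branch and bound procedure, so the detail matching stage below substitutes a simple bound-based pruning (a template is abandoned once its partial distance reaches the best distance found so far). The squared Euclidean distance, all function names, and the toy data are assumptions.

```python
import heapq


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_candidate_clusters(feature, centers, k):
    """Return the indices of the k cluster centers nearest to the feature."""
    return heapq.nsmallest(
        k, range(len(centers)), key=lambda i: squared_distance(feature, centers[i])
    )


def detail_match(feature, templates, cluster_ids, members):
    """Find the nearest template within the selected clusters.

    A template is abandoned as soon as its partial distance reaches the best
    complete distance found so far, so it cannot become the new best match.
    """
    best_label, best_dist = None, float("inf")
    for c in cluster_ids:
        for label in members[c]:
            partial = 0.0
            for x, y in zip(feature, templates[label]):
                partial += (x - y) ** 2
                if partial >= best_dist:
                    break  # bound exceeded: prune this template
            else:
                best_label, best_dist = label, partial
    return best_label, best_dist


# Toy data: 3 clusters and 4 templates (the paper uses 300 clusters, top 60).
centers = [(0, 0), (10, 10), (20, 0)]
members = {0: ["A", "B"], 1: ["C"], 2: ["D"]}
templates = {"A": (0, 1), "B": (1, 0), "C": (10, 9), "D": (19, 1)}

candidates = select_candidate_clusters((9, 9), centers, k=2)
print(candidates)                                              # [1, 0]
print(detail_match((9, 9), templates, candidates, members))    # ('C', 1.0)
```

Visiting the nearest clusters first tends to establish a small best distance early, which lets the pruning test discard most remaining templates after only a few feature components.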

Citations: 45

A proposal for a text-indicated writer verification method

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.620600

Y. Yamazaki, N. Komatsu

We propose an on-line writer verification method to improve the reliability of verifying a specific system user. In the proposed method, a different text containing ordinary characters is used in each verification process. This text can be selected automatically by the verification system so as to reflect the specific writer's features. A writer is accepted only when the text indicated by the verification system is written and the system can verify the writer's personal features from the written text. The proposed method makes it more difficult to impersonate a writer with forged handwriting data than previous methods that use only signatures.

Citations: 15

Page segmentation using document model

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.619809

Anil K. Jain, B. Yu

Transforming a paper document to its electronic version in a form suitable for efficient storage, retrieval and interpretation continues to be a challenging problem. An efficient document model is necessary to solve this problem. Document modeling involves techniques of thresholding, skew detection, geometric layout analysis and logical layout analysis. The derived model can then be used in document storage and retrieval. We use the traditional bottom-up approach based on connected component extraction to efficiently implement page segmentation and region identification. We propose a new document model that preserves top-down generation information; based on this model, a document is logically represented for interactive editing, storage, retrieval, transfer and logical analysis.

Citations: 19

Hand-printed Chinese character recognition via machine learning

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.619839

A. Amin, Seung-Gwon Kim, C. Sammut

Recognition of Chinese characters has been an area of great interest for many years, and a large number of research papers and reports have already been published in this area. There are several major problems with Chinese character recognition: Chinese characters are distinct and ideographic, the character size is very large and a lot of structurally similar characters exist in the character set. Thus, classification criteria are difficult to generate. This paper presents a new technique for the recognition of hand-printed Chinese characters using the C4.5 machine learning algorithm. Conventional methods have relied on hand-constructed dictionaries, which are tedious to construct and difficult to make tolerant to variation in writing styles. The paper also discusses Chinese character recognition using dominant point feature extraction and C4.5. The system was tested with 900 characters (40 samples per character) and the recognition rate obtained was 84%.

Citations: 8

Location and recognition of legal amounts on Chinese bank cheques

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.620570

Chiu L. Yu, C. Suen, Y. Tang

This paper describes a Chinese cheque processing system currently under development at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI). The information on Chinese bank cheques is not the same as that on alphanumeric bank cheques. The legal amount in a Chinese bank cheque is the Chinese character text associated with each currency unit. This paper discusses a technique that uses each currency unit as a keyword to locate and extract the legal amount in bank cheques. In the analysis and recognition process, the system tries to locate the smallest currency unit in the image and identifies it first. Then, the system tries to locate the image strings associated with each currency unit. Each image string is separated and recognized. Next, a set of rules and context are applied to recognize the characters. In order to choose the correct one, the recognized character string is accepted only if it satisfies all the conditions governed by the rules.

Citations: 4

The detection of duplicates in document image databases

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.619863

D. Doermann, Huiping Li, O. Kia

We propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust "signature" extracted from each document image which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR or other recognition based methods, including speed and robustness to imaging distortions. To justify the approach and test the scalability, we have developed a simulator which allows us to change parameters of the system and examine performance for millions of document signatures. A complete system is implemented and tested on a test collection of technical articles and memos.

Citations: 67

Establishment of personalized templates for automatic signature verification

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.619853

Christiane Schmidt, K. Kraiss

The paper presents a novel method of on-line signature verification that analyzes both the shape of the signature and dynamics of the writing process. This approach automatically determines characteristic features of the written image and combines these shape features with features from the writing dynamics. For establishing a writing characteristic template for one signer the signature is separated into characteristic segments. The segmentation algorithm extracts writing points which would give a forgery the appearance of the original. For these significant points local extreme values, which identify writing segments, are calculated. Subsequently, dynamic features are computed for the segments. The developed system needs three signatures of one person for the establishment of a personalized template. A database has been collected with 544 signatures of 27 signers for evaluation. The developed system achieved a correct acceptance rate of 78% and a correct rejection rate of 100%.

Citations: 27

Shape based learning for a multi-template method, and its application to handprinted numeral recognition

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.620548

T. Yamauchi, Y. Itamoto, J. Tsukumo

Character recognition using multi-template methods is promising. Higher classification performance can be achieved as the number of templates increases. However, classification performance saturates because there is classifiability loss in feature extraction. The paper proposes a new multi-template method which learns training patterns with character shape information assigned by the authors. This method uses contour and direction features, and adds a character shape consistency test to the conventional multi-template methods. The paper presents experimental results obtained from handprinted numerals. In the classification experiment on the ETL-6 database, the classification rate was 99.19% and the substitution rate was 0.03%. A higher classification rate could be achieved.
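
The reported figures can be related with a small worked example. It assumes, as is common in character recognition evaluations, that every test pattern is either correctly classified, substituted (misclassified), or rejected; the abstract does not state a rejection rate, and under this assumption it would be 100 - 99.19 - 0.03 = 0.78%. The counts below are hypothetical and chosen only to reproduce the reported percentages.

```python
def recognition_rates(correct, substituted, rejected):
    """Return (classification, substitution, rejection) rates in percent.

    Assumes each test pattern falls into exactly one of the three outcomes.
    """
    total = correct + substituted + rejected
    return tuple(round(100.0 * n / total, 2) for n in (correct, substituted, rejected))


# Hypothetical counts for 10,000 test patterns (not taken from the paper).
print(recognition_rates(correct=9919, substituted=3, rejected=78))
# (99.19, 0.03, 0.78)
```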

Citations: 1

Global interpolation method II for handwritten numbers overlapping a border by automatic knowledge acquisition of overlapped conditions

Pub Date : 1997-08-18 DOI: 10.1109/ICDAR.1997.620558

S. Naoi, Maki Yabuki

The global interpolation method we propose can extract a handwritten alphanumeric character pattern even if it overlaps a border. After borders are removed, our method interpolates blank segments in a character by globally evaluating segment label connectivity and connectedness, producing characters with smooth edges. However, the method cannot interpolate missing superpositioning segments, such as an overlapping horizontal line in the number "2". To solve this problem, we propose global interpolation method II, which adds top-down recognition processing to the bottom-up processing of the existing global interpolation method by automatically acquiring knowledge of the relationship between the overlapped condition and recognition reliability. Experiments on overlapping characters generated from the ETL database showed that global interpolation method II achieves almost the same accuracy as that obtained on the original ETL database.

Citations: 2