Proceedings of 3rd International Conference on Document Analysis and Recognition: Latest Publications

A microprocessor-based optical character recognition check reader
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.602066
Francis Y. L. Chin, Francis Wu
Magnetic Ink Character Recognition (MICR) technology has been widely used for processing bank checks. Since the MICR character set is a special type font and the ink is also readable by humans, an optical approach can be used as well. This report describes the design of a low-cost but highly accurate microprocessor-based optical character recognition (OCR) check reader. The performance of our OCR reader is affected by a number of factors, mainly noise generated by the lens system and the colour image in the check background. In this paper we describe how our software solution alleviates these problems. As speed is another concern, special attention is paid to the design of the recognition algorithm, such as the avoidance of floating-point arithmetic, hardware limitations, etc.
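The avoidance of floating-point arithmetic mentioned above can be illustrated with a minimal integer-only template matcher. The glyph templates and the pixel-match scoring rule below are hypothetical illustrations, not the authors' design.

```python
# Integer-only template matching: one way to avoid floating-point
# arithmetic in a character recognizer, as the abstract suggests.

def match_score(window, template):
    """Count matching pixels between a binary window and a template."""
    return sum(1 for w, t in zip(window, template) if w == t)

def classify(window, templates):
    """Return the label of the best-matching template.

    Scores are plain pixel-match counts, so no floating-point division
    or normalization is needed when all templates have the same size.
    """
    return max(templates, key=lambda label: match_score(window, templates[label]))

# Toy 3x3 binary glyphs flattened to tuples (hypothetical data).
TEMPLATES = {
    "1": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "7": (1, 1, 1, 0, 0, 1, 0, 0, 1),
}
print(classify((0, 1, 0, 0, 1, 0, 0, 1, 0), TEMPLATES))  # -> 1
```

Because every score is an integer count over equally sized templates, the comparison in `max` never needs a floating-point normalization step.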
Citations: 12
A system for automatic Chinese seal imprint verification
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.601982
Wen Gao, S. Dong, Xilin Chen
Chinese seal imprint verification by computer is very difficult, but much needed in practice. An improved method for seal imprint verification based on stroke edge matching combined with image difference analysis is proposed. Experimental results show that the proposed approach is excellent in consistency, reliability and adaptability, and is feasible for practical applications.
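The image-difference half of the method can be sketched as a pixel-level comparison of two aligned binary imprint images. The tolerance value below is a hypothetical assumption, and the stroke-edge-matching component the paper combines it with is omitted.

```python
def difference_ratio(reference, candidate):
    """Fraction of pixels that differ between two aligned binary images,
    given as equal-length flat tuples of 0/1 values."""
    diffs = sum(1 for r, c in zip(reference, candidate) if r != c)
    return diffs / len(reference)

def verify(reference, candidate, max_ratio=0.1):
    """Accept the candidate imprint when the pixel difference is small.

    max_ratio is an illustrative tolerance, not the paper's value; the
    paper pairs this kind of difference analysis with stroke edge matching.
    """
    return difference_ratio(reference, candidate) <= max_ratio
```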
Citations: 12
Character detection based on multi-scale measurement
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.601978
H. Hontani, S. Shimotsuji
The paper presents a new method for character string extraction that works efficiently even on complicated geographical maps. The concept of multi-scale measurement is introduced to develop an efficient string detection technique. In this paper, scale means the size of the area in which character candidates may exist. The proposed method first merges small black regions within a certain area into a mass. When the size of the area changes, the mass changes as well. The method observes how a mass changes as the area size changes, and searches for a stable mass as a character string. Multi-scale measurement enables the detection process to find the adequate area size for detecting a string. Because a stable mass may include small figures, a test of the shape of a detected mass and a character recognition process follow, to judge whether a mass forms a character string. If a mass is rejected, it is split into smaller masses according to the results of multi-scale measurement. These judgment and split processes are repeated to detect character strings from a pattern in which several strings are written close together.
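The merge-and-observe idea above can be sketched in one dimension: regions are merged into a mass when closer than the current scale, and a grouping that stays unchanged across scales is taken as stable. The 1-D simplification and the data are illustrative, not the paper's actual 2-D procedure.

```python
def merge_at_scale(xs, d):
    """Group 1-D region positions into masses: neighbours closer than
    the scale d are merged into the same mass."""
    xs = sorted(xs)
    groups, current = [], [xs[0]]
    for x in xs[1:]:
        if x - current[-1] <= d:
            current.append(x)
        else:
            groups.append(current)
            current = [x]
    groups.append(current)
    return groups

def stable_scales(xs, scales):
    """Number of masses at each scale; a plateau in this sequence marks
    a stable mass, which the paper takes as a string candidate."""
    return [len(merge_at_scale(xs, d)) for d in scales]

print(stable_scales([0, 1, 2, 10, 11, 12], [1, 2, 5, 20]))  # -> [2, 2, 2, 1]
```

The plateau of two masses across scales 1 to 5 is the kind of stability the detection process looks for.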
Citations: 5
A high quality vectorization combining local quality measures and global constraints
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.598986
M. Röösli, G. Monagan
We present a vectorization system that generates vector data corresponding to the line structures of a raster image. The vector data consists of two primitives: "straight line segment" and "circular arc". The vectorization system measures the quality of each primitive generated. Thus, the vectorization not only produces high-quality vector data, it also gives a precise description of the quality of the data generated. This is crucial if the requirements set by industrial applications are to be met. In order not to lose the quality of the vector data while assembling primitives into line objects, geometric constraints are incorporated already at the vectorization level: constraints such as requiring segments to be parallel or perpendicular, circular arcs to be concentric, or tangents of the primitives to be equal at their connection point. After the constraints have been satisfied, the resulting primitives still fulfil the quality requirements as before the constraints were imposed. The possibility of refitting the generated vector data under adapted constraints allows for efficient interactive postprocessing of the data.
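A local quality measure for the "straight line segment" primitive can be sketched as the rms residual of a least-squares line fit. This is a generic formulation for illustration, not the authors' exact measure.

```python
def fit_segment(points):
    """Least-squares fit of y = a*x + b; returns (a, b, rms).

    The rms residual acts as a local quality measure: the smaller it is,
    the better the points are explained by a straight segment.
    Assumes a non-vertical point set (sxx > 0).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    b = my - a * mx
    rms = (sum((y - (a * x + b)) ** 2 for x, y in points) / n) ** 0.5
    return a, b, rms
```

Collinear input yields an rms near zero, i.e. a high-quality segment; a large residual would signal that the pixels are better explained by an arc or by several segments.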
Citations: 15
Extracting individual features from moments for Chinese writer identification
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.599030
Cheng-Lin Liu, Ru-Wei Dai, Ying-Jian Liu
To solve the problem of writer identification (WI) with indeterminate classes (writers) and objects (characters), it is helpful to extract individual features with clear physical meanings and small dynamic ranges. In this paper, a new method named the Moment-Based Feature Method for identifying Chinese writers is presented, in which normalized individual features are derived from geometric moments of character images. The extracted features are invariant under translation, scaling, and stroke width. They correspond explicitly to human perception of shape and distribute their values over small dynamic ranges. Experiments in writer recognition and verification demonstrate the efficiency of this method, and promising results have been achieved.
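The translation and scale invariance described above comes from normalized central moments, which can be computed directly from foreground pixel coordinates. This is a minimal sketch of the standard moment normalization, not the paper's full feature set.

```python
def central_moment(pixels, p, q):
    """mu_pq of a set of foreground pixel coordinates (translation-invariant)."""
    n = len(pixels)
    xbar = sum(x for x, _ in pixels) / n
    ybar = sum(y for _, y in pixels) / n
    return sum((x - xbar) ** p * (y - ybar) ** q for x, y in pixels)

def normalized_moment(pixels, p, q):
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2), adding scale invariance.

    For a binary image, mu_00 is simply the foreground pixel count.
    """
    mu00 = len(pixels)
    return central_moment(pixels, p, q) / mu00 ** (1 + (p + q) / 2)
```

Subtracting the centroid removes translation; dividing by a power of mu_00 removes scale, so the resulting values stay within a small dynamic range across character sizes.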
Citations: 29
A system for scanning and segmenting cursively handwritten words into basic strokes
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.602082
C. Privitera, R. Plamondon
This paper presents a segmentation method that partly mimics the cognitive-behavioral process used by human subjects to recover motor-temporal information from the image of a handwritten word. The approach does not rely on any thinning procedure; instead, it manipulates a different kind of information, concerning the curvature of the word contour. Starting from the maximum-curvature points, which roughly correspond to the beginnings of strokes, the algorithm scans the word following the natural course of the line and attempts to repeat the same movement the writer executed when generating the word. At each maximum-curvature point, the line is segmented and reconstructed by a smooth interpolation of the most interior points belonging to the line just covered. At the end of the scanning process, a temporal sequence of motor strokes is obtained which plausibly composes the original intended movement.
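Cutting a word trace at maximum-curvature points can be sketched with a discrete turning-angle measure along a polyline contour. The polyline representation, the threshold value, and the cut rule are illustrative assumptions; the paper's interpolation step is omitted.

```python
import math

def turning_angles(contour):
    """Absolute turning angle at each interior vertex of an open polyline."""
    angles = []
    for i in range(1, len(contour) - 1):
        (x0, y0), (x1, y1), (x2, y2) = contour[i - 1], contour[i], contour[i + 1]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = abs(a2 - a1)
        angles.append(min(d, 2 * math.pi - d))
    return angles

def split_at_curvature(contour, threshold):
    """Cut the polyline at vertices whose turning angle exceeds threshold,
    yielding candidate strokes that share their cut points."""
    cuts = [i + 1 for i, a in enumerate(turning_angles(contour)) if a > threshold]
    strokes, start = [], 0
    for c in cuts:
        strokes.append(contour[start:c + 1])
        start = c
    strokes.append(contour[start:])
    return strokes
```

An L-shaped trace, for example, splits into two strokes at its right-angle corner, which is the behaviour the stroke-recovery stage needs.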
Citations: 16
A simplified attributed graph grammar for high-level music recognition
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.602096
S. Baumann
This paper describes a simplified attributed programmed graph grammar used to represent and process a-priori knowledge about common music notation. The presented approach serves as a high-level recognition stage and is interlocked with the preceding low-level recognition phases of our complete optical music recognition system (DOREMIDI++). The implemented grammar rules and control diagrams describe a declarative knowledge base that drives a transformation algorithm. This transformation converts the results of the symbol recognition stages into a symbolic representation of the musical score.
Citations: 36
Automatic text skew estimation in document images
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.602126
Su S. Chen, R. Haralick, I. T. Phillips
This paper describes an algorithm to estimate the text skew angle in a document image. The algorithm utilizes recursive morphological transforms and yields accurate estimates of text skew angles on a large document-image data set. It computes the optimal parameter settings on the fly without any human interaction. In this automatic mode, experimental results indicate that the algorithm generates estimated text skew angles within 0.5° of the true skew angles with a probability of 99%. To process a 300 dpi document image, the algorithm takes 10 seconds on a SUN Sparc 10 machine.
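The paper's estimator is built on recursive morphological transforms, which are not reproduced here. As a hedged illustration of skew estimation in general, the sketch below uses the classic projection-profile criterion instead: rotate the foreground points by each candidate angle and pick the angle that maximizes the energy of the row-projection histogram, which peaks when text lines align.

```python
import math

def profile_energy(points, angle_deg):
    """Sum of squared row counts of the projection profile after rotating
    the foreground points by angle_deg; large when text lines align."""
    a = math.radians(angle_deg)
    rows = {}
    for x, y in points:
        r = round(-x * math.sin(a) + y * math.cos(a))
        rows[r] = rows.get(r, 0) + 1
    return sum(v * v for v in rows.values())

def estimate_skew(points, candidates):
    """Return the candidate angle (in degrees) with maximal profile energy."""
    return max(candidates, key=lambda a: profile_energy(points, a))

# Synthetic page: three text lines skewed by 5 degrees.
slope = math.tan(math.radians(5))
page = [(x, x * slope + 10 * k) for k in range(3) for x in range(31)]
print(estimate_skew(page, range(-10, 11)))  # -> 5
```

A real implementation would search the angle range coarse-to-fine rather than exhaustively, but the criterion is the same.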
Citations: 23
Mathematics recognition using graph rewriting
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.599026
Ann Grbavec, D. Blostein
This paper investigates graph rewriting as a tool for high-level recognition of two-dimensional mathematical notation. "High-level recognition" is the process of determining the meaning of a diagram from the output of a symbol recognizer. Characteristic problems of high-level mathematics recognition include: determining the groupings of symbols into recursive subexpressions and resolving ambiguities that depend upon global context. Our graph-rewriting approach uses knowledge of the notational conventions of mathematics, such as operator precedence and operator range, more effectively than syntactic or previous structural methods. Graph rewriting offers a flexible formalism with a strong theoretical foundation for manipulating two-dimensional patterns. It has been shown to be a useful technique for high-level recognition of circuit diagrams and musical scores. By demonstrating a graph-rewriting strategy for mathematics recognition, this paper provides further evidence for graph rewriting as a general tool for diagram recognition, and identifies some of the issues that must be considered as this potential is explored.
Citations: 82
False hits of tri-syllabic queries in a Chinese signature file
Pub Date: 1995-08-14 DOI: 10.1109/ICDAR.1995.598966
Tyne Liang, Suh-Yin Lee, Wei-Pang Yang
In applying the superimposed coding method to character-based Chinese text retrieval, we find two kinds of false hits for multi-syllabic (multicharacter) queries. The first type is a random false hit (RFH), caused by the accidental setting of bits by irrelevant characters in a document signature. The other is an adjacency false hit (AFH), caused by the loss of character-sequence information during signature creation. Since many query terms are proper nouns and Chinese names that often contain three characters (tri-syllabic), we derive a formula to estimate the RFH for tri-syllabic queries. As for the AFH, which cannot be reduced by the single-character (monogram) hashing method, a method that hashes consecutive character pairs (bigrams) is designed to reduce both the AFH and the RFH. We find that there exists an optimal weight assignment for a minimal false-hit rate in a combined scheme that encodes both monogram and bigram keys in document signatures.
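The combined monogram and bigram signature scheme can be sketched as a simple superimposed-coding filter. The hash function and signature width below are illustrative assumptions, and the paper's optimal weighting of the two key types is not modeled.

```python
import zlib

def _bit(key, width):
    """Deterministic bit position for a key string (illustrative hash)."""
    return zlib.crc32(key.encode()) % width

def make_signature(text, width=64):
    """Superimposed-coding signature: set one bit per character (monogram)
    and one bit per adjacent character pair (bigram), so that some
    character-sequence information survives in the signature."""
    sig = 0
    for ch in text:
        sig |= 1 << _bit(ch, width)
    for a, b in zip(text, text[1:]):
        sig |= 1 << _bit(a + b, width)
    return sig

def may_contain(sig, query, width=64):
    """True if every bit of the query signature is set in the document
    signature. There are no false negatives; the remaining false
    positives are the RFH/AFH behaviour the paper analyses."""
    q = make_signature(query, width)
    return sig & q == q
```

Without the bigram bits, any permutation of a query's characters would match the signature equally well; the bigram bits are what let the filter reject most adjacency false hits.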
Citations: 1