An Approach for Word Segmentation from a Line Segment in Odia Text Using Quartiles

Aradhana Kar, S. Pradhan
2022 5th International Conference on Computational Intelligence and Networks (CINE), published 2022-12-01
DOI: 10.1109/CINE56307.2022.10037532 (https://doi.org/10.1109/CINE56307.2022.10037532)

Abstract

This paper addresses word segmentation from a given line segment of Odia text. A line segment may contain the alphabets and matras together, or the alphabets and matras of a text line may lie in two separate line segments. In the latter case, where the text line has been segmented into alphabets and their associated matras, the two line segments are recombined using the Reconstruct module. The proposed approach has three phases: the Pre_Processing module, the Find_White_Spaces module, and the Analyse_White_Spaces module. The Pre_Processing module reads the input line segment, converts it to a grayscale image, removes the white space surrounding the text, and then converts the result to a binary image. The Find_White_Spaces module finds the start and end of each white space between the words. The Analyse_White_Spaces module analyses the widths of the white spaces using quartiles and stores the segmented words in the directory ‘Segmented Words’. The system was tested on images of line segments containing only alphabets and on images containing alphabets with matras. The approach achieved an overall word segmentation accuracy of 99.9%.
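
The three phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact quartile rule, so the sketch assumes a gap counts as a word boundary when its width exceeds Q3 + 1.5 × IQR (Tukey's upper fence). The function names, the fixed binarization threshold of 128, and the toy input are all hypothetical, and the image is a plain list of pixel rows so that no imaging library is needed.

```python
from statistics import quantiles


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def find_white_spaces(binary):
    """Return the text extent and the blank column runs inside it.

    Each run is a (start, end) pair of column indices, end exclusive.
    Margins around the text are skipped, which plays the role of the
    cropping step in pre-processing.
    """
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    if not any(has_ink):
        return None, []
    first = has_ink.index(True)
    last = width - 1 - has_ink[::-1].index(True)
    gaps, start = [], None
    for c in range(first, last + 1):
        if not has_ink[c]:
            if start is None:
                start = c  # a blank run begins here
        elif start is not None:
            gaps.append((start, c))  # the blank run ended at column c
            start = None
    return (first, last + 1), gaps


def analyse_white_spaces(gaps):
    """Keep the gaps wider than Q3 + 1.5 * IQR (an assumed decision rule)."""
    widths = [end - start for start, end in gaps]
    if len(widths) < 2:
        return []  # too few gaps for quartiles; treat the line as one word
    q1, _, q3 = quantiles(widths, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [gap for gap in gaps if gap[1] - gap[0] > fence]


def segment_words(gray):
    """Return (start, end) column ranges of the words, end exclusive."""
    extent, gaps = find_white_spaces(binarize(gray))
    if extent is None:
        return []
    words, left = [], extent[0]
    for start, end in analyse_white_spaces(gaps):
        words.append((left, start))
        left = end
    words.append((left, extent[1]))
    return words


# Toy line: '#' is a dark (ink) column, '.' is a white column.
pattern = "..##.##.##.....##.##.##.##......##.##.."
gray = [[0 if ch == "#" else 255 for ch in pattern] for _ in range(3)]
print(segment_words(gray))  # [(2, 10), (15, 26), (32, 37)]
```

In the toy line the gap widths are 1, 1, 5, 1, 1, 1, 6, 1, which gives Q1 = 1 and Q3 = 2 under the inclusive method, so the fence is 3.5 and only the gaps of width 5 and 6 are treated as word boundaries. A rule of this kind relies on inter-character gaps outnumbering inter-word gaps; the paper's actual criterion may differ.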