Fast Recurrent Neural Network with Bi-LSTM for Handwritten Tamil Text Segmentation in NLP

IF 1.8 · CAS Zone 4 (Computer Science) · JCR Q3 (Computer Science, Artificial Intelligence) · ACM Transactions on Asian and Low-Resource Language Information Processing · Pub Date: 2024-02-07 · DOI: 10.1145/3643808
C. Vinotheni, Lakshmana Pandian S.
Citations: 0

Abstract


Tamil text segmentation is a long-standing task in language comprehension that entails dividing a document into adjacent pieces based on its semantic structure. Each segment is important in its own way. The segments are organised, according to the purpose of the content analysis, as text groups, sentences, phrases, words, characters or any other data unit. In this research, that process is carried out with a fast recurrent neural network (FRNN), and content segmentation methods based on deep learning in natural language processing (NLP) are presented. This study proposes a bidirectional long short-term memory (Bi-LSTM) neural network prototype in which an FRNN is used to learn Tamil text-group embeddings and phrases are segmented using text-oriented data. As a result, the prototype can handle variable-sized context data, and the work contributes a large new dataset for naturally segmenting Tamil text. In addition, we develop a segmentation prototype and, using this dataset as a base, show how well it generalises to unseen regular content. With Bi-LSTM, the segmentation precision of the FRNN is superior to that of several other segmentation approaches, although it still falls short of certain other techniques. In the proposed framework, every text is scaled to the required size and is immediately available for training; that is, each word in a scaled Tamil text is used to train the neural network as segmented content. The results reveal that the proposed framework achieves high segmentation rates for handwritten material, nearly equivalent to those of segmentation-based schemes.
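The abstract does not give implementation details, but the core idea of a Bi-LSTM boundary model can be illustrated concretely. The following pure-NumPy sketch (an assumption-laden illustration, not the authors' FRNN) runs an LSTM over a sequence of token embeddings in both directions, concatenates the forward and backward hidden states at each position, and scores each position as a potential segment boundary with a sigmoid classifier. All dimensions, random weights, and the linear boundary scorer are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM time step; gate pre-activations packed as [input, forget, cell, output].
    z = W @ x + U @ h + b
    H = h.size
    i = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])       # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def run_lstm(xs, W, U, b, H):
    # Run the LSTM over the sequence, collecting the hidden state at each step.
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        states.append(h)
    return states

def bilstm_boundary_scores(xs, params):
    # Forward pass, backward pass over the reversed sequence, then
    # concatenate the two states per position and score each position
    # as a segment boundary (probability in (0, 1)).
    Wf, Uf, bf, Wb, Ub, bb, w_out = params
    H = Wf.shape[0] // 4
    fwd = run_lstm(xs, Wf, Uf, bf, H)
    bwd = run_lstm(xs[::-1], Wb, Ub, bb, H)[::-1]
    return [sigmoid(w_out @ np.concatenate([f, b])) for f, b in zip(fwd, bwd)]

D, H = 8, 16  # embedding dim and hidden size, chosen arbitrarily for the demo
params = (
    rng.normal(0, 0.1, (4 * H, D)), rng.normal(0, 0.1, (4 * H, H)), np.zeros(4 * H),
    rng.normal(0, 0.1, (4 * H, D)), rng.normal(0, 0.1, (4 * H, H)), np.zeros(4 * H),
    rng.normal(0, 0.1, 2 * H),
)
seq = [rng.normal(size=D) for _ in range(5)]  # stand-in for 5 Tamil token embeddings
scores = bilstm_boundary_scores(seq, params)
print([round(float(s), 3) for s in scores])
```

In a trained model the weights would come from gradient descent on labelled boundaries rather than a random generator; the point here is only the bidirectional flow, which lets the score at each token depend on context from both sides.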

Source journal
CiteScore: 3.60
Self-citation rate: 15.00%
Articles published: 241
Journal introduction: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to: -Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc. -Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc. -Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition. -Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc. -Machine Translation involving Asian or low-resource languages. -Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc. -Information Extraction and Filtering: including automatic abstraction, user profiling, etc. -Speech processing: including text-to-speech synthesis and automatic speech recognition. -Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc. -Cross-lingual information processing involving Asian or low-resource languages. -Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.
Latest articles in this journal
- Learning and Vision-based approach for Human fall detection and classification in naturally occurring scenes using video data
- A DENSE SPATIAL NETWORK MODEL FOR EMOTION RECOGNITION USING LEARNING APPROACHES
- CNN-Based Models for Emotion and Sentiment Analysis Using Speech Data
- TRGCN: A Prediction Model for Information Diffusion Based on Transformer and Relational Graph Convolutional Network
- Adaptive Semantic Information Extraction of Tibetan Opera Mask with Recall Loss