Automatic Construction of Interval-Valued Fuzzy Hindi WordNet using Lexico-Syntactic Patterns and Word Embeddings

Impact Factor: 1.8; Zone 4, Computer Science; JCR Q3 (Computer Science, Artificial Intelligence); ACM Transactions on Asian and Low-Resource Language Information Processing; Pub Date: 2024-02-02; DOI: 10.1145/3643132
Minni Jain, Rajni Jindal, Amita Jain

Abstract

A computational lexicon is the backbone of any language processing system. It helps computers handle the complexity of language as a human does by encoding words and their semantic associations. The well-known, manually constructed Hindi WordNet (HWN) consists of various classical (crisp) semantic relations. To handle uncertainty and represent Hindi WordNet more semantically, Type-1 fuzzy graphs have been applied to its relations. However, a Type-1 fuzzy set (T1FS) does not account for uncertainty in the crisp membership degree itself. Moreover, collecting billions of membership values (HWN contains 5,55,69,51,753 relations) from human experts is not feasible. This paper applies the concept of interval-valued fuzzy graphs and proposes the Interval-Valued Fuzzy Hindi WordNet (IVFHWN). IVFHWN automatically identifies interval-valued fuzzy relations between words, along with their degrees of membership, using word embeddings and lexico-syntactic patterns. Experimental results on the word sense disambiguation problem show better outcomes when IVFHWN is used in place of the Type-1 Fuzzy Hindi WordNet and the classical Hindi WordNet.
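
The abstract does not say how the membership intervals are computed, so the following is only an illustrative sketch of what an interval-valued fuzzy relation could look like in code, not the authors' method. It assumes (hypothetically) that several embedding models each supply a vector pair for the same two words, and that the interval bounds are the minimum and maximum cosine similarity, clipped to [0, 1]. The names `IntervalEdge`, `interval_edge`, `word_a`, and `word_b` are invented for this example.

```python
import math
from dataclasses import dataclass


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass(frozen=True)
class IntervalEdge:
    """A relation whose membership degree is an interval [lower, upper] within [0, 1]."""
    source: str
    target: str
    relation: str
    lower: float
    upper: float


def interval_edge(source, target, relation, vector_pairs):
    """Build an edge from (source_vec, target_vec) pairs, one pair per embedding model.

    Assumption for illustration only: the bounds are the smallest and largest
    cosine similarity across the models, clipped to [0, 1].
    """
    sims = [min(1.0, max(0.0, cosine(u, v))) for u, v in vector_pairs]
    return IntervalEdge(source, target, relation, min(sims), max(sims))


# Toy 2-dimensional vectors standing in for two hypothetical embedding models.
pairs = [
    ([3.0, 4.0], [3.0, 4.0]),  # model 1: identical vectors
    ([1.0, 0.0], [3.0, 4.0]),  # model 2: partially aligned vectors
]
edge = interval_edge("word_a", "word_b", "hypernym", pairs)
print(edge.lower, edge.upper)
```

A Type-1 fuzzy edge would carry a single membership value; the interval form keeps the disagreement between sources visible as the width `upper - lower`.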

Source journal: CiteScore 3.60; self-citation rate 15.00%; articles published 241
Journal description: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.