从跨语言同步多义词数据中推断非同步语义图的因果关系

IF 2 Q2 COMMUNICATION Frontiers in Communication Pub Date : 2024-01-11 DOI:10.3389/fcomm.2023.1288196

Johannes Dellert

{"title":"从跨语言同步多义词数据中推断非同步语义图的因果关系","authors":"Johannes Dellert","doi":"10.3389/fcomm.2023.1288196","DOIUrl":null,"url":null,"abstract":"Semantic maps are used in lexical typology to summarize cross-linguistic implicational universals of co-expression between meanings in a domain. They are defined as networks which, using as few links as possible, connect the meanings so that every isolectic set (i.e., set of meanings that can be expressed by the same word in some language) forms a connected component. Due to the close connection between synchronic polysemies and semantic change, semantic maps are often interpreted diachronically as encoding potential pathways of semantic extension. While semantic maps are traditionally generated by hand, there have been attempts to automate this complex and non-deterministic process. I explore the problem from a new algorithmic angle by casting it in the framework of causal discovery, a field which explores the possibility of automatically inferring causal structures from observational data. I show that a standard causal inference algorithm can be used to reduce cross-linguistic polysemy data into minimal network structures which explain the observed polysemies. If the algorithm makes its link deletion decisions on the basis of the connected component criterion, the skeleton of the resulting causal structure is a synchronic semantic map. The arrows which are added to some links in the second stage can be interpreted as expressing the main tendencies of semantic extension. Much of the existing literature on semantic maps implicitly assumes that the data from the languages under analysis is correct and complete, whereas in reality, semantic map research is riddled by data quality and sparseness problems. To quantify the uncertainty inherent in the inferred diachronic semantic maps, I rely on bootstrapping on the language level to model the uncertainty caused by the given language sample, as well as on random link processing orders to explore the space of possible semantic maps for a given input. The maps inferred from the samples are then summarized into a consensus network where every link and arrow receives a confidence value. In experiments on cross-linguistic polysemy data of varying shapes, the resulting confidence values are found to mostly agree with previously published results, though challenges in directionality inference remain.","PeriodicalId":31739,"journal":{"name":"Frontiers in Communication","volume":"2 8","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Causal inference of diachronic semantic maps from cross-linguistic synchronic polysemy data\",\"authors\":\"Johannes Dellert\",\"doi\":\"10.3389/fcomm.2023.1288196\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semantic maps are used in lexical typology to summarize cross-linguistic implicational universals of co-expression between meanings in a domain. They are defined as networks which, using as few links as possible, connect the meanings so that every isolectic set (i.e., set of meanings that can be expressed by the same word in some language) forms a connected component. Due to the close connection between synchronic polysemies and semantic change, semantic maps are often interpreted diachronically as encoding potential pathways of semantic extension. While semantic maps are traditionally generated by hand, there have been attempts to automate this complex and non-deterministic process. I explore the problem from a new algorithmic angle by casting it in the framework of causal discovery, a field which explores the possibility of automatically inferring causal structures from observational data. I show that a standard causal inference algorithm can be used to reduce cross-linguistic polysemy data into minimal network structures which explain the observed polysemies. If the algorithm makes its link deletion decisions on the basis of the connected component criterion, the skeleton of the resulting causal structure is a synchronic semantic map. The arrows which are added to some links in the second stage can be interpreted as expressing the main tendencies of semantic extension. Much of the existing literature on semantic maps implicitly assumes that the data from the languages under analysis is correct and complete, whereas in reality, semantic map research is riddled by data quality and sparseness problems. To quantify the uncertainty inherent in the inferred diachronic semantic maps, I rely on bootstrapping on the language level to model the uncertainty caused by the given language sample, as well as on random link processing orders to explore the space of possible semantic maps for a given input. The maps inferred from the samples are then summarized into a consensus network where every link and arrow receives a confidence value. In experiments on cross-linguistic polysemy data of varying shapes, the resulting confidence values are found to mostly agree with previously published results, though challenges in directionality inference remain.\",\"PeriodicalId\":31739,\"journal\":{\"name\":\"Frontiers in Communication\",\"volume\":\"2 8\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2024-01-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Communication\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fcomm.2023.1288196\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMMUNICATION\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Communication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fcomm.2023.1288196","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMMUNICATION","Score":null,"Total":0}

引用次数: 0

摘要

语义图在词汇类型学中被用来总结一个领域中意义之间共同表达的跨语言隐含普遍性。语义图被定义为网络，使用尽可能少的链接将意义连接起来，使每一个等折中集（即在某种语言中可以用同一个词表达的意义集）形成一个相连的组成部分。由于同步多义词与语义变化之间的密切联系，语义图通常被解释为编码语义扩展的潜在路径的非同步图。虽然语义图传统上都是手工生成的，但也有人尝试将这一复杂而非确定的过程自动化。我从一个新的算法角度探讨了这个问题，将其置于因果发现的框架中，这一领域探讨了从观察数据中自动推断因果结构的可能性。我的研究表明，标准的因果推理算法可以用来将跨语言多义词数据还原为最小网络结构，从而解释观察到的多义词。如果该算法根据连接成分标准做出删除链接的决定，那么得到的因果结构的骨架就是同步语义图。第二阶段中添加到某些链接上的箭头可以解释为表达了语义扩展的主要趋势。现有的许多语义图谱文献都隐含地假定了所分析语言的数据是正确和完整的，而实际上，语义图谱研究充满了数据质量和稀缺性问题。为了量化推断出的异时语义图中固有的不确定性，我依靠语言层面的引导来模拟给定语言样本造成的不确定性，并依靠随机链接处理顺序来探索给定输入的可能语义图空间。然后将从样本中推断出的语义图汇总到共识网络中，其中每个链接和箭头都有一个置信度值。在对不同形状的跨语言多义词数据进行的实验中，发现所得出的置信度值与之前公布的结果基本一致，但在方向性推断方面仍存在挑战。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Causal inference of diachronic semantic maps from cross-linguistic synchronic polysemy data

Semantic maps are used in lexical typology to summarize cross-linguistic implicational universals of co-expression between meanings in a domain. They are defined as networks which, using as few links as possible, connect the meanings so that every isolectic set (i.e., set of meanings that can be expressed by the same word in some language) forms a connected component. Due to the close connection between synchronic polysemies and semantic change, semantic maps are often interpreted diachronically as encoding potential pathways of semantic extension. While semantic maps are traditionally generated by hand, there have been attempts to automate this complex and non-deterministic process. I explore the problem from a new algorithmic angle by casting it in the framework of causal discovery, a field which explores the possibility of automatically inferring causal structures from observational data. I show that a standard causal inference algorithm can be used to reduce cross-linguistic polysemy data into minimal network structures which explain the observed polysemies. If the algorithm makes its link deletion decisions on the basis of the connected component criterion, the skeleton of the resulting causal structure is a synchronic semantic map. The arrows which are added to some links in the second stage can be interpreted as expressing the main tendencies of semantic extension. Much of the existing literature on semantic maps implicitly assumes that the data from the languages under analysis is correct and complete, whereas in reality, semantic map research is riddled by data quality and sparseness problems. To quantify the uncertainty inherent in the inferred diachronic semantic maps, I rely on bootstrapping on the language level to model the uncertainty caused by the given language sample, as well as on random link processing orders to explore the space of possible semantic maps for a given input. The maps inferred from the samples are then summarized into a consensus network where every link and arrow receives a confidence value. In experiments on cross-linguistic polysemy data of varying shapes, the resulting confidence values are found to mostly agree with previously published results, though challenges in directionality inference remain.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊