Getting into bed with embeddings? A comparison of collocations and word embeddings for corpus-assisted discourse analysis

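Applied Corpus Linguistics, 4(3), Article 100117 (impact factor 4.3). Published: 2024-12-01; online: 2024-11-13. DOI: 10.1016/j.acorp.2024.100117
https://www.sciencedirect.com/science/article/pii/S2666799124000340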

Jordan Batchelor


Citations: 0

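Abstract

This paper discusses two approaches for identifying lexical patterns in discourse, namely the corpus linguistic method of collocation analysis and the natural language processing method of word embeddings. While both approaches can identify lexical patterns, they approach the task with different underlying frameworks, and the extent to which their results resemble one another has not been directly compared. This study uses two corpora, five collocation measures, and two word embedding algorithms to generate such comparisons. Results generally support the notion that many word pairs with similar embeddings are collocates, and that, to a lesser extent, many collocates have similar word embeddings. However, a major difference is that word pairs with similar embeddings do not need to co-occur often, or at all. Moreover, systematic differences in the kinds of words highlighted between the two word embedding algorithms were found and are discussed.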
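The abstract's central contrast, that a word pair can have similar vectors without ever co-occurring, can be illustrated with a minimal sketch. It rests on several assumptions not taken from the paper: a hypothetical four-sentence toy corpus, a ±2-token window, a window-adjusted mutual information (MI) score as a stand-in for a collocation measure, and raw co-occurrence count vectors in place of trained word embeddings (algorithms such as word2vec or GloVe learn dense vectors, but draw on broadly the same distributional signal). The abstract does not name its five collocation measures or two embedding algorithms, so none of these choices should be read as the paper's method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus invented for illustration only; not data from the paper.
SENTENCES = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "a cat sat on the mat",
    "a dog sat on the rug",
]
WINDOW = 2  # assumed span: 2 tokens left and 2 tokens right, within a sentence


def count(sentences, window):
    """Return word frequencies, window-based co-occurrence counts, and corpus size."""
    freq = Counter()
    cooc = defaultdict(Counter)
    n_tokens = 0
    for sentence in sentences:
        tokens = sentence.split()
        n_tokens += len(tokens)
        freq.update(tokens)
        for i, node in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node itself
                    cooc[node][tokens[j]] += 1
    return freq, cooc, n_tokens


def mutual_information(node, collocate, freq, cooc, n_tokens, window):
    """Window-adjusted MI: log2(O / E), with E = f(node) * f(collocate) * span / N.

    Returns None when the pair never co-occurs, because log2(0) is undefined.
    """
    observed = cooc[node][collocate]
    if observed == 0:
        return None
    expected = freq[node] * freq[collocate] * (2 * window) / n_tokens
    return math.log2(observed / expected)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


freq, cooc, n_tokens = count(SENTENCES, WINDOW)
print("cat/dog co-occurrences:", cooc["cat"]["dog"])  # 0
print("MI(cat, dog):", mutual_information("cat", "dog", freq, cooc, n_tokens, WINDOW))  # None
mi = mutual_information("cat", "chased", freq, cooc, n_tokens, WINDOW)
print("MI(cat, chased):", round(mi, 3))  # 0.459
print("cosine(cat, dog):", round(cosine(cooc["cat"], cooc["dog"]), 3))  # 1.0
```

In this toy setting, "cat" and "dog" never fall within each other's window, so no collocation score can be computed for the pair, yet their context vectors are identical and their cosine similarity is 1.0. "cat" and "chased", by contrast, do co-occur and receive a (modest) positive MI score.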


