Getting into bed with embeddings? A comparison of collocations and word embeddings for corpus-assisted discourse analysis

Jordan Batchelor
Journal: Applied Corpus Linguistics, Volume 4, Issue 3, Article 100117
Published: 2024-11-13
Publication type: Journal Article
DOI: 10.1016/j.acorp.2024.100117
URL: https://www.sciencedirect.com/science/article/pii/S2666799124000340
Citation count: 0

Abstract

This paper discusses two approaches for identifying lexical patterns in discourse, namely the corpus linguistic method of collocation analysis and the natural language processing method of word embeddings. While both approaches can identify lexical patterns, they approach the task with different underlying frameworks, and the extent to which their results resemble one another has not been directly compared. This study uses two corpora, five collocation measures, and two word embedding algorithms to generate such comparisons. Results generally support the notion that many word pairs with similar embeddings are collocates, and that, to a lesser extent, many collocates have similar word embeddings. However, a major difference is that word pairs with similar embeddings do not need to co-occur often, or at all. Moreover, systematic differences in the kinds of words highlighted between the two word embedding algorithms were found and are discussed.
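The contrast drawn in the abstract, that words with similar embeddings need not co-occur, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentences are invented, pointwise mutual information (PMI) stands in for a collocation measure (the abstract does not name the five measures used), and raw count-based context vectors stand in for trained word embeddings (the abstract does not name the two algorithms). It is not the study's method.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not the paper's data).
SENTENCES = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the doctor examined the child",
    "the physician examined the child",
    "strong tea is served hot",
    "strong tea tastes bitter",
]
TOKENIZED = [s.split() for s in SENTENCES]


def pmi(w1, w2, docs):
    """Sentence-level pointwise mutual information, log base 2.

    Assumed stand-in for a collocation measure: each sentence is one
    co-occurrence window.
    """
    n = len(docs)
    c1 = sum(w1 in d for d in docs)
    c2 = sum(w2 in d for d in docs)
    c12 = sum(w1 in d and w2 in d for d in docs)
    if c12 == 0:
        return float("-inf")  # the pair never shares a sentence
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))


def context_vector(word, docs):
    """Count the other tokens in every sentence where `word` occurs.

    Assumed stand-in for a word embedding: a sparse count vector,
    not a trained dense vector.
    """
    vec = Counter()
    for d in docs:
        for i, tok in enumerate(d):
            if tok == word:
                vec.update(d[:i] + d[i + 1:])
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


for w1, w2 in [("doctor", "physician"), ("strong", "tea")]:
    sim = cosine(context_vector(w1, TOKENIZED), context_vector(w2, TOKENIZED))
    print(w1, w2, "PMI:", pmi(w1, w2, TOKENIZED), "cosine:", round(sim, 3))
```

In this toy setup, "doctor" and "physician" never share a sentence, so the PMI stand-in treats them as non-collocates, yet their context vectors are identical because they appear in the same surrounding words. "strong" and "tea" always co-occur, so they score as collocates on PMI.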
Source journal
Applied Corpus Linguistics (subject area: Linguistics and Language)
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 0
Review time: 70 days