A Comparative Assessment of Various Embeddings for Keyword Extraction

2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA). Pub Date: 2023-06-08. DOI: 10.1109/HORA58378.2023.10156762

Ghaith Ashqar, Alev Mutlu


Citations: 0

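Abstract

Automatic keyword extraction from a text document is the problem of identifying the in-text words or phrases that best describe the document's content. Recently, word embeddings have found application in keyword extraction because they improve performance by incorporating semantic information. In this study, we focus on various embeddings and compare their performance in keyword extraction. To this end, we first modified a keyword extraction system called KeyBERT to work with different embeddings. Then, we ran the modified application using ten models on seven benchmark datasets. The experimental findings show that all-mpnet-base-v2 achieved statistically better results than the other models in precision, recall, and F1 score. Moreover, all-mpnet-base-v2 achieved the highest scores for MAP and MRR and also retrieved the largest number of relevant keywords on average.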
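The metrics named in the abstract can be illustrated with a minimal sketch. MAP and MRR stand for mean average precision and mean reciprocal rank. The abstract does not state the matching rule or the cutoff used in the study, so this sketch assumes exact, case-insensitive matching against the gold keywords and normalizes average precision by the number of gold keywords. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Set, Tuple


def _norm(items: Iterable[str]) -> Set[str]:
    """Lowercase and strip keywords so matching is case-insensitive (assumed rule)."""
    return {item.strip().lower() for item in items}


def precision_recall_f1(predicted: Sequence[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one document."""
    pred, ref = _norm(predicted), _norm(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a new relevant keyword,
    divided by the number of gold keywords (one common convention)."""
    ref = _norm(gold)
    seen: Set[str] = set()
    hits, total = 0, 0.0
    for rank, keyword in enumerate(ranked, start=1):
        key = keyword.strip().lower()
        if key in ref and key not in seen:
            seen.add(key)
            hits += 1
            total += hits / rank
    return total / len(ref) if ref else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first relevant keyword, or 0.0 if none is retrieved."""
    ref = _norm(gold)
    for rank, keyword in enumerate(ranked, start=1):
        if keyword.strip().lower() in ref:
            return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean that returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy data (hypothetical): ranked predictions and gold keywords for two documents.
documents = [
    (["keyword extraction", "word embeddings", "robots"],
     {"keyword extraction", "semantic information", "word embeddings"}),
    (["datasets", "keybert"], {"KeyBERT"}),
]

map_score = mean(average_precision(ranked, gold) for ranked, gold in documents)
mrr_score = mean(reciprocal_rank(ranked, gold) for ranked, gold in documents)
```

Averaging the per-document scores over a dataset gives one MAP and one MRR value per embedding model, which is the kind of figure the abstract compares across the ten models. Published evaluations often apply stemming or a fixed top-k cutoff before matching; neither is modeled here.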


