Keyword Extraction from a Kazakh News Dataset with BERT (BERT ile Kazak Haber Veri Kümesinden Anahtar Kelime Çıkarımı)

Aiman Abibullayeva, Aydın Çetin
El-Cezeri Fen ve Mühendislik Dergisi, journal article, published 2022-09-07. DOI: 10.31202/ecjse.1131826 (https://doi.org/10.31202/ecjse.1131826)

Abstract

Keywords provide a concise and precise description of a document's content. Because keywords are important and manual annotation is laborious, automatic keyword extraction makes the process easier and faster. This paper presents keyword extraction from a Kazakh news dataset. Model performance results were obtained with two pre-trained language models, BERT-base-uncased and BERT-base-multilingual-uncased, on the newly compiled Kazakh News Dataset (KND). The dataset consists of 7,060 entries collected from the websites anatili.kazgazeta.kz, Bilimdinews.kz, and zhasalash.kz using the BeautifulSoup and Requests libraries. These sites mostly contain news, history, and literary texts. The dataset includes the publication name or news title, the author of the publication or news subject, and the URL of the Kazakh news site. In the evaluation of the training results, BERT-base-multilingual-uncased achieved a higher F-score than BERT-base-uncased.
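The abstract reports an F-score comparison but does not say how the score was computed. As a minimal sketch, assuming exact-match comparison between the predicted and reference keyword sets of a single document, precision, recall, and F1 could be computed as follows. The function name `keyword_f_score` and the sample keywords are hypothetical and are not taken from the paper.

```python
def keyword_f_score(predicted, gold):
    """Return (precision, recall, F1) for one document's keywords.

    Assumption: keywords are compared by exact match after trimming and
    lowercasing, which mirrors the use of uncased models but is not
    confirmed by the paper.
    """
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in gold if k.strip()}
    if not pred or not ref:
        # No predictions or no reference keywords: nothing can match.
        return 0.0, 0.0, 0.0

    hits = len(pred & ref)            # keywords found in both sets
    precision = hits / len(pred)      # share of predictions that are correct
    recall = hits / len(ref)          # share of reference keywords recovered
    if hits == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical example: 2 of 3 predictions match 2 of 4 reference keywords.
p, r, f1 = keyword_f_score(
    ["білім", "мектеп", "тарих"],
    ["білім", "тарих", "әдебиет", "тіл"],
)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

If the models were instead evaluated as token-level sequence labelers, the same formula would apply to token labels rather than to whole keywords.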