Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging)

Yasir Abdur Rohman, R. Kusumaningrum
{"title":"基于潜在狄利克雷分配和隐马尔可夫模型POS-TAG(词性标注)的Twitter故事生成器","authors":"Yasir Abdur Rohman, R. Kusumaningrum","doi":"10.1109/ICICoS48119.2019.8982411","DOIUrl":null,"url":null,"abstract":"Twitter active users in Indonesia reached 50 million users from a total worldwide of 284 million in 2015. In January 2019, active users on Twitter increased by 52% compared to 2018 where active users were only 27%.A large number of users causes the number of tweet documents increases. Tweet documents that contain information such as user activity, news, story can be processed into valuable information for journalists. All of the information collected then arranged based on related tweets into a storytelling that will become news/article. The whole process is still done manually by collecting one by one for each tweet and most of the tweet documents are collected from the trending topic. Actually, that should be done automatically by collecting tweets that have the same topic. Therefore, this research proposes a method of Twitter storytelling generator that combines Latent Dirichlet Allocation (LDA) and Hidden Markov Model POS-TAG (Part-of-Speech Tagging), so it can generate twitter storytelling based on the certain topic. We implemented two scenarios of the experiment. The first experimental calculating the value of perplexity on LDA and HMM POS-TAG, yielding the lowest perplexity value of 6.31 with alpha 0.001, beta 0.001, and the number of topics 4. While the second experimental calculating the value of ROUGE-1, ROUGE-2, BLEU-1, and BLEU-2 on the results of Twitter storytelling generator, yielding the best ROUGE-1 value is 0.470 with the beta cap value of 0.1 and the best ROUGE-2 value is 0.149 with the beta cap value of 0.001. Meanwhile, the best BLEU-1 value is 0.617 on the topic 1 and the best BLEU-2 value is 0.432 on the topic 3. Twitter storytelling generator using the proposed method has good performance when HMM POS-TAG can tagging the tweet documents correctly.","PeriodicalId":105407,"journal":{"name":"2019 3rd International Conference on Informatics and Computational Sciences (ICICoS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging)\",\"authors\":\"Yasir Abdur Rohman, R. Kusumaningrum\",\"doi\":\"10.1109/ICICoS48119.2019.8982411\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Twitter active users in Indonesia reached 50 million users from a total worldwide of 284 million in 2015. In January 2019, active users on Twitter increased by 52% compared to 2018 where active users were only 27%.A large number of users causes the number of tweet documents increases. Tweet documents that contain information such as user activity, news, story can be processed into valuable information for journalists. All of the information collected then arranged based on related tweets into a storytelling that will become news/article. The whole process is still done manually by collecting one by one for each tweet and most of the tweet documents are collected from the trending topic. Actually, that should be done automatically by collecting tweets that have the same topic. 
Therefore, this research proposes a method of Twitter storytelling generator that combines Latent Dirichlet Allocation (LDA) and Hidden Markov Model POS-TAG (Part-of-Speech Tagging), so it can generate twitter storytelling based on the certain topic. We implemented two scenarios of the experiment. The first experimental calculating the value of perplexity on LDA and HMM POS-TAG, yielding the lowest perplexity value of 6.31 with alpha 0.001, beta 0.001, and the number of topics 4. While the second experimental calculating the value of ROUGE-1, ROUGE-2, BLEU-1, and BLEU-2 on the results of Twitter storytelling generator, yielding the best ROUGE-1 value is 0.470 with the beta cap value of 0.1 and the best ROUGE-2 value is 0.149 with the beta cap value of 0.001. Meanwhile, the best BLEU-1 value is 0.617 on the topic 1 and the best BLEU-2 value is 0.432 on the topic 3. Twitter storytelling generator using the proposed method has good performance when HMM POS-TAG can tagging the tweet documents correctly.\",\"PeriodicalId\":105407,\"journal\":{\"name\":\"2019 3rd International Conference on Informatics and Computational Sciences (ICICoS)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 3rd International Conference on Informatics and Computational Sciences (ICICoS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICICoS48119.2019.8982411\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 3rd International Conference on Informatics and Computational Sciences (ICICoS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICoS48119.2019.8982411","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Twitter's active users in Indonesia reached 50 million in 2015, out of 284 million worldwide. In January 2019, Twitter's active users grew by 52%, compared with growth of only 27% in 2018. This large user base drives a growing volume of tweet documents. Tweets containing information such as user activity, news, and stories can be processed into material that is valuable to journalists: related tweets are collected and arranged into a storytelling narrative that becomes a news article. Today this is still done manually, collecting tweets one by one, mostly from trending topics, although it could be done automatically by gathering tweets that share the same topic. This research therefore proposes a Twitter storytelling generator that combines Latent Dirichlet Allocation (LDA) with a Hidden Markov Model POS-TAG (part-of-speech tagging) so that it can generate Twitter storytelling about a given topic. We ran two experimental scenarios. The first computed the perplexity of the LDA and HMM POS-TAG models, obtaining the lowest perplexity of 6.31 with alpha 0.001, beta 0.001, and 4 topics. The second computed ROUGE-1, ROUGE-2, BLEU-1, and BLEU-2 on the generator's output: the best ROUGE-1 was 0.470 with a beta cap of 0.1 and the best ROUGE-2 was 0.149 with a beta cap of 0.001, while the best BLEU-1 was 0.617 on topic 1 and the best BLEU-2 was 0.432 on topic 3. The proposed Twitter storytelling generator performs well when the HMM POS-TAG tags the tweet documents correctly.
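The abstract reports the lowest LDA perplexity of 6.31 at alpha 0.001, beta 0.001, and 4 topics. As a rough illustration of that setup, the sketch below fits LDA with gensim on a few hypothetical tokenized tweets and reports perplexity; this is not the authors' implementation, and the gensim library, the toy tweets, and the preprocessing are all assumptions.

```python
# Minimal sketch (not the authors' code): fit LDA on tokenized tweets with
# gensim and report perplexity, using the best-reported hyperparameters.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical, already-preprocessed tweets (tokenized, stopwords removed).
tweets = [
    ["banjir", "jakarta", "hujan"],
    ["gempa", "lombok", "korban"],
    ["banjir", "korban", "evakuasi"],
]

dictionary = Dictionary(tweets)
corpus = [dictionary.doc2bow(doc) for doc in tweets]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=4,    # number of topics from the best-reported setting
    alpha=0.001,     # document-topic prior
    eta=0.001,       # topic-word prior (the paper's "beta")
    passes=10,
    random_state=0,
)

# gensim returns the per-word log-likelihood bound (base 2);
# perplexity = 2 ** (-bound).
bound = lda.log_perplexity(corpus)
print("perplexity:", 2 ** (-bound))
```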
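The part-of-speech component is described only as an HMM POS-TAG model. A supervised HMM tagger along those general lines can be sketched with NLTK, as below; the Indonesian (word, tag) training sentences and the tagset are hypothetical placeholders, not the authors' data or tagger.

```python
# Minimal sketch of a supervised HMM POS tagger with NLTK; a real setup
# would train on an Indonesian POS-tagged corpus.
from nltk.tag.hmm import HiddenMarkovModelTrainer

# Hypothetical (word, tag) training sequences.
train_data = [
    [("banjir", "NN"), ("melanda", "VB"), ("jakarta", "NNP")],
    [("warga", "NN"), ("mengungsi", "VB")],
]

trainer = HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(train_data)

# Tag a (seen) tweet token sequence.
print(tagger.tag(["banjir", "jakarta"]))
```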
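The second experiment scores the generated story with ROUGE-1/2 and BLEU-1/2. The sketch below shows one plausible way to compute those metrics: ROUGE-N recall from clipped n-gram overlap and BLEU-N via NLTK's sentence_bleu; the reference and candidate texts are hypothetical, and the paper does not state which implementation it used.

```python
# Minimal sketch of ROUGE-N recall and BLEU-N scoring for a generated story
# against a human-written reference.
from collections import Counter
from nltk.translate.bleu_score import sentence_bleu

def rouge_n(reference, candidate, n):
    """N-gram recall: clipped overlapping n-grams / n-grams in the reference."""
    ref_ngrams = Counter(zip(*[reference[i:] for i in range(n)]))
    cand_ngrams = Counter(zip(*[candidate[i:] for i in range(n)]))
    overlap = sum((ref_ngrams & cand_ngrams).values())
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0

reference = "banjir melanda jakarta warga mengungsi ke posko".split()
candidate = "banjir melanda jakarta warga dievakuasi".split()

print("ROUGE-1:", rouge_n(reference, candidate, 1))
print("ROUGE-2:", rouge_n(reference, candidate, 2))
print("BLEU-1:", sentence_bleu([reference], candidate, weights=(1, 0, 0, 0)))
print("BLEU-2:", sentence_bleu([reference], candidate, weights=(0.5, 0.5, 0, 0)))
```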