从与灾难相关的推文中提取关键词

The World Wide Web Conference Pub Date : 2019-05-13 DOI:10.1145/3308558.3313696

Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea

{"title":"从与灾难相关的推文中提取关键词","authors":"Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea","doi":"10.1145/3308558.3313696","DOIUrl":null,"url":null,"abstract":"While keyphrase extraction has received considerable attention in recent years, relatively few studies exist on extracting keyphrases from social media platforms such as Twitter, and even fewer for extracting disaster-related keyphrases from such sources. During a disaster, keyphrases can be extremely useful for filtering relevant tweets that can enhance situational awareness. Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data. We improve the model's performance on both general Twitter data and disaster-related Twitter data by incorporating contextual word embeddings, POS-tags, phonetics, and phonological features. Moreover, we discuss the shortcomings of the often used F1-measure for evaluating the quality of predicted keyphrases with respect to the ground truth annotations. Instead of the F1-measure, we propose the use of embedding-based metrics to better capture the correctness of the predicted keyphrases. In addition, we also present a novel extension of an embedding-based metric. The extension allows one to better control the penalty for the difference in the number of ground-truth and predicted keyphrases.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Keyphrase Extraction from Disaster-related Tweets\",\"authors\":\"Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea\",\"doi\":\"10.1145/3308558.3313696\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While keyphrase extraction has received considerable attention in recent years, relatively few studies exist on extracting keyphrases from social media platforms such as Twitter, and even fewer for extracting disaster-related keyphrases from such sources. During a disaster, keyphrases can be extremely useful for filtering relevant tweets that can enhance situational awareness. Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data. We improve the model's performance on both general Twitter data and disaster-related Twitter data by incorporating contextual word embeddings, POS-tags, phonetics, and phonological features. Moreover, we discuss the shortcomings of the often used F1-measure for evaluating the quality of predicted keyphrases with respect to the ground truth annotations. Instead of the F1-measure, we propose the use of embedding-based metrics to better capture the correctness of the predicted keyphrases. In addition, we also present a novel extension of an embedding-based metric. The extension allows one to better control the penalty for the difference in the number of ground-truth and predicted keyphrases.\",\"PeriodicalId\":23013,\"journal\":{\"name\":\"The World Wide Web Conference\",\"volume\":\"12 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The World Wide Web Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3308558.3313696\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The World Wide Web Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308558.3313696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

近年来，关键词提取受到了相当大的关注，但从Twitter等社交媒体平台中提取关键词的研究相对较少，而从这些平台中提取与灾害相关的关键词的研究就更少了。在灾难中，关键字对于过滤相关的推文非常有用，可以增强态势感知。在此之前，将两个不同层的堆叠递归神经网络联合训练用于关键字发现和关键字提取，已被证明可以有效地从一般Twitter数据中提取关键字。我们通过结合上下文词嵌入、pos标签、语音学和音系特征，提高了模型在一般Twitter数据和与灾难相关的Twitter数据上的性能。此外，我们讨论了常用的f1度量的缺点，用于评估相对于基础真值注释的预测关键短语的质量。代替f1度量，我们建议使用基于嵌入的度量来更好地捕获预测关键短语的正确性。此外，我们还提出了一种基于嵌入的度量的新扩展。该扩展允许一个人更好地控制惩罚的数量的基础真理和预测的关键字的差异。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Keyphrase Extraction from Disaster-related Tweets

While keyphrase extraction has received considerable attention in recent years, relatively few studies exist on extracting keyphrases from social media platforms such as Twitter, and even fewer for extracting disaster-related keyphrases from such sources. During a disaster, keyphrases can be extremely useful for filtering relevant tweets that can enhance situational awareness. Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data. We improve the model's performance on both general Twitter data and disaster-related Twitter data by incorporating contextual word embeddings, POS-tags, phonetics, and phonological features. Moreover, we discuss the shortcomings of the often used F1-measure for evaluating the quality of predicted keyphrases with respect to the ground truth annotations. Instead of the F1-measure, we propose the use of embedding-based metrics to better capture the correctness of the predicted keyphrases. In addition, we also present a novel extension of an embedding-based metric. The extension allows one to better control the penalty for the difference in the number of ground-truth and predicted keyphrases.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The World Wide Web Conference

自引率

0.00%

发文量

期刊最新文献

Decoupled Smoothing on Graphs Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis Augmenting Knowledge Tracing by Considering Forgetting Behavior Enhancing Fashion Recommendation with Visual Compatibility Relationship Judging a Book by Its Cover: The Effect of Facial Perception on Centrality in Social Networks