The empirical study of tweet classification system for disaster response using shallow and deep learning models

Zone 3 (Computer Science) | JCR Q1 (Computer Science) | Journal of Ambient Intelligence and Humanized Computing | Pub Date: 2024-05-10 | DOI: 10.1007/s12652-024-04807-w

Kholoud Maswadi, Ali Alhazmi, Faisal Alshanketi, Christopher Ifeanyi Eke


Citations: 0

Abstract



During an emergency, disaster-related tweets carry a variety of information about people who have been hurt or killed, people who are missing or have been found, and infrastructure and utilities that have been destroyed; this information can help governmental and humanitarian organizations prioritize their aid and rescue efforts. Because of the massive volume of such tweets, it is crucial to build a model that can categorize them into distinct types so that rescue and relief efforts can be better organized and lives saved. In this study, Twitter data from the 2013 Queensland flood and the 2015 Nepal earthquake are classified as disaster or non-disaster using three classes of models. The first class uses a lexical feature based on Term Frequency-Inverse Document Frequency (TF-IDF); classification was performed with five algorithms, among them decision tree (DT), logistic regression (LR), support vector machine (SVM), and random forest (RF), with ensemble voting used to produce the final outcome of the models. The second class uses shallow classifiers in conjunction with several features, including lexical (TF-IDF), hashtag, part-of-speech (POS), and GloVe embedding features. The third class uses deep learning algorithms, including LSTM, Bi-LSTM, and GRU, with BERT (Bidirectional Encoder Representations from Transformers) used to construct semantic word embeddings that capture context. Accuracy, F1 score, recall, and precision were used to measure and compare the three sets of models for disaster response classification on two publicly available Twitter datasets. This empirical evaluation across different disaster types shows that the DT algorithm attained the highest accuracy on the Queensland flood dataset (96.46%), followed by the Bi-LSTM models (96.40%); on the Nepal earthquake dataset, the DT algorithm attained 78.3% accuracy based on the majority-voting ensemble. The research thus contributes by investigating how deep and shallow learning models can be integrated effectively in a tweet classification system designed for disaster response. Examining how the two approaches work together offers insight into how best to use their complementary advantages to increase the robustness and accuracy of locating relevant data in disaster crises.
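Three ingredients named in the abstract (TF-IDF lexical features, majority-voting ensembles, and the accuracy/precision/recall/F1 metrics) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the function names, the toy tokenized tweets, and the unsmoothed TF-IDF variant (term frequency times log(N / document frequency)) are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / df).
    Production libraries usually add smoothing and normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    With an odd number of binary classifiers no ties can occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy tweets, already tokenized.
    tweets = [["flood", "water", "rescue"],
              ["flood", "road", "closed"],
              ["coffee", "morning"]]
    print(tf_idf(tweets)[0])

    # Hypothetical predictions from three classifiers (1 = disaster).
    votes = majority_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])
    print(votes, binary_metrics([1, 0, 0, 1], votes))
```

In this sketch a term that appears in fewer tweets receives a larger weight than one shared across tweets, and the ensemble label for each tweet is simply the label most classifiers agree on; the metrics are then computed on the voted labels.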


Source journal

Journal of Ambient Intelligence and Humanized Computing
Categories: Computer Science, Artificial Intelligence; Computer Science, Information Systems

CiteScore: 9.60

Self-citation rate: 0.00%

Number of published articles: 854

About the journal: The purpose of JAIHC is to provide a high-profile, leading-edge forum for academics, industrial professionals, educators and policy makers in the field to contribute and disseminate the most innovative research and developments in all aspects of ambient intelligence and humanized computing, such as intelligent/smart objects, environments/spaces, and systems. The journal discusses various technical, safety, personal, social, physical, political, artistic and economic issues. The research topics covered by the journal include (but are not limited to): Pervasive/Ubiquitous Computing and Applications; Cognitive Wireless Sensor Networks; Embedded Systems and Software; Mobile Computing and Wireless Communications; Next Generation Multimedia Systems; Security, Privacy and Trust; Service and Semantic Computing; Advanced Networking Architectures; Dependable, Reliable and Autonomic Computing; Embedded Smart Agents; Context Awareness, Social Sensing and Inference; Multimodal Interaction Design; Ergonomics and Product Prototyping; Intelligent and Self-Organizing Transportation Networks & Services; Healthcare Systems; Virtual Humans & Virtual Worlds; Wearable Sensors and Actuators.