Discovering latent themes in aviation safety reports using text mining and network analytics

International Journal of Transportation Science and Technology (IF 4.8; JCR Q2, Transportation). Published: 2024-12-01; online: 2024-02-27. DOI: 10.1016/j.ijtst.2024.02.009
Yingying Xing, Yutong Wu, Shiwen Zhang, Ling Wang, Haoyuan Cui, Bo Jia, Hongwei Wang
Volume 16, Pages 292-316. https://www.sciencedirect.com/science/article/pii/S2046043024000297
Citations: 0

Abstract

Aviation accidents, meaning unexpected and undesirable events involving aircraft, often cause serious damage to property and harm to human life. Learning from historical accidents is pivotal for improving aviation safety. However, aviation accidents are typically documented and stored as unstructured or semi-structured free text, which makes such data difficult to analyze. This study presents a novel framework that combines text mining and network analytics to analyze aviation accident reports automatically. The framework comprises four modelling steps: (1) transforming unstructured aviation safety report texts into structured numeric matrices using term frequency-inverse document frequency (TF-IDF) weighting; (2) identifying aviation accident topics using a structural topic model (STM); (3) constructing a word co-occurrence network (WCN) to determine the interrelations between aviation safety risk factors; and (4) quantitatively analyzing keywords to pinpoint the key causal factors in aviation safety events. The proposed framework is validated on aviation accident reports collected by the National Transportation Safety Board (NTSB). The results indicate that, compared with traditional latent Dirichlet allocation (LDA), STM partitions topics more finely and better distinguishes between similar events. Among the identified topics, “Fuel and Power” and “En-route Phase” have the highest occurrence rates according to STM. Additionally, “Aircraft Crash” is the most prevalent topic in accidents that resulted in fatal injuries, whereas “Landing phase” is the most prevalent topic in accidents that resulted in non-fatal injuries. Based on the WCN, three centrality measures identify “inspection of equipment” and “take off” as the most important risk factors in aviation safety.
The proposed framework provides a comprehensive solution for in-depth analysis of aviation safety reports, offering decision support for aviation safety management and accident prevention and thereby helping to reduce risks and strengthen safety measures.
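Steps (1) and (3) of the framework can be illustrated with a small, stdlib-only Python sketch. This is not the authors' implementation and rests on several assumptions: TF-IDF uses raw term frequency times an unsmoothed IDF of log(N/df); two terms are linked in the co-occurrence network whenever they appear in the same report; and, because the abstract does not name its three centrality measures, degree and closeness centrality are shown as two common choices. The STM topic-modelling step (2) is not reproduced. The stopword list, the `REPORTS` narratives, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, deque
from itertools import combinations

# Hypothetical stopword list for illustration; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "during", "due", "to", "on", "in", "after"}

# Hypothetical toy narratives, not NTSB data.
REPORTS = [
    "Engine power loss during takeoff due to fuel exhaustion",
    "Loss of engine power en route; fuel tank inspection missed",
    "Hard landing after takeoff abort; landing gear inspection overdue",
]


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tf_idf(docs):
    """Step (1): one {term: weight} dict per document.

    Weight = (count / document length) * log(N / df). This is one common
    TF-IDF variant, assumed here; the paper's exact weighting may differ.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (c / len(toks)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def cooccurrence_network(docs):
    """Step (3): undirected word co-occurrence network.

    Assumption: two terms are linked if they appear in the same report;
    the edge weight counts how many reports contain both.
    """
    edges = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        for u, v in combinations(terms, 2):
            edges[(u, v)] += 1
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return edges, adj


def degree_centrality(adj):
    """Share of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def closeness_centrality(adj):
    """(Number of reachable nodes) / (sum of hop distances to them), via BFS."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


if __name__ == "__main__":
    weights = tf_idf(REPORTS)
    print("top TF-IDF term per report:", [max(w, key=w.get) for w in weights])
    edges, adj = cooccurrence_network(REPORTS)
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    for term in sorted(deg, key=deg.get, reverse=True)[:3]:
        print(f"{term}: degree={deg[term]:.3f}, closeness={clo[term]:.3f}")
```

On these toy narratives, "inspection" bridges two otherwise unrelated reports and therefore scores highest on both centrality measures. With real report narratives, one would typically also lemmatize, use a fuller stopword list, and possibly restrict co-occurrence to a sliding window rather than the whole report.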
Source journal
International Journal of Transportation Science and Technology (Engineering: Civil and Structural Engineering)
CiteScore: 7.20
Self-citation rate: 0.00%
Articles published: 105
Review time: 88 days