Causal knowledge extraction from long text maintenance documents

Computers in Industry | Pub Date: 2024-05-31 | DOI: 10.1016/j.compind.2024.104110
IF 8.2 | Region 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Brad Hershowitz, Melinda Hodkiewicz, Tyler Bikaun, Michael Stewart, Wei Liu
Citations: 0

Abstract

Large numbers of maintenance Work Request Notification (WRN) records are created by industry as part of standard business workflows. These digital records hold invaluable insights crucial to best practice in asset management. Of particular interest are the cause–effect relations in the long text WRN field. In this research we develop a two-stage deep learning pipeline to extract cause-and-effect triples and construct a causal graph database. A novel sentence-level noise removal method in the first stage filters out information extraneous to causal semantics. The second stage leverages a joint entity-and-relation extraction model to extract causal relations. To train the noise removal and causality extraction models we produced an annotated dataset of 1027 WRN records. The results for causality extraction as measured by F1-score are 83% and 92% for the identification of Cause and Effect entities respectively, and 78% for a correct causal relation between these entities. The pipeline is applied to a real-world industrial plant dataset of 98,000 WRN records to produce a graph database. This work provides a framework for technical personnel to query the causes of equipment failures, enabling answers to questions such as “what are the most common, costly, and recent causes of failures at my facility?”.
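
The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses trained deep learning models for both stages, whereas the sketch below swaps in simple keyword rules so that it runs with the standard library alone. The cue phrases, function names, and sample records are all hypothetical, and the final query covers only the "most common causes" question (cost and recency would need extra fields that are not modelled here).

```python
import re
from collections import Counter

# Hypothetical cue phrases; a rule-based stand-in for the trained models.
CAUSAL_CUES = ("due to", "caused by", "because of")


def split_sentences(text):
    """Naively split a long-text field into sentences on '.', ';' or newlines."""
    return [s.strip() for s in re.split(r"[.;\n]+", text) if s.strip()]


def keep_causal_sentences(sentences):
    """Stage 1 stand-in: drop sentences that carry no causal cue (noise removal)."""
    return [s for s in sentences if any(cue in s.lower() for cue in CAUSAL_CUES)]


def extract_triple(sentence):
    """Stage 2 stand-in: read '<effect> <cue> <cause>' as a (cause, relation, effect) triple."""
    lowered = sentence.lower()
    for cue in CAUSAL_CUES:
        idx = lowered.find(cue)
        if idx > 0:
            effect = lowered[:idx].strip()
            cause = lowered[idx + len(cue):].strip()
            if effect and cause:
                return (cause, "causes", effect)
    return None


def build_causal_edges(records):
    """Run both stages over all records and count each extracted triple."""
    edges = Counter()
    for text in records:
        for sentence in keep_causal_sentences(split_sentences(text)):
            triple = extract_triple(sentence)
            if triple is not None:
                edges[triple] += 1
    return edges


def most_common_causes(edges, n=3):
    """Aggregate edge counts by cause node and return the n most frequent causes."""
    causes = Counter()
    for (cause, _relation, _effect), count in edges.items():
        causes[cause] += count
    return causes.most_common(n)


# Made-up records, for illustration only.
records = [
    "Pump tripped due to bearing failure. Please attend asap.",
    "Operator reported noise. Seal leaking caused by bearing failure",
    "Conveyor belt stopped because of motor overheating; contact supervisor on site",
]
edges = build_causal_edges(records)
top = most_common_causes(edges)
print(top)  # [('bearing failure', 2), ('motor overheating', 1)]
```

Storing the triples as counted edges keeps the sketch small; a real deployment would presumably load them into a graph database so that cost and date attributes can be attached to each record and queried alongside frequency.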

Source journal
Computers in Industry
Category: Engineering & Technology, Computer Science: Interdisciplinary Applications
CiteScore: 18.90
Self-citation rate: 8.00%
Articles published: 152
Review time: 22 days
Journal description: The objective of Computers in Industry is to present original, high-quality, application-oriented research papers that:
• Illuminate emerging trends and possibilities in the utilization of Information and Communication Technology in industry;
• Establish connections or integrations across various technology domains within the expansive realm of computer applications for industry;
• Foster connections or integrations across diverse application areas of ICT in industry.