GUIDO: A Hybrid Approach to Guideline Discovery & Ordering from Natural Language Texts

Data (IF 2.2, Q3, Computer Science, Information Systems). Pub Date: 2023-07-19. DOI: 10.5220/0012084400003541
Nils Freyer, Dustin Thewes, Matthias Meinecke
Data, vol. 1, pp. 335-342. Journal Article.

Abstract

Extracting workflow nets from textual descriptions can simplify guidelines or formalize textual descriptions of formal processes such as business processes and algorithms. Manually extracting processes, however, requires domain expertise and effort. While automatic process model extraction is desirable, annotating texts with formalized process models is expensive. Therefore, only a few machine-learning-based extraction approaches exist. Rule-based approaches, in turn, require domain specificity to work well and can rarely distinguish relevant from irrelevant information in textual descriptions. In this paper, we present GUIDO, a hybrid approach to the process model extraction task that first classifies sentences by their relevance to the process model using a BERT-based sentence classifier, and then extracts a process model from the sentences classified as relevant using dependency parsing. The presented approach achieves significantly better results than a pure rule-based approach. GUIDO achieves an average behavioral similarity score of 0.93. Still, compared with purely machine-learning-based approaches, the annotation costs stay low.
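The two-stage structure described in the abstract (filter sentences by relevance, then extract a process model from the remaining ones) can be sketched as follows. This is only an illustration of the pipeline shape, not the authors' implementation: `toy_is_relevant` is a hypothetical keyword heuristic standing in for the BERT-based sentence classifier, `toy_extract_activity` is a hypothetical placeholder for dependency parsing, and `SequentialNet` is an assumed, purely sequential simplification of a workflow net. All names and the example sentences are invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SequentialNet:
    """Assumed simplification: activities connected in strict sequence."""
    activities: List[str]
    edges: List[Tuple[str, str]]


# Hypothetical vocabulary for the toy relevance heuristic below.
IMPERATIVE_VERBS = {"add", "mix", "heat", "stir", "serve"}


def toy_is_relevant(sentence: str) -> bool:
    """Stand-in for a trained sentence classifier (NOT a BERT model):
    a sentence counts as relevant if it starts with a known imperative verb."""
    words = sentence.lower().rstrip(".!").split()
    return bool(words) and words[0] in IMPERATIVE_VERBS


def toy_extract_activity(sentence: str) -> str:
    """Stand-in for dependency parsing: use the normalized sentence
    itself as the activity label."""
    return sentence.strip().rstrip(".!").lower()


def extract_process(
    sentences: List[str],
    is_relevant: Callable[[str], bool] = toy_is_relevant,
    extract_activity: Callable[[str], str] = toy_extract_activity,
) -> SequentialNet:
    # Stage 1: keep only sentences classified as relevant.
    relevant = [s for s in sentences if is_relevant(s)]
    # Stage 2: turn each relevant sentence into an activity, in text order.
    activities = [extract_activity(s) for s in relevant]
    # Connect consecutive activities to form a sequential model.
    edges = list(zip(activities, activities[1:]))
    return SequentialNet(activities, edges)


text = [
    "This recipe is a family favorite.",
    "Mix the flour and water.",
    "My grandmother loved it.",
    "Heat the pan.",
]
net = extract_process(text)
print(net.activities)
print(net.edges)
```

Because both stages are passed in as callables, a trained classifier or a real dependency parser could replace the toy functions without changing `extract_process`.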