Semi-Supervised Classification with A*: A Case Study on Electronic Invoicing

Big Data and Cognitive Computing (IF 4.4; JCR Q2, Computer Science, Artificial Intelligence) | Pub Date: 2023-09-20 | DOI: 10.3390/bdcc7030155
Bernardo Panichi, Alessandro Lazzeri

Abstract

This paper addresses the time-intensive task of assigning accurate account labels to invoice entries within corporate bookkeeping. Despite the advent of electronic invoicing, many software solutions still rely on rule-based approaches that fail to address the multifaceted nature of this challenge. While machine learning holds promise for such repetitive tasks, the presence of low-quality training data often poses a hurdle. Frequently, labels pertain to invoice rows at a group level rather than an individual level, leading to the exclusion of numerous records during preprocessing. To enhance the efficiency of an invoice entry classifier within a semi-supervised context, this study proposes an innovative approach that combines the classifier with the A* graph search algorithm. Through experimentation across various classifiers, the results consistently demonstrated a noteworthy increase in accuracy, ranging between 1% and 4%. This improvement is primarily attributed to a marked reduction in the discard rate of data, which decreased from 39% to 14%. This paper contributes to the literature by presenting a method that leverages the synergy of a classifier and A* graph search to overcome challenges posed by limited and group-level label information in the realm of electronic invoicing classification.
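
The abstract does not spell out how the classifier and the A* search interact, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a simplified setting in which, for one invoice (the group), we know how many rows carry each account label but not which rows. A classifier supplies per-row label probabilities, and A* searches over partial row-to-label assignments for the one with the lowest total negative log-probability. The function name `astar_assign`, its parameters `probs` and `label_counts`, and the cost and heuristic definitions are all hypothetical choices made for illustration.

```python
import heapq
import math
from itertools import count


def astar_assign(probs, label_counts):
    """Assign one label per row so that label usage matches label_counts.

    probs: list of dicts; probs[i][label] is the (assumed) classifier
           probability that row i belongs to `label`.
    label_counts: dict mapping label -> number of rows in the group that
           carry it (the assumed group-level information).
    Returns (assignment, total_cost), or None if the counts do not fit.
    """
    n = len(probs)
    labels = sorted(label_counts)
    if sum(label_counts.values()) != n:
        return None

    def cost(i, label):
        # Negative log-probability; the floor avoids log(0).
        return -math.log(max(probs[i].get(label, 0.0), 1e-12))

    def heuristic(i, remaining):
        # Lower bound: each unassigned row takes its cheapest label among
        # those still available, ignoring the remaining-count limits.
        avail = [l for l, c in zip(labels, remaining) if c > 0]
        return sum(min(cost(j, l) for l in avail) for j in range(i, n))

    start = tuple(label_counts[l] for l in labels)
    tie = count()  # tie-breaker so the heap never compares assignments
    frontier = [(heuristic(0, start), next(tie), 0.0, 0, start, ())]
    best_g = {}
    while frontier:
        _, _, g, i, remaining, assigned = heapq.heappop(frontier)
        if i == n:
            return list(assigned), g
        if best_g.get((i, remaining), math.inf) < g:
            continue  # a cheaper path to this state was already found
        for k, label in enumerate(labels):
            if remaining[k] == 0:
                continue
            nxt = remaining[:k] + (remaining[k] - 1,) + remaining[k + 1:]
            g2 = g + cost(i, label)
            if g2 < best_g.get((i + 1, nxt), math.inf):
                best_g[(i + 1, nxt)] = g2
                f2 = g2 + heuristic(i + 1, nxt)
                heapq.heappush(
                    frontier,
                    (f2, next(tie), g2, i + 1, nxt, assigned + (label,)),
                )
    return None


# Toy example with made-up probabilities: three rows, one "A" and two "B".
toy_probs = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.8, "B": 0.2},
    {"A": 0.3, "B": 0.7},
]
assignment, total_cost = astar_assign(toy_probs, {"A": 1, "B": 2})
print(assignment)  # ['A', 'B', 'B']
```

In this simplified setting, rows recovered this way could be added back to the training set with individual labels instead of being discarded during preprocessing, which is the kind of effect the abstract credits for the lower discard rate. The heuristic never overestimates the remaining cost, so the first complete assignment popped from the queue is optimal under the assumed cost.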