Semi-Supervised Classification with A*: A Case Study on Electronic Invoicing

Big Data and Cognitive Computing (IF 3.7; JCR Q2, Computer Science, Artificial Intelligence) | Pub Date: 2023-09-20 | DOI: 10.3390/bdcc7030155

Bernardo Panichi, Alessandro Lazzeri



Abstract

This paper addresses the time-intensive task of assigning accurate account labels to invoice entries within corporate bookkeeping. Despite the advent of electronic invoicing, many software solutions still rely on rule-based approaches that fail to address the multifaceted nature of this challenge. While machine learning holds promise for such repetitive tasks, the presence of low-quality training data often poses a hurdle. Frequently, labels pertain to invoice rows at a group level rather than an individual level, leading to the exclusion of numerous records during preprocessing. To enhance the efficiency of an invoice entry classifier within a semi-supervised context, this study proposes an innovative approach that combines the classifier with the A* graph search algorithm. Through experimentation across various classifiers, the results consistently demonstrated a noteworthy increase in accuracy, ranging between 1% and 4%. This improvement is primarily attributed to a marked reduction in the discard rate of data, which decreased from 39% to 14%. This paper contributes to the literature by presenting a method that leverages the synergy of a classifier and A* graph search to overcome challenges posed by limited and group-level label information in the realm of electronic invoicing classification.
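The abstract does not say how the classifier and A* are coupled, so the following is only a minimal sketch of one plausible formulation, not the authors' implementation. It assumes that a group-level label gives a total amount per account, that row amounts are non-negative integers (e.g. cents), and that a classifier has already produced per-row account probabilities. The function name `assign_rows`, the account codes, and the data layout are all hypothetical. A* searches row by row for the most probable labelling whose per-account sums match the group totals, using the sum of each remaining row's best-case cost as an admissible heuristic.

```python
import heapq
import math


def assign_rows(amounts, probs, totals):
    """Assign one account to each row so per-account sums match `totals`.

    amounts: list of non-negative ints (assumed), one per invoice row
    probs:   list of dicts account -> classifier probability (assumed input)
    totals:  dict account -> required total for the group (assumed label form)
    Returns the most probable consistent labelling, or None if none exists.
    """
    accounts = sorted(totals)
    n = len(amounts)

    def cost(i, a):
        # Negative log-probability, floored to avoid log(0).
        return -math.log(max(probs[i].get(a, 0.0), 1e-12))

    # h[i]: lower bound on the cost of labelling rows i..n-1 (admissible).
    h = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = h[i + 1] + min(cost(i, a) for a in accounts)

    start = (0, tuple(totals[a] for a in accounts))
    best_g = {start: 0.0}
    heap = [(h[0], 0.0, start, ())]  # (f = g + h, g, state, labels so far)

    while heap:
        _, g, state, labels = heapq.heappop(heap)
        i, remaining = state
        if g > best_g.get(state, math.inf):
            continue  # stale queue entry
        if i == n:
            if all(r == 0 for r in remaining):
                return list(labels)
            continue
        for k, a in enumerate(accounts):
            if amounts[i] > remaining[k]:
                continue  # would exceed this account's group total
            nxt = (i + 1,
                   remaining[:k] + (remaining[k] - amounts[i],) + remaining[k + 1:])
            g2 = g + cost(i, a)
            if g2 < best_g.get(nxt, math.inf):
                best_g[nxt] = g2
                heapq.heappush(heap, (g2 + h[i + 1], g2, nxt, labels + (a,)))
    return None


# Hypothetical group: three rows, two accounts with known totals.
amounts = [100, 50, 50]
probs = [
    {"6100": 0.4, "6200": 0.6},
    {"6100": 0.1, "6200": 0.9},
    {"6100": 0.3, "6200": 0.7},
]
totals = {"6100": 100, "6200": 100}
print(assign_rows(amounts, probs, totals))
```

In this toy group the classifier alone would place every row in account 6200, which contradicts the totals; the search instead returns `['6100', '6200', '6200']`, the most probable labelling consistent with them. Rows recovered this way could then be kept for training rather than discarded, which is the effect the abstract credits for the accuracy gain.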


