Learning Curves Prediction for a Transformers-Based Model

Emerging Science Journal (Q1, Multidisciplinary) | Pub Date: 2023-10-01 | DOI: 10.28991/esj-2023-07-05-03

Francisco Cruz, Mauro Castelli


Citations: 0

Abstract

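One of the main challenges when training or fine-tuning a machine learning model is determining how many observations are needed to achieve satisfactory performance. While more training observations generally yield a better-performing model, collecting more data can be time-consuming, expensive, or even impossible. For this reason, investigating the relationship between dataset size and model performance is fundamental to deciding, with a certain likelihood, the minimum number of observations needed for training to produce a satisfactory model. The learning curve represents this relationship between dataset size and model performance, and it is especially useful when choosing a model for a specific task or planning the annotation work for a dataset. The purpose of this paper is therefore to find the functions that best fit the learning curves of a Transformers-based model (LayoutLM) when it is fine-tuned to extract information from invoices. Two new invoice datasets are made available for this task. Combined with a third dataset already available online, they are used to define 22 sub-datasets, whose learning curves are plotted from cross-validation results. The functions are fit using a non-linear least squares technique. The results show that both a bi-asymptotic function and a Morgan-Mercer-Flodin function fit the learning curves extremely well. In addition, an empirical relation is presented that predicts the learning curve from a single parameter, which can be obtained easily in the early stage of the annotation process.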
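As a rough illustration of how a fitted learning curve can inform annotation planning, the sketch below evaluates a Morgan-Mercer-Flodin (MMF) curve and inverts it to estimate the smallest dataset size that reaches a target score. The four-parameter form used here is the commonly cited MMF parameterization and is an assumption, since the abstract does not state the exact form used in the paper. The parameter values and the helper name `min_samples_for` are hypothetical; in practice the parameters would come from a non-linear least squares fit to cross-validation results.

```python
import math


def mmf(n, a, b, c, d):
    """Morgan-Mercer-Flodin curve (assumed parameterization).

    Equals a at n = 0 and approaches the asymptote c as n grows.
    It is increasing in n when c > a, b > 0 and d > 0.
    """
    return (a * b + c * n ** d) / (b + n ** d)


def min_samples_for(target, a, b, c, d):
    """Smallest integer n with mmf(n) >= target (hypothetical helper).

    Returns 0 if the target is already met at n = 0, and None if the
    target is at or above the asymptote c and can never be reached.
    """
    if target <= a:
        return 0
    if target >= c:
        return None
    # Solve target = (a*b + c*n^d) / (b + n^d) for n in closed form.
    n = (b * (target - a) / (c - target)) ** (1.0 / d)
    return math.ceil(n)


# Hypothetical parameters, for illustration only (not taken from the paper).
params = dict(a=0.10, b=50.0, c=0.95, d=1.2)

n_needed = min_samples_for(0.90, **params)
print("documents needed for a score of 0.90:", n_needed)
print("predicted score at that size:", round(mmf(n_needed, **params), 4))
```

Because the curve has a closed-form inverse, no numerical root finding is needed once the parameters are known. A target at or above the asymptote `c` is reported as unreachable rather than returning a misleadingly large number.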


