Raydonal Ospina, Ranah Duarte Costa, Leandro Chaves Rêgo, Fernando Marmolejo‐Ramos
Statistical Analysis and Data Mining, published 2024-03-26
DOI: 10.1002/sam.11673 (https://doi.org/10.1002/sam.11673)
Journal impact factor: 2.1; JCR quartile: Q3 (Computer Science, Artificial Intelligence)
Application of nonparametric quantifiers for online handwritten signature verification: A statistical learning approach
Abstract
This work explores the use of nonparametric quantifiers for the verification of online handwritten signatures. We used the MCYT-100 (MCYT Fingerprint subcorpus) database, which is widely used in signature verification problems. The discrete-time sequences of x-axis and y-axis positions provided in the database are preprocessed, and time-causal information is extracted from them through nonparametric quantifiers such as entropy, complexity, Fisher information, and trend. The study also proposes evaluating these quantifiers on the first and second derivatives of each position sequence, which capture the dynamic behavior of the signature through its velocity and acceleration regimes, respectively. The quantifiers are used as input features to train four classifiers on the MCYT-100 signatures: Logistic Regression, Support Vector Machines (SVM), Random Forest, and Extreme Gradient Boosting (XGBoost). To assess how well the nonparametric quantifiers distinguish forged from genuine signatures, we used variable selection criteria such as information gain, analysis of variance, and variance inflation factor. Classifier performance was evaluated with measures such as specificity and area under the curve. The results show that the SVM and XGBoost classifiers perform best.
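The abstract does not say how the quantifiers are computed. As a minimal sketch, assuming the entropy quantifier is a normalized permutation entropy built from Bandt-Pompe ordinal patterns (an assumption, not confirmed by the text above), the following shows how one entropy feature per coordinate could be obtained for position, velocity (first difference), and acceleration (second difference). The function names, the pattern order of 3, and the synthetic trajectory are all hypothetical choices made for illustration.

```python
import math
from itertools import permutations

import numpy as np


def ordinal_distribution(series, order=3, delay=1):
    """Relative frequencies of the order! ordinal patterns in a 1-D series."""
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = {p: 0 for p in permutations(range(order))}
    for start in range(n_windows):
        # Take `order` samples spaced `delay` apart and record their rank pattern.
        window = series[start : start + order * delay : delay]
        pattern = tuple(int(i) for i in np.argsort(window, kind="stable"))
        counts[pattern] += 1
    return np.array(list(counts.values()), dtype=float) / n_windows


def permutation_entropy(series, order=3, delay=1):
    """Shannon entropy of the ordinal distribution, normalized to [0, 1]."""
    probs = ordinal_distribution(series, order, delay)
    nonzero = probs[probs > 0]
    shannon = np.sum(nonzero * np.log(1.0 / nonzero))
    return float(shannon / math.log(math.factorial(order)))


def signature_features(x, y, order=3):
    """Entropy of position, velocity, and acceleration for each coordinate."""
    feats = {}
    for name, coord in (("x", x), ("y", y)):
        coord = np.asarray(coord, dtype=float)
        regimes = {
            "position": coord,
            "velocity": np.diff(coord, n=1),      # first discrete derivative
            "acceleration": np.diff(coord, n=2),  # second discrete derivative
        }
        for regime, values in regimes.items():
            feats[f"{name}_{regime}_entropy"] = permutation_entropy(values, order)
    return feats


if __name__ == "__main__":
    # Synthetic pen trajectory standing in for one signature (not MCYT-100 data).
    t = np.linspace(0.0, 4.0 * np.pi, 200)
    for key, value in signature_features(np.cos(t), np.sin(2.0 * t)).items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions, each signature yields a fixed-length feature vector that could be passed to any of the four classifiers named in the abstract. Complexity, Fisher information, and trend would be added as further entries in the same dictionary.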
About the journal:
Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce.
The focus of the journal is on papers which satisfy one or more of the following criteria:
Solve data analysis problems associated with massive, complex datasets.
Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, and operations research.
Formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models.
Provide surveys of prominent research topics.