Personalized choice prediction with less user information
Pub Date : 2024-01-30 | DOI: 10.1007/s10472-024-09927-9
Francine Chen, Yanxia Zhang, Minh Nguyen, Matt Klenk, Charlene Wu
While most models of human choice are linear to ease interpretation, it is not clear that linear models are good models of human decision making. And while prior studies have investigated how task conditions and group characteristics, such as personality or socio-demographic background, influence human decisions, no prior work has investigated how to use less personal information for choice prediction. We propose a deep learning model based on self-attention and cross-attention that models human decision making while taking into account both subject-specific information and task conditions. We show that our model consistently predicts human decisions more accurately than linear models and other baseline models while remaining interpretable. In addition, although a larger amount of subject-specific information generally leads to more accurate choice prediction, collecting more surveys to gather subject background information is a burden to subjects, as well as costly and time-consuming. To address this, we introduce a training scheme that reduces the number of surveys that must be collected in order to achieve accurate predictions.
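As a loose illustration of the kind of architecture the abstract describes, the PyTorch sketch below fuses subject-specific survey features with task conditions through self-attention and cross-attention. The layer sizes, feature shapes, and two-option output head are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a choice model that fuses
# subject-specific features with task conditions via cross-attention.
import torch
import torch.nn as nn

class CrossAttentionChoiceModel(nn.Module):
    def __init__(self, subject_dim, task_dim, embed_dim=64, num_heads=4, num_options=2):
        super().__init__()
        self.subject_proj = nn.Linear(subject_dim, embed_dim)  # embed survey answers
        self.task_proj = nn.Linear(task_dim, embed_dim)        # embed task conditions
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, num_options)          # score each option

    def forward(self, subject_feats, task_feats):
        # subject_feats: (batch, n_subject_items, subject_dim)
        # task_feats:    (batch, n_task_items, task_dim)
        s = self.subject_proj(subject_feats)
        t = self.task_proj(task_feats)
        s, _ = self.self_attn(s, s, s)          # relate survey items to each other
        fused, attn = self.cross_attn(t, s, s)  # task tokens attend to subject tokens
        return self.head(fused.mean(dim=1)), attn  # attention weights aid interpretation

model = CrossAttentionChoiceModel(subject_dim=16, task_dim=8)
logits, attn = model(torch.randn(32, 10, 16), torch.randn(32, 4, 8))
```

Returning the cross-attention weights alongside the logits is one way such a model can remain inspectable: they indicate which subject-background items influenced the prediction for each task condition.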
{"title":"Personalized choice prediction with less user information","authors":"Francine Chen, Yanxia Zhang, Minh Nguyen, Matt Klenk, Charlene Wu","doi":"10.1007/s10472-024-09927-9","DOIUrl":"10.1007/s10472-024-09927-9","url":null,"abstract":"<div><p>While most models of human choice are linear to ease interpretation, it is not clear whether linear models are good models of human decision making. And while prior studies have investigated how task conditions and group characteristics, such as personality or socio-demographic background, influence human decisions, no prior works have investigated how to use less personal information for choice prediction. We propose a deep learning model based on self-attention and cross-attention to model human decision making which takes into account both subject-specific information and task conditions. We show that our model can consistently predict human decisions more accurately than linear models and other baseline models while remaining interpretable. In addition, although a larger amount of subject specific information will generally lead to more accurate choice prediction, collecting more surveys to gather subject background information is a burden to subjects, as well as costly and time-consuming. To address this, we introduce a training scheme that reduces the number of surveys that must be collected in order to achieve more accurate predictions.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 6","pages":"1489 - 1509"},"PeriodicalIF":1.2,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10472-024-09927-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139648618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clique detection with a given reliability
Pub Date : 2024-01-29 | DOI: 10.1007/s10472-024-09928-8
Dmitry Semenov, Alexander Koldanov, Petr Koldanov, Panos Pardalos
In this paper we propose a new notion of clique reliability, understood as the ratio of the number of statistically significant links in a clique to the total number of edges of the clique. This notion relies on a recently proposed technique for separating inferences about pairwise connections between vertices of a network into significant and admissible ones. We extend this technique to the problem of clique detection and propose a method for the step-by-step construction of a clique with a given reliability. Results of constructing cliques with a given reliability using data on the returns of stocks included in the Dow Jones index are presented.
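A minimal sketch of the reliability ratio and a greedy step-by-step construction, assuming the set of statistically significant edges has already been obtained by some separation procedure (the paper's method for that step is not reproduced here):

```python
# Illustrative sketch: clique reliability as
#   (# statistically significant edges) / (# edges in the clique),
# plus a greedy construction that keeps reliability above a threshold.
from itertools import combinations

def clique_reliability(clique, significant_edges):
    edges = list(combinations(sorted(clique), 2))
    if not edges:
        return 1.0
    hits = sum(1 for e in edges if frozenset(e) in significant_edges)
    return hits / len(edges)

def grow_clique(candidates, adjacency, significant_edges, min_reliability):
    """Step-by-step construction: add a vertex only if the enlarged set
    remains a clique and its reliability stays above the threshold."""
    clique = []
    for v in candidates:
        if all(frozenset((v, u)) in adjacency for u in clique):
            trial = clique + [v]
            if clique_reliability(trial, significant_edges) >= min_reliability:
                clique = trial
    return clique

# adjacency and significant_edges are sets of frozenset vertex pairs,
# e.g. built from pairwise correlation tests on stock returns.
```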
{"title":"Clique detection with a given reliability","authors":"Dmitry Semenov, Alexander Koldanov, Petr Koldanov, Panos Pardalos","doi":"10.1007/s10472-024-09928-8","DOIUrl":"https://doi.org/10.1007/s10472-024-09928-8","url":null,"abstract":"<p>In this paper we propose a new notion of a clique reliability. The clique reliability is understood as the ratio of the number of statistically significant links in a clique to the number of edges of the clique. This notion relies on a recently proposed original technique for separating inferences about pairwise connections between vertices of a network into significant and admissible ones. In this paper, we propose an extension of this technique to the problem of clique detection. We propose a method of step-by-step construction of a clique with a given reliability. The results of constructing cliques with a given reliability using data on the returns of stocks included in the Dow Jones index are presented.</p>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"208 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139578602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel homological calculus for 3D binary digital images
Pub Date : 2024-01-29 | DOI: 10.1007/s10472-023-09913-7
Fernando Díaz-del-Río, Helena Molina-Abril, Pedro Real, Darian Onchis, Sergio Blanco-Trejo
Topological representations of binary digital images usually take into consideration different adjacency types between colors. Within the cubical-voxel 3D binary image context, we design an algorithm for computing the isotopic model of an image, called the (6, 26)-Homological Region Adjacency Tree ((6, 26)-Hom-Tree). This algorithm is based on a flexible graph scaffolding at the inter-voxel level called the Homological Spanning Forest (HSF) model. Hom-Trees are edge-weighted trees in which each node is a maximally connected set of constant-value voxels, interpreted as a subtree of the HSF. This representation integrates and relates the homological information (connected components, tunnels and cavities) of the maximally connected regions of constant color, using 6-adjacency and 26-adjacency for black and white voxels, respectively (the criteria most commonly used for 3D images). The Euler-Poincaré numbers (which may also be computed by counting the number of cells of each dimension in a cubical complex) and the connected component labeling of the foreground and background of a given image can also be straightforwardly computed from its Hom-Tree. Given a 3D binary well-composed image \(I_D\) (where D is the set of black voxels), an almost fully parallel algorithm for constructing the Hom-Tree via HSF computation is implemented and tested here. If \(I_D\) has \(m_1 \times m_2 \times m_3\) voxels, the time complexity order of the reproducible algorithm is near \(O(\log(m_1 + m_2 + m_3))\), under the assumption that a processing element is available for each cubical voxel. Strategies for using the compressed information of the Hom-Tree representation to distinguish two topologically different images having the same homological information (Betti numbers) are discussed here. The topological discriminatory power of the Hom-Tree and the low time complexity order of the proposed implementation guarantee its usability within machine learning methods for the classification and comparison of natural 3D images.
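As a small companion to the remark about Euler-Poincaré numbers, the sketch below counts the cells of each dimension in the cubical complex of the foreground and alternates signs. It uses the standard doubled-coordinate encoding of cubical cells; it is an elementary sequential illustration, not the parallel HSF algorithm of the paper.

```python
# Sketch: Euler-Poincaré number of the foreground of a 3D binary image,
# computed by counting cells of each dimension of its cubical complex.
import numpy as np

def euler_characteristic(volume):
    """volume: 3D array of 0/1; returns V - E + F - C for the foreground."""
    cells = set()
    for x, y, z in zip(*np.nonzero(volume)):
        # the closed unit cube of voxel (x, y, z) contributes 27 cells
        # in doubled coordinates: even = vertex direction, odd = spanning
        for a in (2 * x, 2 * x + 1, 2 * x + 2):
            for b in (2 * y, 2 * y + 1, 2 * y + 2):
                for c in (2 * z, 2 * z + 1, 2 * z + 2):
                    cells.add((a, b, c))
    chi = 0
    for cell in cells:
        dim = sum(coord % 2 for coord in cell)  # number of odd coordinates
        chi += (-1) ** dim
    return chi

solid = np.ones((2, 2, 2), dtype=int)
print(euler_characteristic(solid))  # 1: a solid block is a topological ball
```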
{"title":"Parallel homological calculus for 3D binary digital images","authors":"Fernando Díaz-del-Río, Helena Molina-Abril, Pedro Real, Darian Onchis, Sergio Blanco-Trejo","doi":"10.1007/s10472-023-09913-7","DOIUrl":"10.1007/s10472-023-09913-7","url":null,"abstract":"<div><p>Topological representations of binary digital images usually take into consideration different adjacency types between colors. Within the cubical-voxel 3D binary image context, we design an algorithm for computing the isotopic model of an image, called (<b>6</b>, <b>26</b>)-Homological Region Adjacency Tree ((<b>6</b>, <b>26</b>)-<i>Hom-Tree</i>). This algorithm is based on a flexible graph scaffolding at the inter-voxel level called Homological Spanning Forest model (HSF). <i>Hom-Trees</i> are edge-weighted trees in which each node is a maximally connected set of constant-value voxels, which is interpreted as a subtree of the HSF. This representation integrates and relates the homological information (connected components, tunnels and cavities) of the maximally connected regions of constant color using 6-adjacency and 26-adjacency for black and white voxels, respectively (the criteria most commonly used for 3D images). The Euler-Poincaré numbers (which may as well be computed by counting the number of cells of each dimension on a cubical complex) and the connected component labeling of the foreground and background of a given image can also be straightforwardly computed from its Hom-Trees. Being <span>(I_D)</span> a 3D binary well-composed image (where <i>D</i> is the set of black voxels), an almost fully parallel algorithm for constructing the <i>Hom-Tree</i> via HSF computation is implemented and tested here. If <span>(I_D)</span> has <span>(m_1{times } m_2{times } m_3)</span> voxels, the time complexity order of the reproducible algorithm is near <span>(O(log (m_1{+}m_2{+}m_3)))</span>, under the assumption that a processing element is available for each cubical voxel. Strategies for using the compressed information of the <i>Hom-Tree</i> representation to distinguish two topologically different images having the same homological information (Betti numbers) are discussed here. The topological discriminatory power of the <i>Hom-Tree</i> and the low time complexity order of the proposed implementation guarantee its usability within machine learning methods for the classification and comparison of natural 3<i>D</i> images.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 1","pages":"77 - 113"},"PeriodicalIF":1.2,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10472-023-09913-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139578597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weighted and Choquet \(L^p\) distance representation of comparative dissimilarity relations on fuzzy description profiles
Giulianella Coletti, Davide Petturiti, Bernadette Bouchon-Meunier
Pub Date : 2024-01-24 | DOI: 10.1007/s10472-024-09924-y
We consider comparative dissimilarity relations on pairs of fuzzy description profiles, the latter providing a fuzzy set-based representation of pairs of objects. Such a relation expresses the idea of “no more dissimilar than” and is used by a decision maker when performing a case-based decision task under vague information. We first limit ourselves to those relations admitting a weighted \(L^p\) distance representation, for which we provide an axiomatic characterization in case the relation is complete, transitive and defined on the entire space of pairs of fuzzy description profiles. Next, we switch to the more general class of comparative dissimilarity relations representable by a Choquet \(L^p\) distance, parameterized by a completely alternating normalized capacity.
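A minimal sketch of the weighted \(L^p\) distance underlying the first representation result, with fuzzy description profiles encoded as vectors of membership degrees. The weights and p below are illustrative; the Choquet extension would replace the weighted sum with a Choquet integral with respect to a capacity.

```python
# Illustrative weighted L^p distance between fuzzy description profiles,
# here vectors of membership degrees in [0, 1].
import numpy as np

def weighted_lp_distance(profile_a, profile_b, weights, p=2):
    diffs = np.abs(np.asarray(profile_a) - np.asarray(profile_b)) ** p
    return float(np.dot(weights, diffs) ** (1.0 / p))

# "No more dissimilar than": compare two pairs of profiles under one metric.
a, b = [0.2, 0.9, 0.5], [0.3, 0.7, 0.5]
c, d = [0.0, 1.0, 0.4], [0.9, 0.1, 0.6]
w = [0.5, 0.3, 0.2]
print(weighted_lp_distance(a, b, w) <= weighted_lp_distance(c, d, w))  # True
```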
{"title":"Weighted and Choquet (L^p) distance representation of comparative dissimilarity relations on fuzzy description profiles","authors":"Giulianella Coletti, Davide Petturiti, Bernadette Bouchon-Meunier","doi":"10.1007/s10472-024-09924-y","DOIUrl":"10.1007/s10472-024-09924-y","url":null,"abstract":"<div><p>We consider comparative dissimilarity relations on pairs on fuzzy description profiles, the latter providing a fuzzy set-based representation of pairs of objects. Such a relation expresses the idea of “no more dissimilar than” and is used by a decision maker when performing a case-based decision task under vague information. We first limit ourselves to those relations admitting a weighted <span>(varvec{L}^p)</span> distance representation, for which we provide an axiomatic characterization in case the relation is complete, transitive and defined on the entire space of pairs of fuzzy description profiles. Next, we switch to the more general class of comparative dissimilarity relations representable by a Choquet <span>(varvec{L}^p)</span> distance, parameterized by a completely alternating normalized capacity.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 6","pages":"1407 - 1436"},"PeriodicalIF":1.2,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10472-024-09924-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139562232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISAIM-2022: international symposium on artificial intelligence and mathematics
Pub Date : 2024-01-19 | DOI: 10.1007/s10472-024-09922-0
Dimitrios I. Diochnos, Martin Charles Golumbic, Frederick Hoffman
{"title":"ISAIM-2022: international symposium on artificial intelligence and mathematics","authors":"Dimitrios I. Diochnos, Martin Charles Golumbic, Frederick Hoffman","doi":"10.1007/s10472-024-09922-0","DOIUrl":"10.1007/s10472-024-09922-0","url":null,"abstract":"","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 1","pages":"1 - 4"},"PeriodicalIF":1.2,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139611902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stability of accuracy for the training of DNNs via the uniform doubling condition
Pub Date : 2024-01-19 | DOI: 10.1007/s10472-023-09919-1
Yitzchak Shmalo
We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in \(\mathbb{R}^n\), this doubling condition is formulated using slabs in \(\mathbb{R}^n\) and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set T that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time \(t_0\) will have high accuracy for all time \(t > t_0\). Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. We demonstrate how to numerically implement a simplified version of this uniform doubling condition on a dataset and apply it to achieve stability of accuracy using a few model examples. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.
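As a toy illustration of what a numerical check of a (much simplified) doubling condition might look like: count the training points falling in a slab around a hyperplane and in the concentric slab of doubled width, and compare their ratio against a constant. The slab parametrization, widths, and constant below are assumptions, not the paper's exact formulation.

```python
# Toy sketch of a simplified doubling condition on training data: the slab
# of doubled width should not contain vastly more points than the slab itself.
import numpy as np

def points_in_slab(X, normal, offset, width):
    dist = np.abs(X @ normal - offset)  # distance to hyperplane <normal, x> = offset
    return int(np.sum(dist <= width / 2))

def satisfies_doubling(X, normal, offset, width, C):
    inner = points_in_slab(X, normal, offset, width)
    doubled = points_in_slab(X, normal, offset, 2 * width)
    return doubled <= C * max(inner, 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))           # stand-in training data
n = np.array([1.0, 0.0])                 # one slab direction; a real check
print(satisfies_doubling(X, n, 0.0, 0.2, C=4))  # would sweep many slabs
```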
{"title":"Stability of accuracy for the training of DNNs via the uniform doubling condition","authors":"Yitzchak Shmalo","doi":"10.1007/s10472-023-09919-1","DOIUrl":"10.1007/s10472-023-09919-1","url":null,"abstract":"<div><p>We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in <span>(mathbb {R}^n)</span>, this doubling condition is formulated using slabs in <span>(mathbb {R}^n)</span> and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set <i>T</i> that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time <span>(t_0)</span> will have high accuracy for all time <span>(t>t_0)</span>. Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. We demonstrate how to numerically implement a simplified version of this uniform doubling condition on a dataset and apply it to achieve stability of accuracy using a few model examples. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 2","pages":"439 - 483"},"PeriodicalIF":1.2,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139506297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combinatorial and geometric problems in imaging sciences
Pub Date : 2024-01-19 | DOI: 10.1007/s10472-024-09923-z
Valentin E. Brimkov
{"title":"Combinatorial and geometric problems in imaging sciences","authors":"Valentin E. Brimkov","doi":"10.1007/s10472-024-09923-z","DOIUrl":"10.1007/s10472-024-09923-z","url":null,"abstract":"","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 1","pages":"5 - 6"},"PeriodicalIF":1.2,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139525436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Best-effort adaptation
Pub Date : 2024-01-13 | DOI: 10.1007/s10472-023-09917-3
Pranjal Awasthi, Corinna Cortes, Mehryar Mohri
We study a problem of best-effort adaptation motivated by several applications and considerations, which consists of determining an accurate predictor for a target domain, for which a moderate number of labeled samples is available, while leveraging information from another domain for which substantially more labeled samples are at one's disposal. We present a new and general discrepancy-based theoretical analysis of sample reweighting methods, including bounds that hold uniformly over the weights. We show how these bounds can guide the design of learning algorithms, which we discuss in detail. We further show that our learning guarantees and algorithms provide improved solutions for standard domain adaptation problems, for which few or no labeled samples are available from the target domain. Finally, we report the results of a series of experiments demonstrating the effectiveness of our best-effort adaptation and domain adaptation algorithms, as well as comparisons with several baselines. We also discuss how our analysis can benefit the design of principled solutions for fine-tuning.
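A minimal sketch of the sample-reweighting idea under discussion, assuming a single fixed domain weight q rather than the weights that the paper's uniform bounds would guide one to choose:

```python
# Sketch: fit a predictor on the union of a large labeled source set and a
# small labeled target set, with per-domain sample weights. The fixed q is
# an illustrative assumption, not the paper's principled choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

def best_effort_fit(Xs, ys, Xt, yt, q=0.3):
    """q in [0, 1]: total weight assigned to the source domain."""
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.concatenate([np.full(len(ys), q / len(ys)),        # spread q over source
                        np.full(len(yt), (1 - q) / len(yt))])  # 1 - q over target
    return LogisticRegression().fit(X, y, sample_weight=w)
```

Sweeping q from 0 (target only) to 1 (source only) recovers the trade-off the analysis quantifies: more source data lowers variance but can import domain mismatch.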
{"title":"Best-effort adaptation","authors":"Pranjal Awasthi, Corinna Cortes, Mehryar Mohri","doi":"10.1007/s10472-023-09917-3","DOIUrl":"10.1007/s10472-023-09917-3","url":null,"abstract":"<div><p>We study a problem of <i>best-effort adaptation</i> motivated by several applications and considerations, which consists of determining an accurate predictor for a target domain, for which a moderate amount of labeled samples are available, while leveraging information from another domain for which substantially more labeled samples are at one’s disposal. We present a new and general discrepancy-based theoretical analysis of sample reweighting methods, including bounds holding uniformly over the weights. We show how these bounds can guide the design of learning algorithms that we discuss in detail. We further show that our learning guarantees and algorithms provide improved solutions for standard domain adaptation problems, for which few labeled data or none are available from the target domain. We finally report the results of a series of experiments demonstrating the effectiveness of our best-effort adaptation and domain adaptation algorithms, as well as comparisons with several baselines. We also discuss how our analysis can benefit the design of principled solutions for <i>fine-tuning</i>.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 2","pages":"393 - 438"},"PeriodicalIF":1.2,"publicationDate":"2024-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139465039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RAMP experiments in solving the uncapacitated facility location problem
Pub Date : 2023-12-30 | DOI: 10.1007/s10472-023-09920-8
Telmo Matos
In this paper, we consider three Relaxation Adaptive Memory Programming (RAMP) approaches for solving the Uncapacitated Facility Location Problem (UFLP), whose objective is to locate a set of facilities and allocate these facilities to all clients at minimum cost. Different levels of sophistication were implemented to measure the performance of the RAMP approach. At the simplest level, (Dual-)RAMP explores the dual side of the problem more intensively, incorporating Lagrangean Relaxation and Subgradient Optimization with a simple improvement method on the primal side. At the most sophisticated level, RAMP combines a Dual-Ascent procedure on the dual side with a Scatter Search (SS) procedure on the primal side, forming Primal-Dual RAMP (PD-RAMP). The Dual-RAMP algorithm starts, on the dual side, with the dualization of the initial problem, and a projection method then maps the dual solutions into the primal solution space. Next, on the primal side, the projected solutions are improved through an improvement method. In the PD-RAMP algorithm, the SS procedure is incorporated on the primal side to carry out a more intensive exploration. The algorithm alternates between the dual and primal sides until a fixed number of iterations is reached. Computational experiments on a standard testbed for the UFLP were conducted to assess the performance of all the RAMP algorithms.
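For concreteness, here is a sketch of the dual-side machinery mentioned above: a Lagrangean relaxation of the UFLP assignment constraints with diminishing-step subgradient ascent on the multipliers. The instance data, step rule, and iteration budget are illustrative, and the projection and primal improvement steps of RAMP are omitted.

```python
# Sketch of the dual side of Dual-RAMP for the UFLP: relax the constraints
# sum_i x_ij = 1 with multipliers lam_j and improve the bound by subgradient.
import numpy as np

def lagrangian_bound(f, c, lam):
    """f: opening costs (m,), c: assignment costs (m, n), lam: multipliers (n,)."""
    reduced = np.minimum(c - lam, 0.0)      # benefit of serving client j from i
    rho = f + reduced.sum(axis=1)           # net value of opening facility i
    open_i = rho < 0                        # open exactly the profitable facilities
    bound = lam.sum() + rho[open_i].sum()   # lower bound on the UFLP optimum
    x = (reduced < 0) & open_i[:, None]     # relaxed assignments
    subgrad = 1.0 - x.sum(axis=0)           # violation of sum_i x_ij = 1
    return bound, subgrad

def subgradient_opt(f, c, iters=100, step=1.0):
    lam = np.zeros(c.shape[1])
    best = -np.inf
    for k in range(iters):
        bound, g = lagrangian_bound(f, c, lam)
        best = max(best, bound)
        lam += step / (k + 1) * g           # diminishing-step ascent
    return best

rng = np.random.default_rng(1)
print(subgradient_opt(rng.uniform(5, 10, 5), rng.uniform(0, 5, (5, 8))))
```

In the full method, the dual solutions visited here would be projected to primal solutions (e.g., open the facilities with negative reduced cost and assign each client to its cheapest open facility) and then refined by the primal-side improvement or Scatter Search.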
{"title":"RAMP experiments in solving the uncapacitated facility location problem","authors":"Telmo Matos","doi":"10.1007/s10472-023-09920-8","DOIUrl":"10.1007/s10472-023-09920-8","url":null,"abstract":"<div><p>In this paper, we consider three Relaxation Adaptive Memory Programming (RAMP) approaches for solving the Uncapacitated Facility Location Problem (UFLP), whose objective is to locate a set of facilities and allocate these facilities to all clients at minimum cost. Different levels of sophistication were implemented to measure the performance of the RAMP approach. In the simpler level, (Dual-) RAMP explores more intensively the dual side of the problem, incorporating a Lagrangean Relaxation and Subgradient Optimization with a simple Improvement Method on the primal side. In the most sophisticated level, RAMP combines a Dual-Ascent procedure on the dual side with a Scatter Search (SS) procedure on primal side, forming the Primal–Dual RAMP (PD-RAMP). The Dual-RAMP algorithm starts with (dual side) the dualization of the initial problem, and then a projection method projects the dual solutions into the primal solutions space. Next, (primal side) the projected solutions are improved through an improvement method. In the PD-RAMP algorithm, the SS procedure is incorporated in the primal side to carry out a more intensive exploration. The algorithm alternates between the dual and the primal side until a fixed number of iterations is achieved. Computational experiments on a standard testbed for the UFLP were conducted to assess the performance of all the RAMP algorithms.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 2","pages":"485 - 504"},"PeriodicalIF":1.2,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139066200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning from masked analogies between sentences at multiple levels of formality
Pub Date : 2023-12-26 | DOI: 10.1007/s10472-023-09918-2
This paper explores the inference of sentence analogies not restricted to the formal level. We introduce MaskPrompt, a prompt-based method that addresses the analogy task as masked analogy completion. This enables us to fine-tune pre-trained language models, in a lightweight manner, on the task of reconstructing masked spans in analogy prompts. We apply constraints that approximate the parallelogram view of analogy to construct a corpus of sentence analogies from textual entailment sentence pairs. In the constructed corpus, sentence analogies are characterized by their level of formality, ranging from strict to loose. We apply MaskPrompt to this corpus and compare it with the basic fine-tuning paradigm. Our experiments show that MaskPrompt outperforms basic fine-tuning in solving analogies in terms of overall performance, with gains of over 2% in accuracy. Furthermore, we study the contribution of loose analogies, i.e., analogies relaxed on the formal aspect. When fine-tuning with a small number of them (several hundred), the accuracy on strict analogies jumps from 82% to 99%. This demonstrates that loose analogies effectively capture implicit but coherent analogical regularities. We also use MaskPrompt with different masking schemes to optimize analogy solutions. The best masking scheme during fine-tuning is to mask any term: it exhibits the highest robustness in accuracy on all tested equivalent forms of analogies.
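As a loose, word-level illustration of prompt-based masked analogy completion (the paper works with full sentence analogies and fine-tunes the model; the prompt template and model choice here are assumptions, not the authors' setup):

```python
# Sketch: pose an analogy "A : B :: C : ?" as a masked completion and let a
# pre-trained masked language model fill the masked term.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

prompt = ("if the answer to 'hello' is 'hi', "
          "then the answer to 'goodbye' is [MASK].")
for cand in fill(prompt, top_k=3):
    print(cand["token_str"], round(cand["score"], 3))

# MaskPrompt-style fine-tuning would instead mask whole spans (any of the
# four terms of a sentence analogy) and train the model to reconstruct them.
```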
{"title":"Learning from masked analogies between sentences at multiple levels of formality","authors":"","doi":"10.1007/s10472-023-09918-2","DOIUrl":"https://doi.org/10.1007/s10472-023-09918-2","url":null,"abstract":"<h3>Abstract</h3> <p>This paper explores the inference of sentence analogies not restricted to the formal level. We introduce MaskPrompt, a prompt-based method that addresses the analogy task as masked analogy completion. This enables us to fine-tune, in a lightweight manner, pre-trained language models on the task of reconstructing masked spans in analogy prompts. We apply constraints which are approximations of the parallelogram view of analogy to construct a corpus of sentence analogies from textual entailment sentence pairs. In the constructed corpus, sentence analogies are characterized by their level of being formal, ranging from strict to loose. We apply MaskPrompt on this corpus and compare MaskPrompt with the basic fine-tuning paradigm. Our experiments show that MaskPrompt outperforms basic fine-tuning in solving analogies in terms of overall performance, with gains of over 2% in accuracy. Furthermore, we study the contribution of loose analogies, i.e., analogies relaxed on the formal aspect. When fine-tuning with a small number of them (several hundreds), the accuracy on strict analogies jumps from 82% to 99%. This demonstrates that loose analogies effectively capture implicit but coherent analogical regularities. We also use MaskPrompt with different schemes on masked content to optimize analogy solutions. The best masking scheme during fine-tuning is to mask any term: it exhibits the highest robustness in accuracy on all tested equivalent forms of analogies.</p>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139051312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}