Contract Scheduling with Predictions
Spyros Angelopoulos, Shahin Kamali
Contract scheduling is a general technique that allows the design of systems with interruptible capabilities, given an algorithm that is not necessarily interruptible. Previous work on this topic has assumed that the interruption is a worst-case deadline unknown to the scheduler. In this work, we study new settings in which the scheduler has access to an imperfect prediction regarding the interruption. In the first setting, inspired by recent advances in learning-enhanced algorithms, the prediction describes the time at which the interruption occurs. The second setting introduces a new model in which predictions are elicited as responses to a number of binary queries. For both settings, we investigate trade-offs between robustness (i.e., the worst-case performance of the schedule if the prediction is generated adversarially) and consistency (i.e., the performance assuming that the prediction is error-free). We also establish results on the performance of the schedules as a function of the prediction error.
Journal of Artificial Intelligence Research, published 2023-06-12. DOI: 10.1613/jair.1.14117
Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models
Wout Schellaert, Fernando Martínez-Plumed, Karina Vold, John Burden, Pablo Antonio Moreno Casares, B. S. Loe, Roi Reichart, Seán Ó hÉigeartaigh, A. Korhonen, J. Hernández-Orallo
Even with obvious deficiencies, large prompt-commanded multimodal models are proving to be flexible cognitive tools of unprecedented generality. But the directness, diversity, and degree of user interaction create a distinctive “human-centred generality” (HCG), rather than a fully autonomous one. HCG implies that, for a specific user, a system is only as general as it is effective for the user’s relevant tasks and their prevalent ways of prompting. A human-centred evaluation of general-purpose AI systems therefore needs to reflect the personal nature of interaction, tasks, and cognition. We argue that the best way to understand these systems is as highly coupled cognitive extenders, and to analyse the bidirectional cognitive adaptations between them and humans. In this paper, we give a formulation of HCG, as well as a high-level overview of the elements and trade-offs involved in the prompting process. We end the paper by outlining essential research questions and suggestions for improving evaluation practices, which we envision as characteristic of the evaluation of general artificial intelligence in the future. This paper appears in the AI & Society track.
Journal of Artificial Intelligence Research, pages 377-394, published 2023-06-12. DOI: 10.1613/jair.1.14157
Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization
Jiawei Xu, Shuxing Li, Rui Yang, Chun Yuan, Lei Han
Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution, in which desired goals in failed experiences are replaced with practically achieved states. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. Therefore, we concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To achieve this, we propose a novel value consistency prioritization (VCP) method, in which the priority of samples is determined by the consistency of ensemble Q-values. This distinguishes VCP from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning.
Journal of Artificial Intelligence Research, pages 355-376, published 2023-06-05. DOI: 10.1613/jair.1.14398
Semiring Reasoning Frameworks in AI and Their Computational Complexity
Thomas Eiter, Rafael Kiesel
Many important problems in AI, among them #SAT, parameter learning, and probabilistic inference, go beyond the classical satisfiability problem. Here, instead of finding a solution we are interested in a quantity associated with the set of solutions, such as the number of solutions, the optimal solution, or the probability that a query holds in a solution. To model such quantitative problems in a uniform manner, a number of frameworks, e.g., Algebraic Model Counting and Semiring-based Constraint Satisfaction Problems, employ what we call the semiring paradigm. In the latter, the abstract algebraic structure of the semiring serves as a means of parameterizing the problem definition, thus allowing for different modes of quantitative computation by choosing different semirings. While efficiently solvable cases have been widely studied, a systematic study of the computational complexity of such problems depending on the semiring parameter has been missing. In this work, we characterize the latter by NP(R), a novel generalization of NP over a semiring R, and obtain NP(R)-completeness results for a selection of semiring frameworks. To obtain more tangible insights into the hardness of NP(R), we link it to well-known complexity classes from the literature. Interestingly, we manage to connect the computational hardness to properties of the semiring. Using this insight, we see that, on the one hand, NP(R) is always at least as hard as NP or Mod_p P, depending on the semiring R, and in general is unlikely to be in FPSPACE(poly). On the other hand, for broad subclasses of semirings relevant in practice we can employ reductions to NP, Mod_p P, and #P. These results show that in many cases solutions are only mildly harder to compute than functions in NP, Mod_p P, and #P; they give us new insights into how problems that involve counting over semirings can be approached, and provide a means of assessing whether an algorithm is appropriate for a given class of problems.
Journal of Artificial Intelligence Research, pages 207-293, published 2023-05-31. DOI: 10.1613/jair.1.13970
On Centralized Critics in Multi-Agent Reinforcement Learning
Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Brett Daley, Chris Amato
Centralized Training for Decentralized Execution, where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become popular to develop actor-critic methods that train decentralized actors with a centralized critic, where the centralized critic is allowed access to global information about the entire system, including the true system state. Such centralized critics are possible given offline information and are not used during online execution. While these methods perform well in a number of domains and have become a de facto standard in MARL, using a centralized critic in this context has yet to be sufficiently analyzed theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive results contrary to the common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. In particular, we prove that state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments reveal practical issues such as the difficulty of representation learning under partial observability, which highlights why the theoretical problems are often overlooked in the literature.
Journal of Artificial Intelligence Research, pages 295-354, published 2023-05-31. DOI: 10.1613/jair.1.14386
Computational Modelling of Quantifier Use: Corpus, Models, and Evaluation
Guanyi Chen, Kees van Deemter
A prominent strand of work in formal semantics investigates the ways in which human languages quantify the elements of a set, as when we say All A are B, Few A are B, and so on. Building on a growing body of empirical studies that shed light on the meaning and the use of quantifiers, we extend this line of work by computationally modelling how human speakers textually describe complex scenes in which quantitative relations play an important role. To this end, we conducted a series of elicitation experiments in which human speakers were asked to perform a linguistic task that invites the use of quantified expressions. These experiments resulted in a corpus, called QTUNA, made up of short texts that contain a large variety of quantified expressions. We analyse QTUNA, summarise our findings, and explain how we design computational models of human quantifier use accordingly. Finally, we evaluate these models on QTUNA.
Journal of Artificial Intelligence Research, pages 167-206, published 2023-05-30. DOI: 10.1613/jair.1.13899
Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings
Alexia Toumpa, Anthony G. Cohn
Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks. As humans tend to use objects in many different ways depending on the scene and the objects’ availability, learning object affordances in everyday-life scenarios is a challenging task, particularly in the presence of an open set of interactions and objects. We address the problem of affordance categorization for class-agnostic objects with an open set of interactions; we achieve this by learning similarities between object interactions in an unsupervised way, thereby inducing clusters of object affordances. A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs (AGs), which abstract from the continuous representation of spatio-temporal interactions in RGB-D videos. These AGs are clustered to obtain groups of objects with similar affordances. Our experiments in a real-world scenario demonstrate that our method learns to create object affordance clusters with a high V-measure, even in cluttered scenes. The proposed approach handles object occlusions by effectively capturing possible interactions, without imposing any object or scene constraints.
Journal of Artificial Intelligence Research, published 2023-05-06. DOI: 10.1613/jair.1.13253
FlexiBERT: Are Current Transformer Architectures too Homogeneous and Rigid?
Shikhar Tuli, Bhishma Dedhia, Shreshth Tuli, Niraj K. Jha
The existence of a plethora of language models makes the problem of selecting the best one for a custom task challenging. Most state-of-the-art methods leverage transformer-based models (e.g., BERT) or their variants. However, training such models and exploring their hyperparameter space is computationally expensive. Prior work proposes several neural architecture search (NAS) methods that employ performance predictors (e.g., surrogate models) to address this issue; however, such works limit the analysis to homogeneous models that use a fixed dimensionality throughout the network, which leads to sub-optimal architectures. To address this limitation, we propose a suite of heterogeneous and flexible models, namely FlexiBERT, whose encoder layers vary in a diverse set of possible operations and hidden dimensions. For better-posed surrogate modeling in this expanded design space, we propose a new graph-similarity-based embedding scheme. We also propose a novel NAS policy, called BOSHNAS, that leverages this new scheme, Bayesian modeling, and second-order optimization to quickly train and use a neural surrogate model to converge to the optimal architecture. A comprehensive set of experiments shows that the proposed policy, when applied to the FlexiBERT design space, pushes the performance frontier upwards compared to traditional models. FlexiBERT-Mini, one of our proposed models, has 3% fewer parameters than BERT-Mini and achieves an 8.9% higher GLUE score. A FlexiBERT model with performance equivalent to the best homogeneous model is 2.6× smaller. FlexiBERT-Large, another proposed model, attains state-of-the-art results, outperforming the baseline models by at least 5.7% on the GLUE benchmark.
Journal of Artificial Intelligence Research, published 2023-05-06. DOI: 10.1613/jair.1.13942
FactGen: Faithful Text Generation by Factuality-aware Pre-training and Contrastive Ranking Fine-tuning
Zhibin Lan, Wei Li, Jinsong Su, Xinyan Xiao, Jiachen Liu, Wenhao Wu, Yajuan Lyu
Conditional text generation aims to produce a fluent and coherent target text that is faithful to the source text. Although pre-trained models have achieved promising results, they still suffer from the crucial problem of factuality. To deal with this issue, we propose a factuality-aware pretraining-finetuning framework named FactGen, which fully considers factuality during both training stages. Specifically, at the pre-training stage, we utilize a natural language inference model to construct target texts that are entailed by the source texts, resulting in a more factually consistent pre-training objective. Then, during the fine-tuning stage, we further introduce a contrastive ranking loss to encourage the model to generate factually consistent text with higher probability. Extensive experiments on three conditional text generation tasks demonstrate the effectiveness and generality of our training framework.
Journal of Artificial Intelligence Research, pages 1281-1303, published 2023-04-27. DOI: 10.1613/jair.1.14267
An Overview of Environmental Features that Impact Deep Reinforcement Learning in Sparse-Reward Domains
Jim Martin Catacora Ocana, R. Capobianco, D. Nardi
Deep reinforcement learning has achieved impressive results in recent years; yet, it is still severely troubled by environments with sparse rewards. Moreover, not all sparse-reward environments are created equal: they can differ in the presence or absence of various features, many of which have a great impact on learning. In light of this, the present work compiles a literature overview of such environmental features, covering in particular those that have been exploited successfully and those that continue to pose a challenge. We expect this effort to provide guidance to researchers for assessing the generality of their new proposals and to call their attention to issues that remain unresolved when dealing with sparse rewards.
Journal of Artificial Intelligence Research, pages 1181-1218, published 2023-04-26. DOI: 10.1613/jair.1.14390