Multi-Stage Algorithm for Group Testing with Prior Statistics
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10018
Ayelet C. Portnoy, Alejandro Cohen
In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior that can be represented by a trellis, e.g., finite state machines and Markov processes. We introduce a variation of the List Viterbi Algorithm (LVA) that enables accurate recovery using far fewer tests than objectives by efficiently exploiting the structure of the correlated prior statistics. Our numerical results demonstrate that the proposed Multi-Stage GT (MSGT) algorithm can attain the optimal Maximum A Posteriori (MAP) performance with feasible complexity in practical regimes, such as COVID-19 testing and sparse signal recovery applications, and, in the scenarios tested, reduce the number of pooled tests by at least 25% compared to existing classical low-complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed MSGT algorithm, which guarantees its efficiency.
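To make the recovery step concrete, below is a minimal, illustrative sketch of a list-Viterbi-style decoder over a two-state Markov prior (0 = healthy, 1 = defective) whose candidate list is then screened against noiseless pooled-test outcomes. This is not the paper's MSGT implementation; the prior parameters, pool design, and list size are all hypothetical.

```python
# Hedged sketch: beam-limited list Viterbi over a Markov prior, followed by a
# consistency check against pooled-test results. All numbers are made up.
import numpy as np
from heapq import nlargest

def list_viterbi(n, log_init, log_trans, L=16):
    """Return the L most probable length-n binary sequences under a Markov prior."""
    beams = [(log_init[s], (s,)) for s in (0, 1)]
    for _ in range(n - 1):
        expanded = [(lp + log_trans[seq[-1], s], seq + (s,))
                    for lp, seq in beams for s in (0, 1)]
        beams = nlargest(L, expanded)  # keep the L best partial paths
    return beams  # sorted by descending log prior

def consistent(seq, pools, outcomes):
    # A noiseless pooled test is positive iff it contains a defective item.
    return all(any(seq[i] for i in pool) == y for pool, y in zip(pools, outcomes))

# Hypothetical toy instance: 6 items with a clustering (correlated) prior.
log_init = np.log([0.9, 0.1])
log_trans = np.log([[0.95, 0.05],   # healthy -> healthy / defective
                    [0.30, 0.70]])  # defective -> healthy / defective
pools = [(0, 1, 2), (2, 3, 4), (4, 5)]
outcomes = [False, True, True]
for lp, seq in list_viterbi(6, log_init, log_trans):
    if consistent(seq, pools, outcomes):
        print("highest-prior consistent candidate:", seq)
        break
```

The correlated prior concentrates probability on clustered defective patterns, which is what allows a short candidate list to already contain a test-consistent, high-posterior estimate.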
{"title":"Multi-Stage Algorithm for Group Testing with Prior Statistics","authors":"Ayelet C. Portnoy, Alejandro Cohen","doi":"10.48550/arXiv.2402.10018","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10018","url":null,"abstract":"In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented in trellis, e.g., finite state machines and Markov processes. We introduce a variation of List Viterbi Algorithm (LVA) to enable accurate recovery using much fewer tests than objectives, which efficiently gains from the correlated prior statistics structure. Our numerical results demonstrate that the proposed Multi-Stage GT (MSGT) algorithm can obtain the optimal Maximum A Posteriori (MAP) performance with feasible complexity in practical regimes, such as with COVID-19 and sparse signal recovery applications, and reduce in the scenarios tested the number of pooled tests by at least $25%$ compared to existing classical low complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed MSGT algorithm that guarantees its efficiency.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"22 26","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Not Just Novelty: A Longitudinal Study on Utility and Customization of AI Workflows
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09894
Tao Long, Katy Ilonka Gero, Lydia B. Chilton
Generative AI brings novel and impressive abilities to help people in everyday tasks. Many AI workflows solve real and complex problems by chaining AI outputs together with human interaction. Although AI holds an undeniable lure, it is uncertain how useful generative AI workflows remain after the novelty wears off. Additionally, tools built with generative AI have the potential to be personalized and adapted quickly and easily, but do users take advantage of this potential to customize? We conducted a three-week longitudinal study with 12 users to understand the familiarization and customization of generative AI tools for science communication. Our study revealed that the familiarization phase lasts for 4.3 sessions, during which users explore the capabilities of the workflow and which aspects they find useful. After familiarization, the perceived utility of the system is rated higher than before, indicating that the perceived utility of AI is not just a novelty effect. The increase in benefits comes mainly from end-users' ability to customize prompts and thus appropriate the system to their own needs. This points to a future where generative AI systems allow us to design for appropriation.
{"title":"Not Just Novelty: A Longitudinal Study on Utility and Customization of AI Workflows","authors":"Tao Long, Katy Ilonka Gero, Lydia B. Chilton","doi":"10.48550/arXiv.2402.09894","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09894","url":null,"abstract":"Generative AI brings novel and impressive abilities to help people in everyday tasks. There are many AI workflows that solve real and complex problems by chaining AI outputs together with human interaction. Although there is an undeniable lure of AI, it's uncertain how useful generative AI workflows are after the novelty wears off. Additionally, tools built with generative AI have the potential to be personalized and adapted quickly and easily, but do users take advantage of the potential to customize? We conducted a three-week longitudinal study with 12 users to understand the familiarization and customization of generative AI tools for science communication. Our study revealed that the familiarization phase lasts for 4.3 sessions, where users explore the capabilities of the workflow and which aspects they find useful. After familiarization, the perceived utility of the system is rated higher than before, indicating that the perceived utility of AI is not just a novelty effect. The increase in benefits mainly comes from end-users' ability to customize prompts, and thus appropriate the system to their own needs. This points to a future where generative AI systems can allow us to design for appropriation.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"29 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
System-level Impact of Non-Ideal Program-Time of Charge Trap Flash (CTF) on Deep Neural Network
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09792
S. Shrivastava, A. Biswas, S. Chakrabarty, G. Dash, V. Saraswat, U. Ganguly
Learning of deep neural networks (DNNs) using the Resistive Processing Unit (RPU) architecture is energy-efficient as it utilizes dedicated neuromorphic hardware and stochastic computation of weight updates for in-memory computing. Charge Trap Flash (CTF) devices can implement RPU-based weight updates in DNNs. However, prior work has shown that the weight updates (V_T) in CTF-based RPUs are impacted by the non-ideal program time of CTF, which is affected by two factors: first, the number of input pulses (N) or the pulse width (pw), and second, the gap between successive update pulses (t_gap) used for the stochastic computation of weight updates. Therefore, the impact of this non-ideal program time must be studied in neural network training simulations. In this study, we first propose a pulse-train design compensation technique to reduce the total error caused by the non-ideal program time of CTF and the stochastic variance of a network. Second, we simulate an RPU-based DNN with the non-ideal program time of CTF on the MNIST and Fashion-MNIST datasets. We find that for larger N (~1000), learning performance approaches the ideal (software-level) training level and is therefore not much impacted by the choice of t_gap used to implement RPU-based weight updates. However, for lower N (<500), learning performance depends on the t_gap of the pulses. Finally, we performed an ablation study to isolate the causal factor of the improved learning performance and conclude that the lower noise level in the weight updates is the most likely significant factor. Thus, our study compensates for the error caused by non-ideal program time and standardizes the pulse length (N) and pulse gap (t_gap) specifications for CTF-based RPUs for accurate system-level on-chip training.
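As a rough illustration of the stochastic update mechanism described above, the sketch below approximates a weight update proportional to x * delta by counting coincidences between two Bernoulli pulse trains of length N. The program-time efficiency factor eta, standing in for the t_gap/pulse-width non-ideality, is an assumption made for illustration, not the paper's device model.

```python
# Hedged sketch of an RPU-style stochastic weight update; eta is a
# hypothetical stand-in for the CTF non-ideal program-time effect.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_update(x, delta, N=1000, lr=0.01, eta=1.0):
    """Approximate lr * eta * x * delta via N stochastic pulse coincidences."""
    px, pd = min(abs(x), 1.0), min(abs(delta), 1.0)  # firing probabilities
    coincidences = np.count_nonzero((rng.random(N) < px) & (rng.random(N) < pd))
    return lr * eta * np.sign(x) * np.sign(delta) * coincidences / N

# Larger N shrinks the variance of the update, echoing the N ~ 1000 regime.
for N in (100, 500, 1000):
    u = [stochastic_update(0.6, 0.5, N=N) for _ in range(200)]
    print(N, round(float(np.mean(u)), 5), round(float(np.std(u)), 5))
```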
{"title":"System-level Impact of Non-Ideal Program-Time of Charge Trap Flash (CTF) on Deep Neural Network","authors":"S. Shrivastava, A. Biswas, S. Chakrabarty, G. Dash, V. Saraswat, U. Ganguly","doi":"10.48550/arXiv.2402.09792","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09792","url":null,"abstract":"Learning of deep neural networks (DNN) using Resistive Processing Unit (RPU) architecture is energy-efficient as it utilizes dedicated neuromorphic hardware and stochastic computation of weight updates for in-memory computing. Charge Trap Flash (CTF) devices can implement RPU-based weight updates in DNNs. However, prior work has shown that the weight updates (V_T) in CTF-based RPU are impacted by the non-ideal program time of CTF. The non-ideal program time is affected by two factors of CTF. Firstly, the effects of the number of input pulses (N) or pulse width (pw), and secondly, the gap between successive update pulses (t_gap) used for the stochastic computation of weight updates. Therefore, the impact of this non-ideal program time must be studied for neural network training simulations. In this study, Firstly, we propose a pulse-train design compensation technique to reduce the total error caused by non-ideal program time of CTF and stochastic variance of a network. Secondly, we simulate RPU-based DNN with non-ideal program time of CTF on MNIST and Fashion-MNIST datasets. We find that for larger N (~1000), learning performance approaches the ideal (software-level) training level and, therefore, is not much impacted by the choice of t_gap used to implement RPU-based weight updates. However, for lower N (<500), learning performance depends on T_gap of the pulses. Finally, we also performed an ablation study to isolate the causal factor of the improved learning performance. We conclude that the lower noise level in the weight updates is the most likely significant factor to improve the learning performance of DNN. Thus, our study attempts to compensate for the error caused by non-ideal program time and standardize the pulse length (N) and pulse gap (t_gap) specifications for CTF-based RPUs for accurate system-level on-chip training.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"20 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10137
Yinhong Liu, Yimai Fang, David Vandyke, Nigel Collier
In light of recent advances in large language models (LLMs), the expectations for the next generation of virtual assistants include enhanced naturalness and adaptability across diverse usage scenarios. However, creating high-quality annotated data for Task-Oriented Dialog (TOD) is recognized to be slow and costly. To address these challenges, we introduce Task-Oriented Automatic Dialogs (TOAD), a novel and scalable TOD dataset along with its automatic generation pipeline. The TOAD dataset simulates realistic app-context interactions and provides a variety of system response style options. Two aspects of system response style are considered: verbosity level and user expression mirroring. We benchmark TOAD on two response generation tasks, and the results show that modelling more verbose responses or responses without user expression mirroring is more challenging.
{"title":"TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles","authors":"Yinhong Liu, Yimai Fang, David Vandyke, Nigel Collier","doi":"10.48550/arXiv.2402.10137","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10137","url":null,"abstract":"In light of recent advances in large language models (LLMs), the expectations for the next generation of virtual assistants include enhanced naturalness and adaptability across diverse usage scenarios. However, the creation of high-quality annotated data for Task-Oriented Dialog (TOD) is recognized to be slow and costly. To address these challenges, we introduce Task-Oriented Automatic Dialogs (TOAD), a novel and scalable TOD dataset along with its automatic generation pipeline. The TOAD dataset simulates realistic app context interaction and provide a variety of system response style options. Two aspects of system response styles are considered, verbosity level and users' expression mirroring. We benchmark TOAD on two response generation tasks and the results show that modelling more verbose or responses without user expression mirroring is more challenging.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"16 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User Privacy Harms and Risks in Conversational AI: A Proposed Framework
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09716
Ece Gumusel, Kyrie Zhixuan Zhou, M. Sanfilippo
This study presents a unique framework that applies and extends Solove's (2006) taxonomy to address privacy concerns in interactions with text-based AI chatbots. As chatbot prevalence grows, concerns about user privacy have heightened. While the existing literature highlights design elements that compromise privacy, a comprehensive framework has been lacking. Through semi-structured interviews with 13 participants interacting with two AI chatbots, this study identifies nine privacy harms and nine privacy risks in text-based interactions. Using a grounded theory approach for interview and chatlog analysis, the framework examines privacy implications at various stages of interaction. The aim is to offer developers, policymakers, and researchers a tool for the responsible and secure implementation of conversational AI, filling the existing gap in addressing privacy issues associated with text-based AI chatbots.
{"title":"User Privacy Harms and Risks in Conversational AI: A Proposed Framework","authors":"Ece Gumusel, Kyrie Zhixuan Zhou, M. Sanfilippo","doi":"10.48550/arXiv.2402.09716","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09716","url":null,"abstract":"This study presents a unique framework that applies and extends Solove (2006)'s taxonomy to address privacy concerns in interactions with text-based AI chatbots. As chatbot prevalence grows, concerns about user privacy have heightened. While existing literature highlights design elements compromising privacy, a comprehensive framework is lacking. Through semi-structured interviews with 13 participants interacting with two AI chatbots, this study identifies 9 privacy harms and 9 privacy risks in text-based interactions. Using a grounded theory approach for interview and chatlog analysis, the framework examines privacy implications at various interaction stages. The aim is to offer developers, policymakers, and researchers a tool for responsible and secure implementation of conversational AI, filling the existing gap in addressing privacy issues associated with text-based AI chatbots.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"13 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Systematic Evaluation of Evolving Highly Nonlinear Boolean Functions in Odd Sizes
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09937
C. Carlet, Marko Đurašević, D. Jakobović, S. Picek, L. Mariot
Boolean functions are mathematical objects used in diverse applications. Different applications also have different requirements, making research on Boolean functions very active. Over the last 30 years, evolutionary algorithms have been shown to be a strong option for evolving Boolean functions of different sizes and with different properties. Still, most of those works consider similar settings and provide results that are mostly interesting from the evolutionary algorithm's perspective. This work considers the problem of evolving highly nonlinear Boolean functions in odd sizes. While the problem formulation sounds simple, the problem is remarkably difficult and the related work extremely scarce. We consider three solution encodings and four Boolean function sizes and run a detailed experimental analysis. Our results show that the problem is challenging and that finding optimal solutions is impossible except for the smallest tested size. However, once we added local search to the evolutionary algorithm, we managed to find a Boolean function in nine inputs with nonlinearity 241, which, to our knowledge, had never before been accomplished with evolutionary algorithms.
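For context, the nonlinearity referenced here is conventionally computed from the Walsh-Hadamard spectrum as NL(f) = 2^(n-1) - max_a |W_f(a)| / 2. The sketch below implements this standard check; the random truth table merely stands in for a candidate that an evolutionary algorithm would produce.

```python
# Standard nonlinearity computation via the fast Walsh-Hadamard transform.
import numpy as np

def nonlinearity(truth_table):
    """truth_table: 0/1 values of length 2^n; returns NL(f)."""
    w = 1 - 2 * np.asarray(truth_table, dtype=np.int64)  # (-1)^f(x)
    h = 1
    while h < len(w):  # in-place FWHT, O(n * 2^n)
        for i in range(0, len(w), 2 * h):
            a, b = w[i:i + h].copy(), w[i + h:i + 2 * h].copy()
            w[i:i + h], w[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return int((len(w) - np.max(np.abs(w))) // 2)

rng = np.random.default_rng(42)
tt = rng.integers(0, 2, size=2 ** 9)  # a random 9-variable function
print(nonlinearity(tt))               # random functions land well below 241
```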
{"title":"A Systematic Evaluation of Evolving Highly Nonlinear Boolean Functions in Odd Sizes","authors":"C. Carlet, Marko Ðurasevic, D. Jakobović, S. Picek, L. Mariot","doi":"10.48550/arXiv.2402.09937","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09937","url":null,"abstract":"Boolean functions are mathematical objects used in diverse applications. Different applications also have different requirements, making the research on Boolean functions very active. In the last 30 years, evolutionary algorithms have been shown to be a strong option for evolving Boolean functions in different sizes and with different properties. Still, most of those works consider similar settings and provide results that are mostly interesting from the evolutionary algorithm's perspective. This work considers the problem of evolving highly nonlinear Boolean functions in odd sizes. While the problem formulation sounds simple, the problem is remarkably difficult, and the related work is extremely scarce. We consider three solutions encodings and four Boolean function sizes and run a detailed experimental analysis. Our results show that the problem is challenging, and finding optimal solutions is impossible except for the smallest tested size. However, once we added local search to the evolutionary algorithm, we managed to find a Boolean function in nine inputs with nonlinearity 241, which, to our knowledge, had never been accomplished before with evolutionary algorithms.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"20 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting Alpha Transparency In Language And Vision-Based AI Systems
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09671
David A. Noever, Forrest McKee
This investigation reveals a novel exploit derived from the PNG image file format, specifically its alpha transparency layer, and its potential to fool multiple AI vision systems. Our method uses the alpha layer as a clandestine channel that is invisible to human observers but fully actionable by AI image processors. The vulnerability was tested against representative vision systems from Apple, Microsoft, Google, Salesforce, Nvidia, and Facebook, highlighting the attack's potential breadth. This vulnerability challenges the security protocols of existing and fielded vision systems, from medical imaging to autonomous driving technologies. Our experiments demonstrate that the affected systems, which rely on convolutional neural networks or the latest multimodal language models, cannot quickly mitigate these vulnerabilities through simple patches or updates. Instead, they require retraining and architectural changes, indicating a persistent hole in multimodal technologies absent future adversarial hardening against such vision-language exploits.
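A minimal sketch of the underlying divergence (illustrative only, not the paper's exact attack) using Pillow: the RGB channels carry a payload while a fully transparent alpha layer hides it from human viewers, so any pipeline that converts RGBA to RGB by simply discarding alpha still "sees" the payload.

```python
# Hedged sketch: one image, two readings, depending on alpha handling.
from PIL import Image, ImageDraw

payload = Image.new("RGBA", (220, 60), (255, 255, 255, 0))  # alpha 0 everywhere
ImageDraw.Draw(payload).text((10, 20), "HIDDEN PROMPT", fill=(0, 0, 0, 0))

# Human view: composite over white -> a blank white rectangle.
human_view = Image.alpha_composite(
    Image.new("RGBA", payload.size, (255, 255, 255, 255)), payload
).convert("RGB")

# Naive preprocessing: drop alpha, keep RGB -> the text reappears.
model_view = payload.convert("RGB")

human_view.save("human_view.png")  # looks empty
model_view.save("model_view.png")  # shows the hidden text
```

Note that Pillow's RGBA-to-RGB conversion discards the alpha channel rather than compositing, which is exactly the kind of preprocessing divergence such an attack exploits.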
{"title":"Exploiting Alpha Transparency In Language And Vision-Based AI Systems","authors":"David A. Noever, Forrest McKee","doi":"10.48550/arXiv.2402.09671","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09671","url":null,"abstract":"This investigation reveals a novel exploit derived from PNG image file formats, specifically their alpha transparency layer, and its potential to fool multiple AI vision systems. Our method uses this alpha layer as a clandestine channel invisible to human observers but fully actionable by AI image processors. The scope tested for the vulnerability spans representative vision systems from Apple, Microsoft, Google, Salesforce, Nvidia, and Facebook, highlighting the attack's potential breadth. This vulnerability challenges the security protocols of existing and fielded vision systems, from medical imaging to autonomous driving technologies. Our experiments demonstrate that the affected systems, which rely on convolutional neural networks or the latest multimodal language models, cannot quickly mitigate these vulnerabilities through simple patches or updates. Instead, they require retraining and architectural changes, indicating a persistent hole in multimodal technologies without some future adversarial hardening against such vision-language exploits.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"17 15","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User Modeling and User Profiling: A Comprehensive Survey
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09660
Erasmo Purificato, Ludovico Boratto, Ernesto William De Luca
The integration of artificial intelligence (AI) into daily life, particularly through information retrieval and recommender systems, has necessitated advanced user modeling and profiling techniques to deliver personalized experiences. These techniques aim to construct accurate user representations based on the wealth of data generated through interactions with these systems. This paper presents a comprehensive survey of the current state, evolution, and future directions of user modeling and profiling research. We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques, and propose a novel taxonomy that encompasses all active topics in this research area, including recent trends. Our survey highlights the paradigm shifts towards more sophisticated user profiling methods, emphasizing implicit data collection, multi-behavior modeling, and the integration of graph data structures. We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches. By examining the definitions of core terminology, we aim to clarify ambiguities and foster a clearer understanding of the field by proposing two novel encyclopedic definitions of the main terms. Furthermore, we explore the application of user modeling in various domains, such as fake news detection, cybersecurity, and personalized education. This survey serves as a comprehensive resource for researchers and practitioners, offering insights into the evolution of user modeling and profiling and guiding the development of more personalized, ethical, and effective AI systems.
{"title":"User Modeling and User Profiling: A Comprehensive Survey","authors":"Erasmo Purificato, Ludovico Boratto, Ernesto William De Luca","doi":"10.48550/arXiv.2402.09660","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09660","url":null,"abstract":"The integration of artificial intelligence (AI) into daily life, particularly through information retrieval and recommender systems, has necessitated advanced user modeling and profiling techniques to deliver personalized experiences. These techniques aim to construct accurate user representations based on the rich amounts of data generated through interactions with these systems. This paper presents a comprehensive survey of the current state, evolution, and future directions of user modeling and profiling research. We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques, and propose a novel taxonomy that encompasses all active topics in this research area, including recent trends. Our survey highlights the paradigm shifts towards more sophisticated user profiling methods, emphasizing implicit data collection, multi-behavior modeling, and the integration of graph data structures. We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches. By examining the definitions of core terminology, we aim to clarify ambiguities and foster a clearer understanding of the field by proposing two novel encyclopedic definitions of the main terms. Furthermore, we explore the application of user modeling in various domains, such as fake news detection, cybersecurity, and personalized education. This survey serves as a comprehensive resource for researchers and practitioners, offering insights into the evolution of user modeling and profiling and guiding the development of more personalized, ethical, and effective AI systems.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"15 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Is Continual Learning Ready for Real-world Challenges?
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10130
Theodora Kontogianni, Yuanwen Yue, Siyu Tang, Konrad Schindler
Despite continual learning's long and well-established academic history, its application in real-world scenarios remains rather limited. This paper contends that this gap is attributable to a misalignment between the actual challenges of continual learning and the evaluation protocols in use, rendering proposed solutions ineffective for addressing the complexities of real-world setups. We validate our hypothesis and assess progress to date using a new 3D semantic segmentation benchmark, OCL-3DSS. We investigate various continual learning schemes from the literature by utilizing more realistic protocols that necessitate online and continual learning for dynamic, real-world scenarios (e.g., in robotics and 3D vision applications). The outcomes are sobering: all considered methods perform poorly, significantly deviating from the upper bound of joint offline training. This raises questions about the applicability of existing methods in realistic settings. Our paper aims to initiate a paradigm shift, advocating for the adoption of continual learning methods through new experimental protocols that better emulate real-world conditions to facilitate breakthroughs in the field.
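To make the kind of protocol at issue concrete, here is a toy "test-then-train" online loop over a synthetic task stream: every sample arrives once, is evaluated before any update, and tasks switch without revisiting old data. It is a stand-in for the spirit of such protocols, not OCL-3DSS itself; the data, model, and learning rate are hypothetical.

```python
# Hedged sketch of an online continual-learning evaluation loop.
import numpy as np

rng = np.random.default_rng(1)

def task_stream(n_tasks=3, n_per_task=500, dim=20):
    """Yield (x, y) pairs; each task is a different random linear concept."""
    for _ in range(n_tasks):
        w_true = rng.normal(size=dim)
        for _ in range(n_per_task):
            x = rng.normal(size=dim)
            yield x, int(x @ w_true > 0)

w, correct, seen = np.zeros(20), 0, 0
for x, y in task_stream():
    pred = int(x @ w > 0)            # evaluate BEFORE updating (single pass)
    correct += pred == y
    seen += 1
    p = 1.0 / (1.0 + np.exp(-x @ w))
    w -= 0.1 * (p - y) * x           # one SGD step on the logistic loss
print("online accuracy over the stream:", round(correct / seen, 3))
```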
{"title":"Is Continual Learning Ready for Real-world Challenges?","authors":"Theodora Kontogianni, Yuanwen Yue, Siyu Tang, Konrad Schindler","doi":"10.48550/arXiv.2402.10130","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10130","url":null,"abstract":"Despite continual learning's long and well-established academic history, its application in real-world scenarios remains rather limited. This paper contends that this gap is attributable to a misalignment between the actual challenges of continual learning and the evaluation protocols in use, rendering proposed solutions ineffective for addressing the complexities of real-world setups. We validate our hypothesis and assess progress to date, using a new 3D semantic segmentation benchmark, OCL-3DSS. We investigate various continual learning schemes from the literature by utilizing more realistic protocols that necessitate online and continual learning for dynamic, real-world scenarios (eg., in robotics and 3D vision applications). The outcomes are sobering: all considered methods perform poorly, significantly deviating from the upper bound of joint offline training. This raises questions about the applicability of existing methods in realistic settings. Our paper aims to initiate a paradigm shift, advocating for the adoption of continual learning methods through new experimental protocols that better emulate real-world conditions to facilitate breakthroughs in the field.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"14 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09844
Quentin Gallouédec, Edward Beeching, Clément Romac, Emmanuel Dellandréa
The search for a general model that can operate seamlessly across multiple domains remains a key goal in machine learning research. The prevailing methodology in Reinforcement Learning (RL) typically limits models to a single task within a unimodal framework, a limitation that contrasts with the broader vision of a versatile, multi-domain model. In this paper, we present Jack of All Trades (JAT), a transformer-based model with a unique design optimized for handling sequential decision-making tasks and multimodal data types. The JAT model demonstrates its robust capabilities and versatility by achieving strong performance on very different RL benchmarks, along with promising results on Computer Vision (CV) and Natural Language Processing (NLP) tasks, all using a single set of weights. The JAT model marks a significant step towards more general, cross-domain AI model design, and notably, it is the first model of its kind to be fully open-sourced (see https://huggingface.co/jat-project/jat), including a pioneering general-purpose dataset.
{"title":"Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent","authors":"Quentin Gallou'edec, Edward Beeching, Cl'ement Romac, Emmanuel Dellandr'ea","doi":"10.48550/arXiv.2402.09844","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09844","url":null,"abstract":"The search for a general model that can operate seamlessly across multiple domains remains a key goal in machine learning research. The prevailing methodology in Reinforcement Learning (RL) typically limits models to a single task within a unimodal framework, a limitation that contrasts with the broader vision of a versatile, multi-domain model. In this paper, we present Jack of All Trades (JAT), a transformer-based model with a unique design optimized for handling sequential decision-making tasks and multimodal data types. The JAT model demonstrates its robust capabilities and versatility by achieving strong performance on very different RL benchmarks, along with promising results on Computer Vision (CV) and Natural Language Processing (NLP) tasks, all using a single set of weights. The JAT model marks a significant step towards more general, cross-domain AI model design, and notably, it is the first model of its kind to be fully open-sourced (see https://huggingface.co/jat-project/jat), including a pioneering general-purpose dataset.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"30 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}