Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10009
Hila Manor, T. Michaeli
Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our examples page at https://hilamanor.github.io/AudioEditing/ .
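Both techniques share one primitive: DDPM inversion, which extracts per-step noise vectors that make a pre-trained sampler reproduce the input, so the sampler can then be re-run with altered conditioning. The sketch below illustrates the edit-friendly variant on a toy 1-D signal; the noise schedule, the stand-in noise predictor, and the prompts are all placeholder assumptions, not the authors' pre-trained audio model.

```python
import numpy as np

# Toy sketch of edit-friendly DDPM inversion on a 1-D stand-in signal.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

def eps_model(x, t, prompt):
    # Stand-in for a pre-trained, text-conditioned noise predictor.
    rng = np.random.default_rng((hash(prompt) + t) % 2**32)
    return 0.1 * rng.standard_normal(x.shape)

def mu_sigma(x_t, t, prompt):
    """Mean/std of the DDPM reverse transition p(x_{t-1} | x_t)."""
    eps = eps_model(x_t, t, prompt)
    mu = (x_t - betas[t] / np.sqrt(1 - alphas_bar[t]) * eps) / np.sqrt(alphas[t])
    sigma = np.sqrt(betas[t] * (1 - alphas_bar[t - 1]) / (1 - alphas_bar[t]))
    return mu, sigma

def invert(x0, prompt, rng):
    """Recover noise vectors z_t that make the sampler reproduce x0."""
    # Key trick: noise each timestep independently instead of running a
    # consecutive forward chain, which yields an edit-friendly space.
    xs = [np.sqrt(a) * x0 + np.sqrt(1 - a) * rng.standard_normal(x0.shape)
          for a in alphas_bar]
    zs = []
    for t in range(T - 1, 0, -1):
        mu, sigma = mu_sigma(xs[t], t, prompt)
        zs.append((xs[t - 1] - mu) / sigma)
    return xs[-1], zs

def sample(x_T, zs, prompt):
    """Re-run the sampler with fixed z_t but a (new) prompt."""
    x = x_T
    for z, t in zip(zs, range(T - 1, 0, -1)):
        mu, sigma = mu_sigma(x, t, prompt)
        x = mu + sigma * z
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(64)            # stand-in for a mel-spectrogram
x_T, zs = invert(x0, "solo piano", rng) # sample(x_T, zs, "solo piano") == x0
edited = sample(x_T, zs, "piano with saxophone")  # text-based edit
```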
{"title":"Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion","authors":"Hila Manor, T. Michaeli","doi":"10.48550/arXiv.2402.10009","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10009","url":null,"abstract":"Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our examples page in https://hilamanor.github.io/AudioEditing/ .","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"30 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09727
Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John F. Canny, Ian Fischer
Current Large Language Models (LLMs) are not only limited to a maximum context length, but are also unable to robustly consume long inputs. To address these limitations, we propose ReadAgent, an LLM agent system that increases effective context length up to 20x in our experiments. Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system that uses the advanced language capabilities of LLMs to (1) decide what content to store together in a memory episode, (2) compress those memory episodes into short episodic memories called gist memories, and (3) take actions to look up passages in the original text if ReadAgent needs to remind itself of relevant details to complete a task. We evaluate ReadAgent against baselines that use retrieval methods, the original long contexts, and the gist memories alone. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. ReadAgent outperforms the baselines on all three tasks while extending the effective context window by 3-20x.
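The three-step loop lends itself to a compact sketch. The rendering below is minimal and assumption-laden: `llm` is any text-completion callable, each page is treated as one memory episode (the paper lets the LLM choose episode boundaries), and the prompt wording is a paraphrase, not the paper's.

```python
from typing import Callable, List

def read_agent(pages: List[str], question: str,
               llm: Callable[[str], str], max_lookups: int = 3) -> str:
    """Schematic ReadAgent loop over a paginated document."""
    # (2) Compress each episode into a short gist memory.
    gists = [llm(f"Shorten this passage into a brief gist:\n{p}")
             for p in pages]
    context = "\n".join(f"<Page {i}> {g}" for i, g in enumerate(gists))
    # (3) Answer from gists, looking up original pages on demand.
    for _ in range(max_lookups):
        reply = llm(
            f"Gist memories:\n{context}\n\nQuestion: {question}\n"
            "Answer, or reply 'LOOKUP <page>' to re-read a page in full."
        )
        if not reply.startswith("LOOKUP"):
            return reply
        i = int(reply.split()[1])
        context += f"\n<Page {i}, full text> {pages[i]}"
    return llm(f"{context}\n\nQuestion: {question}\nAnswer:")
```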
{"title":"A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts","authors":"Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John F. Canny, Ian Fischer","doi":"10.48550/arXiv.2402.09727","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09727","url":null,"abstract":"Current Large Language Models (LLMs) are not only limited to some maximum context length, but also are not able to robustly consume long inputs. To address these limitations, we propose ReadAgent, an LLM agent system that increases effective context length up to 20x in our experiments. Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system that uses the advanced language capabilities of LLMs to (1) decide what content to store together in a memory episode, (2) compress those memory episodes into short episodic memories called gist memories, and (3) take actions to look up passages in the original text if ReadAgent needs to remind itself of relevant details to complete a task. We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. ReadAgent outperforms the baselines on all three tasks while extending the effective context window by 3-20x.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"19 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10198
Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, I. Redko
Transformer-based architectures have achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem, for which we show that transformers are incapable of converging to the true solution despite their high expressive power. We further identify the attention mechanism of transformers as responsible for this low generalization capacity. Building upon this insight, we propose SAMformer, a shallow, lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware minimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses the current state-of-the-art model TSMixer by 14.33% on average, while having ~4 times fewer parameters. The code is available at https://github.com/romilbert/samformer.
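The optimizer at the heart of this result is standard sharpness-aware minimization (SAM). A minimal sketch on a toy quadratic follows; applying it to a transformer changes only what `grad_fn` differentiates.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step (Foret et al.): perturb
    the weights toward the locally worst-case direction inside an L2
    ball of radius rho, then descend along the gradient taken at that
    perturbed point, which favors flat minima."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
    return w - lr * grad_fn(w + eps)             # descend from the sharp point

# Toy quadratic; SAMformer applies the same update while training a
# shallow transformer (not reproduced here).
grad = lambda w: 2.0 * (w - 3.0)
w = np.array([0.0])
for _ in range(200):
    w = sam_step(w, grad)
print(w)   # close to the minimizer at 3.0
```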
{"title":"Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention","authors":"Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, I. Redko","doi":"10.48550/arXiv.2402.10198","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10198","url":null,"abstract":"Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses the current state-of-the-art model TSMixer by 14.33% on average, while having ~4 times fewer parameters. The code is available at https://github.com/romilbert/samformer.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"8 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09801
Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment between images and text. However, these methods not only demand considerable computational resources during the fine-tuning stage but also require expensive human annotation to construct the paired data the alignment algorithms need. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF) that can eliminate hallucinations without paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving generation quality with modest computational overhead. Our code and datasets will be made publicly available.
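As a rough picture of what unlearning means here, one can negate the likelihood loss on hallucinated spans while keeping it on faithful ones. The sketch below assumes a hypothetical `model(image, text).loss` API, batch fields, and weight `lam`; EFUF's exact objectives and its automatic span labeling are described in the paper.

```python
import torch

def efuf_style_loss(model, batch, lam: float = 0.5) -> torch.Tensor:
    """Sketch of a fine-grained unlearning objective for an MLLM:
    keep ordinary negative log-likelihood on caption spans that match
    the image, and negate the loss on hallucinated spans so gradient
    descent performs gradient ascent there (i.e., unlearns them).
    All field names here are hypothetical stand-ins."""
    nll_keep = model(batch.image, batch.faithful_span).loss        # learn
    nll_forget = model(batch.image, batch.hallucinated_span).loss  # unlearn
    return nll_keep - lam * nll_forget
```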
{"title":"EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models","authors":"Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai","doi":"10.48550/arXiv.2402.09801","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09801","url":null,"abstract":"Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"16 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Flawed is ECE? An Analysis via Logit Smoothing
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10046
Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov
Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.
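A small estimator makes the comparison concrete. The binned ECE below is the standard definition; the LS-ECE estimator simply perturbs the logits with Gaussian noise and averages, which follows our reading of the abstract (the noise scale sigma and this Monte-Carlo form are assumptions, not the paper's exact definition).

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Standard binned ECE: bin-weighted |accuracy - confidence|."""
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            ece += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return ece

def ls_ece(logits, labels, sigma=0.1, n_samples=100, n_bins=15, seed=0):
    """Monte-Carlo sketch of Logit-Smoothed ECE: Gaussian noise on the
    logits before softmax makes the predictor continuous in the space
    of predictors, sidestepping ECE's discontinuities."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        z = logits + sigma * rng.standard_normal(logits.shape)
        p = np.exp(z - z.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        pred, conf = p.argmax(1), p.max(1)
        vals.append(binned_ece(conf, (pred == labels).astype(float), n_bins))
    return float(np.mean(vals))
```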
{"title":"How Flawed is ECE? An Analysis via Logit Smoothing","authors":"Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov","doi":"10.48550/arXiv.2402.10046","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10046","url":null,"abstract":"Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"7 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09688
Jinhu Jiang, Chaoyi Liang, Rongchao Dong, Zhaohui Yang, Zhongjun Zhou, Wenwen Wang, P. Yew, Weihua Zhang
System-level emulators have been used extensively for system design, debugging, and evaluation. They work by providing a system-level virtual machine that supports a guest operating system (OS) running on a platform whose native OS and instruction-set architecture may be the same or different. For such system-level emulation, dynamic binary translation (DBT) is one of the core technologies. A recently proposed learning-based DBT approach has shown significantly improved performance and higher-quality translated code by using automatically learned translation rules. However, it has only been applied to user-level emulation, not yet to system-level emulation. In this paper, we explore the feasibility of applying this approach to improve system-level emulation, and use QEMU to build a prototype. ... To achieve better performance, we leverage several optimizations: coordination overhead reduction to lower the cost of each coordination, and coordination elimination and code scheduling to reduce the coordination frequency. Experimental results show that our prototype achieves an average speedup of 1.36x over QEMU 6.1 in system emulation mode, with negligible coordination overhead, on the SPEC CINT2006 benchmarks, and 1.15x on real-world applications.
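The shape of a rule-based translation pass layered over a baseline DBT can be sketched in a few lines. Everything below is a placeholder: the rule table, the ISA pairing, and the fallback hook stand in for the learned rules and QEMU integration described in the paper.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    op: str                 # guest opcode
    operand_kinds: tuple    # e.g. ("reg", "reg", "reg")
    fields: dict            # concrete operand names/values

RULES = {
    # hypothetical guest (ARM64-like) pattern -> host (x86-64-like) template
    ("add", ("reg", "reg", "reg")): "lea {d}, [{a} + {b}]",
    ("mov", ("reg", "imm")):        "mov {d}, {imm}",
}

def translate(insn: Insn, baseline_translate) -> str:
    """Emit host code from a learned rule when one matches; otherwise
    hand the instruction to the baseline translator (e.g. QEMU's TCG).
    Each switch between rule-generated and baseline-generated code is a
    'coordination' point, the overhead the paper's optimizations target."""
    rule = RULES.get((insn.op, insn.operand_kinds))
    if rule is None:
        return baseline_translate(insn)   # coordinate with the baseline
    return rule.format(**insn.fields)

print(translate(Insn("add", ("reg", "reg", "reg"),
                     {"d": "rax", "a": "rbx", "b": "rcx"}),
                lambda i: f"; TCG fallback for {i.op}"))
# -> lea rax, [rbx + rcx]
```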
{"title":"A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules","authors":"Jinhu Jiang, Chaoyi Liang, Rongchao Dong, Zhaohui Yang, Zhongjun Zhou, Wenwen Wang, P. Yew, Weihua Zhang","doi":"10.48550/arXiv.2402.09688","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09688","url":null,"abstract":"System-level emulators have been used extensively for system design, debugging and evaluation. They work by providing a system-level virtual machine to support a guest operating system (OS) running on a platform with the same or different native OS that uses the same or different instruction-set architecture. For such system-level emulation, dynamic binary translation (DBT) is one of the core technologies. A recently proposed learning-based DBT approach has shown a significantly improved performance with a higher quality of translated code using automatically learned translation rules. However, it has only been applied to user-level emulation, and not yet to system-level emulation. In this paper, we explore the feasibility of applying this approach to improve system-level emulation, and use QEMU to build a prototype. ... To achieve better performance, we leverage several optimizations that include coordination overhead reduction to reduce the overhead of each coordination, and coordination elimination and code scheduling to reduce the coordination frequency. Experimental results show that it can achieve an average of 1.36X speedup over QEMU 6.1 with negligible coordination overhead in the system emulation mode using SPEC CINT2006 as application benchmarks and 1.15X on real-world applications.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"24 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09966
Junjie Shentu, Matthew Watson, N. A. Moubayed
Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent from the pre-training dataset, using only a few sample images. However, prevalent subject-driven models rely primarily on single-concept input images and face challenges in specifying the target concept when given multi-concept input images. To this end, we introduce a textually localized text-to-image model (Textual Localization) to handle multi-concept input images. During fine-tuning, our method incorporates novel cross-attention guidance to decompose multiple concepts, establishing distinct connections between the visual representation of the target concept and the identifier token in the text prompt. Experimental results reveal that our method outperforms or performs comparably to the baseline models in terms of image fidelity and image-text alignment on multi-concept input images. Compared to Custom Diffusion, our method with hard guidance achieves CLIP-I scores that are 7.04% and 8.13% higher, and CLIP-T scores that are 2.22% and 5.85% higher, in single-concept and multi-concept generation, respectively. Notably, our method generates cross-attention maps consistent with the target concept in the generated images, a capability absent in existing models.
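One plausible rendering of cross-attention guidance is a penalty that aligns the identifier token's attention map with the target concept's region. The loss below is our illustration only; the mask source, squared-error form, and tensor layout are assumptions, not the paper's exact formulation (which also distinguishes hard and soft guidance).

```python
import torch

def attention_guidance_loss(attn: torch.Tensor, token_idx: int,
                            mask: torch.Tensor) -> torch.Tensor:
    """Toy cross-attention guidance: pull the attention map of a
    concept's identifier token toward that concept's spatial mask, so
    each identifier binds to one region of a multi-concept image.
    `attn` is (heads, H*W, n_tokens) from one cross-attention layer."""
    a = attn[:, :, token_idx].mean(dim=0)   # average heads -> (H*W,)
    a = a / (a.sum() + 1e-8)                # normalize to a distribution
    m = mask.float().flatten()
    m = m / (m.sum() + 1e-8)
    return ((a - m) ** 2).mean()            # added to the fine-tuning loss
```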
{"title":"Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation","authors":"Junjie Shentu, Matthew Watson, N. A. Moubayed","doi":"10.48550/arXiv.2402.09966","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09966","url":null,"abstract":"Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-image model (Texual Localization) to handle multi-concept input images. During fine-tuning, our method incorporates a novel cross-attention guidance to decompose multiple concepts, establishing distinct connections between the visual representation of the target concept and the identifier token in the text prompt. Experimental results reveal that our method outperforms or performs comparably to the baseline models in terms of image fidelity and image-text alignment on multi-concept input images. In comparison to Custom Diffusion, our method with hard guidance achieves CLIP-I scores that are 7.04%, 8.13% higher and CLIP-T scores that are 2.22%, 5.85% higher in single-concept and multi-concept generation, respectively. Notably, our method generates cross-attention maps consistent with the target concept in the generated images, a capability absent in existing models.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"18 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09811
Dhruv Kudale, Badri Vishal Kasuba, Venkatapathy Subramanian, P. Chaudhuri, Ganesh Ramakrishnan
Several recent deep learning (DL) based techniques perform well on image-based multilingual text detection. However, their performance relies heavily on the availability and quality of training data. Page-level document images come in numerous types, containing information in several modalities, languages, fonts, and layouts. This makes text detection a challenging problem in computer vision (CV), especially for low-resource or handwritten languages. Furthermore, word-level labeled data for text detection is scarce, especially for multilingual settings and for Indian scripts that mix printed and handwritten text. Conventionally, Indian script text detection requires training a DL model on plenty of labeled data, but to the best of our knowledge, no relevant datasets are available. Manual annotation of such data requires a great deal of time, effort, and expertise. To solve this problem, we propose TEXTRON, a data-programming-based approach in which users can plug various text detection methods into a weak-supervision-based learning framework. One can view this approach to multilingual text detection as an ensemble of different CV-based techniques and DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on a significant amount of language data, in conjunction with CV-based methods, to improve text detection in other languages. We demonstrate that TEXTRON improves detection performance for documents written in Indian languages, despite the absence of corresponding labeled data. Further, through extensive experimentation, we show the improvement our approach brings over current state-of-the-art (SOTA) models, especially for handwritten Devanagari text. Code and dataset have been made available at https://github.com/IITB-LEAP-OCR/TEXTRON
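In data-programming terms, each detection method becomes a labeling function whose votes are aggregated into weak labels. The toy combiner below uses a plain majority vote; TEXTRON itself learns labeling-function reliabilities within its weak-supervision framework, so treat this as a stand-in for the idea, not the method.

```python
import numpy as np

def aggregate_pixel_votes(lf_maps, threshold=0.5):
    """Toy data-programming combiner: each labeling function (a CV
    heuristic or a pre-trained DL detector) emits a binary text mask;
    a pixel is labeled text where enough LFs agree."""
    votes = np.stack(lf_maps).astype(float)   # (n_lfs, H, W)
    return votes.mean(axis=0) >= threshold    # consensus text mask

# Three hypothetical labeling functions on a 2x3 image patch.
lf_edges    = np.array([[1, 1, 0], [0, 1, 0]])
lf_contours = np.array([[1, 0, 0], [0, 1, 1]])
lf_dl_model = np.array([[1, 1, 0], [0, 0, 1]])
print(aggregate_pixel_votes([lf_edges, lf_contours, lf_dl_model]))
# -> [[ True  True False]
#     [False  True  True]]
```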
{"title":"TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming","authors":"Dhruv Kudale, Badri Vishal Kasuba, Venkatapathy Subramanian, P. Chaudhuri, Ganesh Ramakrishnan","doi":"10.48550/arXiv.2402.09811","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09811","url":null,"abstract":"Several recent deep learning (DL) based techniques perform considerably well on image-based multilingual text detection. However, their performance relies heavily on the availability and quality of training data. There are numerous types of page-level document images consisting of information in several modalities, languages, fonts, and layouts. This makes text detection a challenging problem in the field of computer vision (CV), especially for low-resource or handwritten languages. Furthermore, there is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts that incorporate both printed and handwritten text. Conventionally, Indian script text detection requires training a DL model on plenty of labeled data, but to the best of our knowledge, no relevant datasets are available. Manual annotation of such data requires a lot of time, effort, and expertise. In order to solve this problem, we propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework. One can view this approach to multilingual text detection as an ensemble of different CV-based techniques and DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on a significant amount of language data in conjunction with CV-based methods to improve text detection in other languages. We demonstrate that TEXTRON can improve the detection performance for documents written in Indian languages, despite the absence of corresponding labeled data. Further, through extensive experimentation, we show improvement brought about by our approach over the current State-of-the-art (SOTA) models, especially for handwritten Devanagari text. Code and dataset has been made available at https://github.com/IITB-LEAP-OCR/TEXTRON","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"27 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Strategic Vote Timing in Online Elections With Public Tallies
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09776
Aviv Yaish, S. Abramova, Rainer Böhme
We study the effect of public tallies on online elections, in a setting where voting is costly and voters are allowed to strategically time their votes. The strategic importance of choosing *when* to vote arises when votes are public, such as in online event scheduling polls (e.g., Doodle), or in blockchain governance mechanisms. In particular, there is a tension between voting early to influence future votes, and waiting to observe interim results and avoid voting costs if the outcome has already been decided. Our study draws on empirical findings showing that "temporal" bandwagon effects occur when interim results are revealed to the electorate: late voters are more likely to vote for leading candidates. To capture this phenomenon, we analyze a novel model where the electorate consists of informed voters who have a preferred candidate, and uninformed swing voters who can be swayed according to the interim outcome at the time of voting. In our main results, we prove the existence of equilibria where both early and late voting occur with positive probability, and we characterize conditions that lead to the appearance of "last minute" voting behavior, where all informed voters vote late.
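As a back-of-the-envelope illustration of the tension (our simplification, not the paper's model), an informed voter with benefit B from their candidate winning and voting cost c weighs:

```latex
\[
U_{\text{early}} \;=\; p_{\text{win}}^{\text{early}} \, B \;-\; c,
\qquad
U_{\text{late}} \;=\; p_{\text{win}}^{\text{late}} \, B
\;-\; \Pr[\text{race still undecided}] \cdot c .
\]
% Voting early can raise p_win by swaying uninformed swing voters toward
% the interim leader; waiting saves the cost c whenever the outcome is
% already decided.  Mixed equilibria with both behaviors arise when
% neither utility dominates for every voter.
```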
{"title":"Strategic Vote Timing in Online Elections With Public Tallies","authors":"Aviv Yaish, S. Abramova, Rainer Bohme","doi":"10.48550/arXiv.2402.09776","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09776","url":null,"abstract":"We study the effect of public tallies on online elections, in a setting where voting is costly and voters are allowed to strategically time their votes. The strategic importance of choosing emph{when} to vote arises when votes are public, such as in online event scheduling polls (e.g., Doodle), or in blockchain governance mechanisms. In particular, there is a tension between voting early to influence future votes and waiting to observe interim results and avoid voting costs if the outcome has already been decided. Our study draws on empirical findings showing that\"temporal\"bandwagon effects occur when interim results are revealed to the electorate: late voters are more likely to vote for leading candidates. To capture this phenomenon, we analyze a novel model where the electorate consists of informed voters who have a preferred candidate, and uninformed swing voters who can be swayed according to the interim outcome at the time of voting. In our main results, we prove the existence of equilibria where both early and late voting occur with a positive probability, and we characterize conditions that lead to the appearance of\"last minute\"voting behavior, where all informed voters vote late.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"14 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GeoBotsVR: A Robotics Learning Game for Beginners with Hands-on Learning Simulation
Pub Date: 2024-02-15 | DOI: 10.1145/3613905.3648111
Syed Tanzim Mubarrat
This article introduces GeoBotsVR, an easily accessible virtual-reality game that combines puzzle-solving with robotics learning, aiming to cultivate interest and motivation in robotics, programming, and electronics among individuals with limited experience in these domains. Players build and customize a two-wheeled mobile robot from various robotic components and use it to solve procedurally generated puzzles across a diverse range of environments. An innovative aspect is the inclusion of a repair feature, which requires players to address randomly generated electronics and programming issues with their robot through hands-on manipulation. GeoBotsVR is designed to be immersive, replayable, and grounded in practical application, offering an enjoyable and accessible tool for beginners to acquaint themselves with robotics. The game simulates a hands-on learning experience and requires no prior technical knowledge, making it a potentially valuable resource for an engaging introduction to the field of robotics.
{"title":"GeoBotsVR: A Robotics Learning Game for Beginners with Hands-on Learning Simulation","authors":"Syed Tanzim, Mubarrat","doi":"10.1145/3613905.3648111","DOIUrl":"https://doi.org/10.1145/3613905.3648111","url":null,"abstract":"This article introduces GeoBotsVR, an easily accessible virtual reality game that combines elements of puzzle-solving with robotics learning and aims to cultivate interest and motivation in robotics, programming, and electronics among individuals with limited experience in these domains. The game allows players to build and customize a two-wheeled mobile robot using various robotic components and use their robot to solve various procedurally-generated puzzles in a diverse range of environments. An innovative aspect is the inclusion of a repair feature, requiring players to address randomly generated electronics and programming issues with their robot through hands-on manipulation. GeoBotsVR is designed to be immersive, replayable, and practical application-based, offering an enjoyable and accessible tool for beginners to acquaint themselves with robotics. The game simulates a hands-on learning experience and does not require prior technical knowledge, making it a potentially valuable resource for beginners to get an engaging introduction to the field of robotics.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"9 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}