
Latest ArXiv Publications

Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09809
Owen Henkel, Hannah Horne-Robinson, Nessie Kozhakhmetova, Amanda Lee
This study evaluates the impact of Rori, an AI-powered conversational math tutor accessible via WhatsApp, on the math performance of approximately 1,000 students in grades 3-9 across 11 schools in Ghana. Each school was assigned to a treatment group or a control group; students in the control group continued their regular math instruction, while students in the treatment group engaged with Rori for two 30-minute sessions per week over 8 months, in addition to regular math instruction. We find that math growth scores were substantially higher for the treatment group, with an effect size of 0.37, and that the results were statistically significant (p<0.001). The fact that Rori works with basic mobile devices on low-bandwidth data networks gives the intervention strong potential to support personalized learning in other low- and middle-income countries (LMICs), where laptop ownership and high-speed internet, prerequisites for many video-centered learning platforms, remain extremely limited. While the results should be interpreted judiciously, as they report only on year 1 of the intervention, and future research is necessary to better understand which conditions are necessary for successful implementation, they do suggest that chat-based tutoring solutions leveraging artificial intelligence could offer a cost-effective approach to enhancing learning outcomes for millions of students globally.
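
The reported effect size is presumably a standardized mean difference between the two groups' growth scores. As a minimal sketch of how such a figure is computed, the snippet below implements Cohen's d with a pooled standard deviation; the data and parameters are invented for illustration and are not the study's.

```python
import numpy as np

def cohens_d(treatment: np.ndarray, control: np.ndarray) -> float:
    """Standardized mean difference using a pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = ((n_t - 1) * treatment.var(ddof=1)
                  + (n_c - 1) * control.var(ddof=1)) / (n_t + n_c - 2)
    return (treatment.mean() - control.mean()) / np.sqrt(pooled_var)

# Hypothetical growth scores (post-test minus pre-test), not the study's data.
rng = np.random.default_rng(0)
treatment = rng.normal(loc=5.0, scale=8.0, size=500)
control = rng.normal(loc=2.0, scale=8.0, size=500)
print(f"effect size ~ {cohens_d(treatment, control):.2f}")
```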
Partial synchrony for free? New bounds for Byzantine agreement via a generic transformation across network models
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10059
P. Civit, M. A. Dzulfikar, S. Gilbert, R. Guerraoui, J. Komatovic, M. Vidigueira, I. Zablotchi
Byzantine consensus allows n processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine consensus exchanges $\Omega(n^2)$ bits. In recent years, great advances have been made in deterministic Byzantine agreement for partially synchronous networks, with state-of-the-art cryptographic solutions achieving $O(n^2 \kappa)$ bits (where $\kappa$ is the security parameter) and nearly matching the lower bound. In contrast, for synchronous networks, optimal solutions with $O(n^2)$ bits, with no cryptography and the same failure tolerance, have been known for more than three decades. Can this gap in network models be closed? In this paper, we present Repeater, the first generic transformation of Byzantine agreement algorithms from synchrony to partial synchrony. Repeater is modular, relying on existing and novel algorithms for its sub-modules. With the right choice of modules, Repeater requires no additional cryptography, is optimally resilient (n = 3t+1, where t is the maximum number of failures) and, for constant-size inputs, preserves the worst-case per-process bit complexity of the transformed synchronous algorithm. Leveraging Repeater, we present the first partially synchronous algorithm that (1) achieves optimal bit complexity ($O(n^2)$ bits), (2) resists a computationally unbounded adversary (no cryptography), and (3) is optimally-resilient (n = 3t+1), thus showing that the Dolev-Reischuk bound is tight in partial synchrony. Moreover, we adapt Repeater for long inputs, introducing several new algorithms with improved complexity and weaker (or completely absent) cryptographic assumptions.
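
For intuition on why the quadratic term arises, note that even a single all-to-all echo round among n processes already costs on the order of n^2 messages. A toy count of that cost (illustrative only; this is not Repeater's protocol):

```python
def echo_round_bits(n: int, bits_per_message: int = 1) -> int:
    """Bits exchanged when each of n processes sends one message to every
    other process: n * (n - 1) messages, i.e. Theta(n^2) growth."""
    return n * (n - 1) * bits_per_message

for n in (4, 10, 100):
    print(n, echo_round_bits(n))  # 12, 90, 9900 for 1-bit messages
```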
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10009
Hila Manor, T. Michaeli
Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our examples page at https://hilamanor.github.io/AudioEditing/ .
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09727
Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John F. Canny, Ian Fischer
Current Large Language Models (LLMs) are not only limited to some maximum context length, but also are not able to robustly consume long inputs. To address these limitations, we propose ReadAgent, an LLM agent system that increases effective context length up to 20x in our experiments. Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system that uses the advanced language capabilities of LLMs to (1) decide what content to store together in a memory episode, (2) compress those memory episodes into short episodic memories called gist memories, and (3) take actions to look up passages in the original text if ReadAgent needs to remind itself of relevant details to complete a task. We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. ReadAgent outperforms the baselines on all three tasks while extending the effective context window by 3-20x.
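
A minimal sketch of the three-step loop described above, with `llm` standing in for any instruction-following model; the prompts, the one-page-per-episode grouping, and the page-lookup format are simplifying assumptions, not the paper's exact design:

```python
from typing import Callable, List

def read_agent(pages: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Sketch of the ReadAgent loop. Step (1), grouping content into memory
    episodes, is simplified here to one page per episode."""
    # (2) Compress each episode into a short gist memory.
    gists = [llm(f"Shorten this passage, keeping the key facts:\n{p}") for p in pages]
    overview = "\n".join(f"<page {i}> {g}" for i, g in enumerate(gists))
    # (3) Decide which original pages are worth re-reading for this question.
    wanted = llm(f"Question: {question}\nGist memories:\n{overview}\n"
                 "Which page numbers should be re-read? Answer with numbers only:")
    ids = [int(t) for t in wanted.replace(",", " ").split() if t.isdigit()]
    details = "\n".join(pages[i] for i in ids if 0 <= i < len(pages))
    return llm(f"Question: {question}\nGist memories:\n{overview}\n"
               f"Re-read pages:\n{details}\nAnswer the question:")
```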
Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10198
Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, I. Redko
Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses the current state-of-the-art model TSMixer by 14.33% on average, while having ~4 times fewer parameters. The code is available at https://github.com/romilbert/samformer.
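
The sharpness-aware step at the core of this recipe is well documented for SAM in general: first perturb the weights toward the locally worst-case direction, then apply the gradient computed at that perturbed point. A minimal numpy sketch on a toy quadratic loss (the loss, learning rate, and rho are placeholders, not SAMformer's):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.01, rho=0.05):
    """One sharpness-aware minimization step: ascend to the locally sharpest
    point within radius rho, then descend using the gradient taken there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # worst-case perturbation
    return w - lr * grad_fn(w + eps)              # descend from perturbed point

# Toy quadratic loss L(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(1000):
    w = sam_step(w, grad_fn=lambda v: v)
print(w)  # approaches the flat minimum at the origin
```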
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09801
Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.
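
The abstract does not spell out EFUF's exact objective, but the generic fine-grained unlearning idea it borrows can be sketched as follows: keep the usual likelihood loss on ordinary tokens while negating it (gradient ascent) on spans marked as hallucinated. The loss below is an illustrative assumption in that spirit, not the paper's published formulation:

```python
import torch
import torch.nn.functional as F

def unlearning_loss(logits: torch.Tensor,       # (batch, seq_len, vocab)
                    targets: torch.Tensor,       # (batch, seq_len) token ids
                    hallucinated: torch.Tensor,  # (batch, seq_len) bool mask
                    alpha: float = 0.5) -> torch.Tensor:
    """Fine-grained unlearning sketch: descend on normal tokens, ascend on
    hallucinated ones. Assumes each batch contains both kinds of tokens."""
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    keep = per_token[~hallucinated].mean()    # ordinary language-modeling loss
    forget = per_token[hallucinated].mean()   # negated below: push these away
    return keep - alpha * forget
```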
How Flawed is ECE? An Analysis via Logit Smoothing
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10046
Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov
Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.
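
For reference, the metric under discussion: binned ECE averages, over confidence bins, the gap between accuracy and mean confidence, weighted by bin mass; the hard binning is one source of the discontinuity the abstract mentions, and LS-ECE instead perturbs the logits before measuring calibration. A minimal implementation of the standard binned estimator (the bin count is a conventional choice, not the paper's):

```python
import numpy as np

def binned_ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 15) -> float:
    """ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)| over confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean()
                                       - confidences[in_bin].mean())
    return ece

conf = np.array([0.9, 0.8, 0.7, 0.95])
correct = np.array([1.0, 1.0, 0.0, 1.0])  # 1 if the prediction was right
print(binned_ece(conf, correct))
```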
A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09688
Jinhu Jiang, Chaoyi Liang, Rongchao Dong, Zhaohui Yang, Zhongjun Zhou, Wenwen Wang, P. Yew, Weihua Zhang
System-level emulators have been used extensively for system design, debugging and evaluation. They work by providing a system-level virtual machine to support a guest operating system (OS) running on a platform with the same or different native OS that uses the same or different instruction-set architecture. For such system-level emulation, dynamic binary translation (DBT) is one of the core technologies. A recently proposed learning-based DBT approach has shown a significantly improved performance with a higher quality of translated code using automatically learned translation rules. However, it has only been applied to user-level emulation, and not yet to system-level emulation. In this paper, we explore the feasibility of applying this approach to improve system-level emulation, and use QEMU to build a prototype. ... To achieve better performance, we leverage several optimizations that include coordination overhead reduction to reduce the overhead of each coordination, and coordination elimination and code scheduling to reduce the coordination frequency. Experimental results show that it can achieve an average of 1.36X speedup over QEMU 6.1 with negligible coordination overhead in the system emulation mode using SPEC CINT2006 as application benchmarks and 1.15X on real-world applications.
Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09966
Junjie Shentu, Matthew Watson, N. A. Moubayed
Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textually localized text-to-image model (Textual Localization) to handle multi-concept input images. During fine-tuning, our method incorporates a novel cross-attention guidance to decompose multiple concepts, establishing distinct connections between the visual representation of the target concept and the identifier token in the text prompt. Experimental results reveal that our method outperforms or performs comparably to the baseline models in terms of image fidelity and image-text alignment on multi-concept input images. In comparison to Custom Diffusion, our method with hard guidance achieves CLIP-I scores that are 7.04%, 8.13% higher and CLIP-T scores that are 2.22%, 5.85% higher in single-concept and multi-concept generation, respectively. Notably, our method generates cross-attention maps consistent with the target concept in the generated images, a capability absent in existing models.
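
The reported CLIP-I and CLIP-T metrics are, as commonly defined in this literature, cosine similarities in CLIP embedding space: CLIP-I between generated and reference images, CLIP-T between the generated image and its text prompt. A sketch using the Hugging Face transformers CLIP implementation (the model size and evaluation details are assumptions and may differ from the paper's setup):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(generated, reference, prompt):
    """CLIP-I and CLIP-T for one sample. `generated` and `reference` are
    PIL images; `prompt` is the text used to generate the image."""
    inputs = processor(images=[generated, reference], text=[prompt],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)  # unit-normalize embeddings
    txt = txt / txt.norm(dim=-1, keepdim=True)
    clip_i = (img[0] @ img[1]).item()  # generated vs. reference image
    clip_t = (img[0] @ txt[0]).item()  # generated image vs. prompt
    return clip_i, clip_t
```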
TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09811
Dhruv Kudale, Badri Vishal Kasuba, Venkatapathy Subramanian, P. Chaudhuri, Ganesh Ramakrishnan
Several recent deep learning (DL) based techniques perform considerably well on image-based multilingual text detection. However, their performance relies heavily on the availability and quality of training data. There are numerous types of page-level document images consisting of information in several modalities, languages, fonts, and layouts. This makes text detection a challenging problem in the field of computer vision (CV), especially for low-resource or handwritten languages. Furthermore, there is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts that incorporate both printed and handwritten text. Conventionally, Indian script text detection requires training a DL model on plenty of labeled data, but to the best of our knowledge, no relevant datasets are available. Manual annotation of such data requires a lot of time, effort, and expertise. In order to solve this problem, we propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework. One can view this approach to multilingual text detection as an ensemble of different CV-based techniques and DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on a significant amount of language data in conjunction with CV-based methods to improve text detection in other languages. We demonstrate that TEXTRON can improve the detection performance for documents written in Indian languages, despite the absence of corresponding labeled data. Further, through extensive experimentation, we show improvement brought about by our approach over the current State-of-the-art (SOTA) models, especially for handwritten Devanagari text. Code and dataset have been made available at https://github.com/IITB-LEAP-OCR/TEXTRON
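
Data programming in this setting means treating each detection heuristic or pre-trained model as a labeling function whose votes are combined. A toy sketch with an unweighted per-pixel majority vote (real data-programming frameworks typically learn per-function accuracies rather than voting uniformly, and the three functions here are hypothetical):

```python
import numpy as np

def majority_vote(votes: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Combine per-pixel {0, 1} votes from several weak text detectors.
    votes has shape (num_detectors, H, W); the output marks a pixel as text
    when at least `threshold` of the detectors agree."""
    return (votes.mean(axis=0) >= threshold).astype(np.uint8)

# Three hypothetical labeling functions (e.g. an edge heuristic, a contour
# detector, a pre-trained model) voting on a 4x4 image.
rng = np.random.default_rng(1)
votes = (rng.random((3, 4, 4)) > 0.5).astype(np.uint8)
print(majority_vote(votes))
```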