
arXiv (Cornell University): Latest Papers

Feature emergence via margin maximization: case studies in algebraic tasks
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07568
Morwani, Depen, Edelman, Benjamin L., Oncescu, Costin-Andrei, Zhao, Rosie, Kakade, Sham
Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
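To make the "Fourier features for modular addition" claim concrete, here is a minimal sketch of how such features can compute (a + b) mod p. It is an illustration of the general trigonometric mechanism, not the network or proof from the paper; the function names and the choice p = 7 are assumptions made for the example.

```python
import math


def fourier_logits(a, b, p):
    """Score each candidate c by sum over k of cos(2*pi*k*(a + b - c) / p)."""
    logits = []
    for c in range(p):
        total = 0.0
        for k in range(1, p):
            w = 2 * math.pi * k / p
            # Angle-addition identities build cos/sin of w*(a+b)
            # from features that depend on a single input each.
            cos_ab = math.cos(w * a) * math.cos(w * b) - math.sin(w * a) * math.sin(w * b)
            sin_ab = math.sin(w * a) * math.cos(w * b) + math.cos(w * a) * math.sin(w * b)
            # cos(w*(a+b-c)) = cos(w*(a+b))*cos(w*c) + sin(w*(a+b))*sin(w*c)
            total += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
        logits.append(total)
    return logits


def predict(a, b, p):
    """Return the candidate c with the largest Fourier score."""
    logits = fourier_logits(a, b, p)
    return max(range(p), key=lambda c: logits[c])


if __name__ == "__main__":
    p = 7  # hypothetical small modulus for the demonstration
    print(all(predict(a, b, p) == (a + b) % p for a in range(p) for b in range(p)))
```

The score peaks only at c = (a + b) mod p because the cosines add coherently there and cancel for every other candidate.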
Citations: 1
Exploring the Dialogue Comprehension Ability of Large Language Models
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07194
She, Shuaijie, Huang, Shujian, Wang, Xingyun, Zhou, Yanke, Chen, Jiajun
The recent emergence of large language models (LLMs) has attracted considerable attention. LLMs may interact with users in the form of dialogue and generate responses following their instructions, which naturally requires dialogue comprehension abilities. Without correct comprehension of the dialogue, the model may inevitably generate incorrect responses. However, dialogue comprehension is a general language ability that is hard to evaluate directly. In this work, we propose to perform the evaluation with the help of the dialogue summarization task. Besides evaluating and analyzing the dialogue summarization performance (DIAC-Sum), we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-FactQA). Our evaluation shows that, on average, 27% of the summaries generated by LLMs contain factual inconsistencies. Even ChatGPT, the strongest evaluated model, has such errors in 16% of its summaries. For answering the factual questions, which is more challenging, the average accuracy of all evaluated LLMs is only 62.8%. Both results indicate serious deficiencies. Detailed analysis shows that understanding the subject/object of the conversation is still the most challenging problem for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data. The experimental results demonstrate that our method achieves an accuracy improvement of 8.9% on DIAC-FactQA.
Citations: 0
Slow Passage through a Saddle-Node Bifurcation in Discrete Dynamical Systems
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07242
Chu, Jay, Lin, Jun-Jie, Tsai, Je-Chiang
We study a discrete non-autonomous system whose autonomous counterpart (with the frozen bifurcation parameter) admits a saddle-node bifurcation, and in which the bifurcation parameter slowly changes in time and is characterized by a sweep rate constant $\epsilon$. The discrete system is more appropriate for modeling realistic systems since only time series data is available. We show that in contrast to its autonomous counterpart, when the time mesh size $\Delta t$ is less than the order $O(\epsilon)$, there is a bifurcation delay as the time-varying bifurcation parameter is varied through the bifurcation point, and the delay is proportional to the two-thirds power of the sweep rate constant $\epsilon$. This bifurcation delay is significant in various realistic systems since it allows one to take necessary action promptly before a sudden collapse or shift to different states. On the other hand, when the time mesh size $\Delta t$ is larger than the order $o(\epsilon)$, the dynamical behavior of the solution is dramatically changed before the bifurcation point. This behavior is not observed in the autonomous counterpart. Therefore, the dynamical behavior of the system strongly depends on the time mesh size. Finally, owing to the inherently discrete nature of the system, there are no efficient tools for its analytical study. Our approach is elementary and analytical.
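The two-thirds power law for the delay can be checked numerically on a toy model. The sketch below assumes a specific map that the abstract does not give: the forward-Euler discretization of the saddle-node normal form x' = mu + x^2 with mu' = epsilon. The starting point, the escape threshold, and the choice dt = eps are all illustrative assumptions, and the delay is measured as the value of mu at which the orbit escapes.

```python
def escape_mu(eps, dt, mu0=-1.0, x_escape=10.0):
    """Iterate the Euler map and return mu when x first exceeds x_escape.

    Assumed map (not taken from the paper):
        x_{n+1}  = x_n + dt * (mu_n + x_n**2)
        mu_{n+1} = mu_n + eps * dt
    """
    mu = mu0
    x = -((-mu0) ** 0.5)  # start on the stable branch x = -sqrt(-mu)
    while x < x_escape:
        x += dt * (mu + x * x)
        mu += eps * dt
    return mu


# Delay past the static bifurcation point mu = 0 for two sweep rates.
d_fast = escape_mu(1e-2, dt=1e-2)
d_slow = escape_mu(1e-3, dt=1e-3)

if __name__ == "__main__":
    # If delay ~ eps**(2/3), a tenfold change in eps should change
    # the delay by roughly 10**(2/3).
    print(d_fast / d_slow, 10 ** (2 / 3))
```

Comparing the printed ratio with 10^(2/3) gives a quick sanity check of the scaling. Changing `dt` relative to `eps` is the natural way to explore the mesh-size dependence the abstract describes.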
Citations: 0
Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07257
Lach, Luca, Lemaignan, Séverin, Ferro, Francesco, Ritter, Helge, Haschke, Robert
We present a holistic grasping controller, combining free-space position control and in-contact force control for reliable grasping given uncertain object pose estimates. Employing tactile fingertip sensors, undesired object displacement during grasping is minimized by pausing the finger closing motion for individual joints on first contact until force-closure is established. While holding an object, the controller is compliant with external forces to avoid high internal object forces and prevent object damage. Gravity as an external force is explicitly considered and compensated for, thus preventing gravity-induced object drift. We evaluate the controller in two experiments on the TIAGo robot and its parallel-jaw gripper, proving the effectiveness of the approach for robust grasping and minimizing object displacement. In a series of ablation studies, we demonstrate the utility of the individual controller components.
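The pause-on-first-contact behaviour described in the abstract can be pictured as a small per-joint state machine. The sketch below is a hypothetical simplification, not the authors' controller: the state names, the `contact_threshold` value, and the rule that "force-closure" means every joint has made contact are all assumptions made for illustration.

```python
CLOSING, PAUSED, HOLDING = "closing", "paused", "holding"


def step(states, forces, contact_threshold=0.1):
    """Advance the per-joint grasp states by one control tick.

    states: list of per-joint states (CLOSING, PAUSED or HOLDING)
    forces: list of fingertip force readings, one per joint
    contact_threshold: hypothetical force level that counts as contact
    """
    new_states = []
    for state, force in zip(states, forces):
        if state == CLOSING and force >= contact_threshold:
            # First contact: stop this joint so the object is not pushed away.
            new_states.append(PAUSED)
        else:
            new_states.append(state)
    # Assumed stand-in for force-closure: every joint is in contact.
    if all(s in (PAUSED, HOLDING) for s in new_states):
        new_states = [HOLDING] * len(new_states)
    return new_states


if __name__ == "__main__":
    s = [CLOSING, CLOSING]
    s = step(s, [0.2, 0.0])  # left finger touches first and pauses
    print(s)
    s = step(s, [0.2, 0.3])  # right finger touches; switch to holding
    print(s)
```

In a real controller the HOLDING state would hand over to force control with gravity compensation, which this sketch deliberately leaves out.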
Citations: 0
On the asymptotic of lottery numbers
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07406
Sidorenko, Alexander
Let $L(n,k,r,p)$ denote the minimum number of $k$-subsets of an $n$-set such that all the $\binom{n}{p}$ $p$-subsets are intersected by one of them in at least $r$ elements. The case $p=r$ corresponds to the covering numbers, while the case $k=r$ corresponds to the Turán numbers. In both cases, there exists a limit of $L(n,k,r,p) / \binom{n}{r}$ as $n\to\infty$. We prove the existence of this limit in the general case.
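The definition of $L(n,k,r,p)$ can be made concrete with a brute-force search for very small parameters. This sketch is only meant to illustrate the definition; the function names are hypothetical, and the exhaustive search is exponential, so it is unusable beyond tiny cases.

```python
from itertools import combinations


def is_lottery_design(blocks, n, r, p):
    """True if every p-subset of range(n) meets some block in >= r elements."""
    return all(
        any(len(set(ticket) & set(block)) >= r for block in blocks)
        for ticket in combinations(range(n), p)
    )


def lottery_number(n, k, r, p):
    """Smallest number of k-subsets forming a lottery design (brute force)."""
    candidates = list(combinations(range(n), k))
    for m in range(1, len(candidates) + 1):
        for blocks in combinations(candidates, m):
            if is_lottery_design(blocks, n, r, p):
                return m
    return None  # no design exists, e.g. when r > min(k, p)


if __name__ == "__main__":
    print(lottery_number(4, 3, 2, 2))  # covering case p = r
    print(lottery_number(4, 2, 2, 3))  # Turan case k = r
```

The two calls exercise the special cases named in the abstract: $p=r$ (covering numbers) and $k=r$ (Turán numbers).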
Citations: 0
Towards a covariant framework for post-Newtonian expansions for radiative sources
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07546
Hartong, Jelle, Musaeus, Jørgen
We consider the classic problem of a compact fluid source that behaves non-relativistically and that radiates gravitational waves. The problem consists of determining the metric close to the source as well as far away from it. The non-relativistic nature of the source leads to a separation of scales resulting in an overlap region where both the $1/c$ and (multipolar) $G$-expansions are valid. Standard approaches to this problem (the Blanchet--Damour and the DIRE approach) use the harmonic gauge. We define a `post-Newtonian' class of gauges that admit a Newtonian regime in inertial coordinates. In this paper we set up a formalism to solve for the metric for any post-Newtonian gauge choice. Our methods are based on previous work on the covariant theory of non-relativistic gravity (a $1/c$-expansion of general relativity that uses post-Newton-Cartan variables). At the order of interest in the $1/c$ and $G$-expansions we split the variables into two sets: transverse and longitudinal. We show that for the transverse variables the problem can be reduced to inverting Laplacian and d'Alembertian operators on their respective domains subject to appropriate boundary conditions. The latter are regularity in the interior and asymptotic flatness with a Sommerfeld no-incoming radiation condition imposed at past null infinity. The longitudinal variables follow from the gauge choice. The full solution is then obtained by the method of matched asymptotic expansion. We show that our methods reproduce existing results in harmonic gauge to 2.5PN order.
Citations: 0
On Elastic Language Models
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07204
Zhang, Chen, Wang, Benyou, Song, Dawei
Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model to a small one, in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variable, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned a static tradeoff, it could be inadequate in that the latency is too high when the number of requests is large or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce compute elasticity to the compressed language model, so that the tradeoff can vary on the fly along scalable and controllable compute. Specifically, we impose an elastic structure to equip ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking and present ElasticDenser and ElasticRanker, respectively. Offline evaluation is conducted on the language understanding benchmark GLUE and on several information retrieval tasks, including Natural Questions, TriviaQA, and MS MARCO. The results show that ElasticLM, along with ElasticDenser and ElasticRanker, can perform correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to a varying request stream.
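The idea of adjusting the latency-performance tradeoff to the request stream can be sketched as a simple scheduling rule. This is not the elastic schedule from the paper, whose details the abstract does not give; the compute levels, their per-request latencies, the latency budget, and the assumption that a slower level is also a more accurate one are all hypothetical.

```python
def pick_compute_level(pending_requests, latency_per_request_ms, budget_ms):
    """Pick the most expensive compute level that still fits the latency budget.

    latency_per_request_ms: dict mapping a level name to its per-request latency
    (assumption: a higher latency level is also the more accurate one).
    Falls back to the cheapest level when nothing fits.
    """
    feasible = [
        level
        for level, ms in latency_per_request_ms.items()
        if ms * pending_requests <= budget_ms
    ]
    if feasible:
        return max(feasible, key=lambda level: latency_per_request_ms[level])
    return min(latency_per_request_ms, key=latency_per_request_ms.get)


# Hypothetical levels of one elastic model and their per-request latencies.
LEVELS = {"small": 2.0, "medium": 5.0, "large": 12.0}

if __name__ == "__main__":
    for load in (1, 15, 100):
        print(load, pick_compute_level(load, LEVELS, budget_ms=100.0))
```

Under light load the rule selects the largest level, and it degrades gracefully to smaller levels as the queue grows, which is the qualitative behaviour the abstract attributes to compute elasticity.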
Citations: 0
$\mathrm{SL}(2, \mathbb{C})$ quartic vertex for closed string field theory
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07367
Erbin, Harold, Majumder, Suvajit
We construct the $\mathrm{SL}(2, \mathbb{C})$ quartic vertex with a generic stub parameter for the bosonic closed string field theory by characterizing the vertex region in the moduli space of the 4-punctured sphere, and providing the necessary and sufficient constraints for the local coordinate maps. While $\mathrm{SL}(2, \mathbb{C})$ vertices are not known to have a nice geometric recursive construction like the minimal area or hyperbolic vertices, they can be studied analytically, which makes them more convenient for simple computations. In particular, we obtain exact formulas for the parametrization and volume of the vertex region as a function of the stub parameter. The main objective of having an explicit quartic vertex is to later study its decomposition using auxiliary fields.
Citations: 0
Diaconis-Ylvisaker prior penalized likelihood for $p/n \to \kappa \in (0,1)$ logistic regression
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07419
Sterzinger, Philipp, Kosmidis, Ioannis
We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour when covariates are independent normal random variables with mean zero and variance $1/n$, and the vector of regression coefficients has length $\gamma \sqrt{n}$, asymptotically. From this foundation, we devise adjusted $Z$-statistics, penalized likelihood ratio statistics, and aggregate asymptotic results with arbitrary covariate covariance. In the process, we fill in gaps in previous literature by formulating a Lipschitz-smooth approximate message passing recursion, to formally transfer the asymptotic results from approximate message passing to logistic regression. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate not only exists always but is also directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties and compare it to logistic ridge regression and demonstrate our theoretical findings with simulations.
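The remark that the estimate is "directly computable using maximum likelihood routines" can be illustrated on a toy problem. The sketch below assumes, as an assumption not stated in the abstract, that the penalized estimate is obtained by running ordinary logistic maximum likelihood on shrunken pseudo-responses $y^* = \alpha y + (1-\alpha)/2$. The value $\alpha = 0.9$, the single-covariate model without intercept, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_1d(beta, xs, targets):
    """Derivative of the logistic log-likelihood for a one-covariate model."""
    return sum(x * (t - sigmoid(beta * x)) for x, t in zip(xs, targets))


def penalized_fit_1d(xs, ys, alpha, iters=50):
    """Newton iterations for logistic ML on pseudo-responses (assumed form)."""
    pseudo = [alpha * y + (1.0 - alpha) / 2.0 for y in ys]
    beta = 0.0
    for _ in range(iters):
        info = sum(
            x * x * sigmoid(beta * x) * (1.0 - sigmoid(beta * x)) for x in xs
        )
        beta += score_1d(beta, xs, pseudo) / info
    return beta, pseudo


# Perfectly separated toy data: the unpenalized ML estimate does not exist here.
XS = [-2.0, -1.0, 1.0, 2.0]
YS = [0, 0, 1, 1]
BETA_HAT, PSEUDO = penalized_fit_1d(XS, YS, alpha=0.9)

if __name__ == "__main__":
    print(BETA_HAT, score_1d(BETA_HAT, XS, PSEUDO))
```

On separated data the ordinary likelihood keeps increasing as the coefficient grows, whereas the pseudo-response version has a finite maximizer, which matches the abstract's point that the penalized estimate always exists.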
Citations: 0
Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification
Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07407
Meyers, Luke, Cordero, Josué Rodríguez, Bravo, Carlos Corrada, Noel, Fanfan, Agosto-Rivera, José, Giray, Tugrul, Mégret, Rémi
In this paper, we show that paint markings are a feasible approach to automate the analysis of behavioral assays involving honey bees in the field, where marking has to be as lightweight as possible. We contribute a novel dataset for bee re-identification with paint markings, comprising 4392 images and 27 identities. Contrastive learning with a ResNet backbone and triplet loss led to identity representation features with almost perfect recognition in a closed setting where identities are known in advance. Diverse experiments evaluate the capability to generalize to separate IDs, and show the impact of using different body parts for identification, such as using the unmarked abdomen only. In addition, we show the potential to fully automate the visit detection and provide preliminary results of compute time for future real-time deployment in the field on an edge device.
Citations: 0