
Doklady Mathematics: Latest Publications

Solving Differential Equations with Pretrained Out-of-the-Box Models: The Potential of Small-Scale LLMs
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700292
S. N. Koltcov, V. V. Ignatenko, A. Yu. Surkov, V. O. Zakharov

This study investigates the capability of small reasoning-oriented language models to construct analytical solutions to differential equations. Computational experiments are conducted on models such as DeepSeek-R1-Distill-Qwen-1.5B, Qwen2.5-1.5B, and Open-Reasoner-Zero-1.5B. To extract the final answers from the models' reasoning processes, postprocessing is applied using two additional language models, Qwen2.5:latest and Llama3.2:latest. The extracted solutions are then compared with reference solutions using the BLEU metric. Our results demonstrate that, on average, Open-Reasoner-Zero-1.5B achieves superior performance, reaching the highest BLEU score (0.978) for second-order homogeneous equations.
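The BLEU comparison step can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the tokenization here is naive whitespace splitting, and real evaluations typically use smoothed library implementations (e.g., NLTK's).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram precisions
    (n = 1..max_n) times a brevity penalty. No smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(1, sum(hyp_counts.values()))
        if overlap == 0:
            return 0.0  # without smoothing, an empty n-gram level zeroes the score
        precisions.append(overlap / total)
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(1, len(hyp)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# An extracted solution identical to the reference scores 1.0;
# a truncated one is penalized by the brevity penalty.
ref = "y ( x ) = C1 e^x + C2 e^-x"
print(bleu(ref, ref))                 # 1.0
print(bleu(ref, "y ( x ) = C1 e^x"))  # < 1.0
```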

Citations: 0
MMRFiGN: An Ensemble Graph Segmentation Model for Imbalanced High-Resolution Images Informed by Multicomponent Markov Random Fields
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700255
A. K. Gorshenin, A. M. Dostovalova

The article presents a novel MMRFiGN ensemble graph neural network model, informed by multicomponent Markov random fields, to improve object segmentation quality in high-resolution images for cases of imbalanced and volatile datasets. A key component of this model is a specially designed two-branch block of graph convolutions. This block simultaneously processes local and global image features based on multiscale image partitions, using a multicomponent Markov model to reconstruct spatial relationships between features. A theorem on the faster decrease of the loss function for a multicomponent graph architecture is proven, indicating faster model training compared to graph and convolutional models of comparable size. The MMRFiGN model was tested on the task of segmenting images collected by unmanned aerial vehicles over heterogeneous urban landscapes (the open datasets UAVid and UDD were used: Ultra HD 4K resolution, with an imbalance in the numbers of object classes). MMRFiGN outperforms modern convolutional architectures (DeepLabV3, ENet) as well as transformers (SegFormer and the 2025 SOTA model LWGANet) in recognizing both large objects (buildings, roads) and small objects of different scales (cars): in the first case, the increase in F1-score reaches 25.04% (on average, up to 12.08%), and in the second, 14.87% (on average, up to 11.52%). MMRFiGN also outperforms alternative ensemble implementations based on graph architectures with attention by up to 20.97%. At the same time, MMRFiGN has fewer parameters than the baseline networks, demonstrating the possibility of a reduction by a factor of 1.78.

Citations: 0
Ruadapt: Cost-Effective Large Language Model Lingual Adaptation
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700322
M. M. Tikhomirov, D. I. Chernyshev

Multilingual Large Language Models (LLMs) often exhibit degraded performance for languages other than English due to the imbalance in their training data. Directly adapting these models to a new language, such as Russian, carries the risk of catastrophic forgetting of their original capabilities and demands significant computational resources. The article introduces Ruadapt: a comprehensive and computationally efficient methodology for language adaptation of LLMs, featuring tokenizer replacement. A full adaptation of a single Qwen3-8B model version with our methodology requires less than 2000 GPU hours, while subsequent adaptations of other versions are up to ten times less resource-intensive due to the modular nature of the procedure's steps. An optimal configuration achieves up to an 80% speed-up in generation, with full preservation of long-context capabilities and only minor degradation in instruction-following performance. The authors conduct a detailed empirical study of each adaptation step to identify optimal hyperparameters and to assess the impact of each key stage on the final quality. These resulting guidelines are implemented in the current generation of Ruadapt models, such as RuadaptQwen3-32B-Hybrid. We are open-sourcing our models, code, and datasets to provide the research community with a validated and cost-effective strategy for developing high-quality, language-specific models.
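The abstract does not detail the tokenizer-replacement step, but a common heuristic in such pipelines (an assumption here, not necessarily the exact Ruadapt procedure) is to initialize each new token's embedding as the mean of the old embeddings of the sub-tokens it decomposed into under the old tokenizer:

```python
import numpy as np

def init_new_embeddings(old_emb, new_vocab, old_tokenize):
    """Mean-of-subtoken initialization after a tokenizer swap.
    old_emb: (V_old, d) embedding matrix of the source model;
    new_vocab: list of new-tokenizer token strings;
    old_tokenize: maps a string to a list of old-vocabulary ids."""
    d = old_emb.shape[1]
    new_emb = np.empty((len(new_vocab), d), dtype=old_emb.dtype)
    for i, token in enumerate(new_vocab):
        ids = old_tokenize(token)
        # Average the old sub-token embeddings; fall back to a small
        # random init if the token has no decomposition.
        new_emb[i] = old_emb[ids].mean(axis=0) if ids else np.random.normal(0, 0.02, d)
    return new_emb

# Toy example: a new Russian token that the old tokenizer split into ids [0, 1, 2].
old = np.arange(12, dtype=float).reshape(3, 4)
new = init_new_embeddings(old, ["привет"], lambda t: [0, 1, 2])
print(new)  # single row: column-wise mean of `old`
```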

Citations: 0
Multi-Class Surface Generation of Complex Anatomical Structures Using Neural Networks
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700231
R. UI. Epifanov, Ya. V. Fedotova, D. R. Popov, R. I. Mulliyazhanov

We propose a universal neural network architecture for single-stage multi-class polygonal model generation of anatomical structures from three-dimensional medical images. The key component of the architecture is a trainable affine module that dynamically positions and scales the initial meshes of anatomical structures. This eliminates the need for manual template preparation and reduces the number of self-intersections in the resulting meshes. The effectiveness of the proposed approach has been confirmed on the CHAOS and MMWHS datasets. On CHAOS, an average Dice score of 0.958 is achieved with an ASSD of 1.399 mm, and self-intersections are observed in only 2 out of 20 generated surfaces. On MMWHS, the average Dice score across heart structures is approximately 0.9, and the proportion of self-intersecting edges is comparable to or lower than in the best available methods. Overall, the results demonstrate an accuracy level comparable to modern standards, while producing meshes with significantly cleaner topology. Ablation analysis also confirms the importance of the affine module for generating topologically correct polygonal models.
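At inference time, an affine module of the kind described reduces to applying a learned linear map and translation to a template mesh's vertices. A minimal numpy sketch (illustrative only; in the paper the module is trainable and learned end-to-end):

```python
import numpy as np

def apply_affine(vertices, A, t):
    """vertices: (N, 3) template mesh points; A: (3, 3) linear part
    (scale/rotation/shear); t: (3,) translation. Returns the positioned
    and scaled template, as an affine module would produce."""
    return vertices @ A.T + t

# Toy template: a unit tetrahedron scaled by 2 and shifted by (1, 0, 0).
V = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
A = 2.0 * np.eye(3)
t = np.array([1.0, 0.0, 0.0])
print(apply_affine(V, A, t))
```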

Citations: 0
Scene Graph Forecasting Using Neural Network-Based Methods
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700334
A. M. Trunova, D. A. Yudin

Forecasting the future state of a scene is a key computer vision task needed to build systems capable of proactive perception and decision-making in changing environments. This work addresses the problem of forecasting future scene graphs, where, given a video and a sequence of past graphs, one must predict objects and their relations in subsequent frames. Unlike existing approaches limited to static perception, the proposed method, GraphCast, takes into account semantic vision-language features of objects and their temporal dynamics. We introduce a model architecture based on object-centric encoding with a foundation transformer model, interaction modeling via a biaffine relation classification head, and a specialized object presence classifier. In addition, a temporal convolution module is used to extract features and improve robustness to noise. Experiments on the STAR and Action Genome datasets demonstrate that the proposed architecture outperforms existing baselines.
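A biaffine relation head of the kind mentioned scores ordered object pairs with a bilinear term plus a linear term. A minimal single-relation numpy sketch (the paper's head presumably scores multiple relation types; the shapes and parameter names here are illustrative):

```python
import numpy as np

def biaffine_scores(H, W, u, b):
    """H: (N, d) object embeddings. Returns an (N, N) matrix where
    score(i, j) = H[i] @ W @ H[j] + u[:d] . H[i] + u[d:] . H[j] + b,
    i.e., a bilinear interaction plus per-argument linear terms."""
    d = H.shape[1]
    bilinear = H @ W @ H.T                                   # (N, N) pairwise term
    linear = (H @ u[:d])[:, None] + (H @ u[d:])[None, :]     # broadcasted linear term
    return bilinear + linear + b

rng = np.random.default_rng(0)
N, d = 4, 8
H = rng.normal(size=(N, d))
W = rng.normal(size=(d, d))
u = rng.normal(size=2 * d)
S = biaffine_scores(H, W, u, 0.1)
print(S.shape)  # (4, 4)
```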

Citations: 0
JDCEMB: Joint Distillation and Contrastive Learning for Embeddings in Task-Oriented Dialogue Systems
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700243
A. I. Burykina, D. R. Ledneva, D. P. Kuznetsov

We present JDCEmb—a new framework for training universal vector representations in task-oriented dialogue tasks. Text encoders play a crucial role in such systems, and their quality determines the effectiveness of dialogue systems. Modern approaches to training dialogue encoders often rely on contrastive methods, which improve the distinguishability of representations but are sensitive to the selection of positive and negative pairs. This can lead to loss of important semantic information. Knowledge distillation-based methods, on the other hand, transfer more context but struggle to distinguish similar utterances and perform poorly with subtle semantic differences.

JDCEmb combines the strengths of both approaches using a teacher–student architecture, where the student model is trained contrastively while simultaneously being aligned with the teacher model's vector representations. This combination makes it possible to maintain semantic richness while enhancing the distinctiveness of vector representations, which is crucial for dialogue systems. Experimental results on key dialogue tasks demonstrate the effectiveness of the approach: JDCEmb consistently reaches or surpasses state-of-the-art levels, outperforming strong current baseline models.
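The combination described can be sketched as a contrastive (InfoNCE-style) term plus a teacher-alignment term. This is an illustrative reconstruction, not the authors' exact loss: the MSE alignment and the weighting `lam` are assumptions.

```python
import numpy as np

def info_nce(Z_a, Z_b, tau=0.07):
    """Contrastive loss: row i of Z_a should match row i of Z_b
    against all other rows of Z_b as in-batch negatives."""
    Z_a = Z_a / np.linalg.norm(Z_a, axis=1, keepdims=True)
    Z_b = Z_b / np.linalg.norm(Z_b, axis=1, keepdims=True)
    logits = Z_a @ Z_b.T / tau                   # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def joint_loss(student, positives, teacher, lam=1.0):
    """Contrastive term on student embeddings plus an alignment term
    pulling the student toward the teacher's representations."""
    align = np.mean((student - teacher) ** 2)
    return info_nce(student, positives) + lam * align

rng = np.random.default_rng(1)
student = rng.normal(size=(8, 16))
positives = rng.normal(size=(8, 16))
teacher = rng.normal(size=(8, 16))
print(joint_loss(student, positives, teacher))
```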

Citations: 0
FoCAT: Foundation Model for Estimating the Conditional Average Treatment Effect
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700280
S. R. Kirpichenko, A. V. Konstantinov, L. V. Utkin

The paper presents a novel foundation model, FoCAT (Foundation Causal Adaptive Transformer), developed for estimating the conditional treatment effect. The model addresses several key challenges inherent in causal inference tasks, including a limited sample size in the treatment group, the impossibility of simultaneously observing patient outcomes before and after intervention, and difficulties in testing models on real data. FoCAT employs a hypernetwork architecture. Unlike existing approaches that predict separate outcome functions for control and treatment groups, FoCAT directly estimates the conditional treatment effect. The model allows for control of the context informativeness through specialized classification tokens. Numerical experiments on synthetic and real-world datasets demonstrate superiority of FoCAT in estimation of the treatment effect. The code implementing FoCAT is publicly available.

Citations: 0
Theoretically Justified Contrastive Self-Supervised Methods for Continuous Dependent Data
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700309
A. E. Marusov, A. A. Zaytsev

The task of obtaining informative object representations involves training a model, called an encoder, which constructs informative, compressed representations of signals it receives as input. One approach to solving this problem is through the use of self-supervised learning (SSL) methods. An advantage of these methods lies in utilizing only unlabeled data, which is significantly more abundant than labeled data. Among SSL methods, contrastive approaches are particularly prominent; these are based on bringing representations of semantically similar objects (positive pairs) closer together and pushing representations of different signals (negative pairs) apart. Many modern contrastive SSL methods used for obtaining representations of dependent data—where elements within a sample are semantically related—employ a loss function originally designed for independent data. In this work, we propose a theoretically justified approach for selecting a loss function suitable for continuous dependent data, i.e., data in which neighboring elements within the sample can be considered a positive pair. The analysis presented introduces various ways to model similarity between objects and corresponding loss functions, explicitly accounting for correlations between objects. To empirically assess the effectiveness of the proposed loss functions, we focused on temperature and drought forecasting tasks, which can be classified as continuous dependent data. The results demonstrate that our model, combined with the proposed loss functions, outperforms approaches based on the assumption of semantic independence between data, i.e., when all elements of the sample are semantically unrelated. These findings highlight the importance of considering such dependencies for developing high-quality encoders.
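The "neighboring elements as a positive pair" setup can be sketched with an InfoNCE-style loss over a sequence of window embeddings. This illustrates the problem setting only, not the paper's proposed loss functions: a smoothly varying series gives temporal neighbors the highest similarity, so its loss is lower than for a shuffled sequence.

```python
import numpy as np

def neighbor_info_nce(Z, tau=0.1):
    """Z: (T, d) embeddings of consecutive windows of one series.
    Each pair of temporal neighbors (t, t+1) is a positive pair; every
    other window in the sequence acts as a negative."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sims = Z @ Z.T / tau
    np.fill_diagonal(sims, -np.inf)  # a window is not its own positive
    m = sims.max(axis=1, keepdims=True)
    # row-wise log-softmax (stable logsumexp)
    log_probs = sims - (m + np.log(np.exp(sims - m).sum(axis=1, keepdims=True)))
    return -np.mean([log_probs[t, t + 1] for t in range(len(Z) - 1)])

# Smooth series: points moving slowly along a circle, so neighbors are
# the most similar windows. Shuffling the order breaks this structure.
t = np.linspace(0, 2, 21)
Z_smooth = np.stack([np.cos(t), np.sin(t)], axis=1)
perm = np.random.default_rng(0).permutation(len(Z_smooth))
print(neighbor_info_nce(Z_smooth), neighbor_info_nce(Z_smooth[perm]))
```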

Citations: 0
RuWikiBench: Evaluating Large Language Models Through Replication of Encyclopedia Articles
IF 0.6 · CAS Tier 4 (Mathematics) · Q3 MATHEMATICS · Pub Date: 2026-03-02 · DOI: 10.1134/S1064562425700279
D. A. Grigoriev, D. I. Chernyshev

In light of the growing interest in using large language models (LLMs) as tools for generating scientific texts, the evaluation of their ability to produce encyclopedic content is becoming increasingly relevant. However, for Russian-language materials, this issue has not been sufficiently studied and existing benchmarks do not cover key aspects of analytical work with sources. This article presents RuWikiBench—an open benchmark based on Ruwiki for evaluating the ability of LLMs to reproduce Wikipedia-style articles, constructed around three tasks: selection of relevant sources, article structuring, and section generation. The results of testing popular open-source LLMs show that even under ideal conditions, the best models do not always follow the expert logic of composing encyclopedic content: even with a perfect source retrieval system, the models cannot reproduce the reference table of contents, and the quality of section generation shows almost no dependence on the number of parameters.

{"title":"RuWikiBench: Evaluating Large Language Models Through Replication of Encyclopedia Articles","authors":"D. A. Grigoriev,&nbsp;D. I. Chernyshev","doi":"10.1134/S1064562425700279","DOIUrl":"10.1134/S1064562425700279","url":null,"abstract":"<div><p>In light of the growing interest in using large language models (LLMs) as tools for generating scientific texts, the evaluation of their ability to produce encyclopedic content is becoming increasingly relevant. However, for Russian-language materials, this issue has not been sufficiently studied and existing benchmarks do not cover key aspects of analytical work with sources. This article presents RuWikiBench—an open benchmark based on Ruwiki for evaluating the ability of LLMs to reproduce Wikipedia-style articles, constructed around three tasks: selection of relevant sources, article structuring, and section generation. The results of testing popular open-source LLMs show that even under ideal conditions, the best models do not always follow the expert logic of composing encyclopedic content: even with a perfect source retrieval system, the models cannot reproduce the reference table of contents, and the quality of section generation shows almost no dependence on the number of parameters.</p></div>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"112 1","pages":"299 - 307"},"PeriodicalIF":0.6,"publicationDate":"2026-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147335853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
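The article-structuring task above amounts to comparing a model's generated table of contents against the reference one. The benchmark's actual scoring rule is not stated here; the sketch below uses a case-insensitive set-overlap F1 over section titles, and the name `toc_f1` and that metric choice are assumptions for illustration.

```python
def toc_f1(reference, generated):
    """Set-overlap F1 between reference and generated section titles,
    matched case-insensitively after stripping whitespace.
    A crude stand-in for evaluating how well a model reproduces an
    encyclopedia article's table of contents."""
    ref = {s.strip().lower() for s in reference}
    gen = {s.strip().lower() for s in generated}
    if not ref or not gen:
        return 0.0
    tp = len(ref & gen)                 # titles reproduced exactly
    if tp == 0:
        return 0.0
    precision = tp / len(gen)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact-match overlap is deliberately strict: a generated TOC that paraphrases every heading scores zero, which matches the finding that models fail to reproduce reference tables of contents even with perfect source retrieval.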
NP-Completeness of Hanabi Game with Minimal Parameters
IF 0.6 CAS Tier 4 (Mathematics) Q3 MATHEMATICS Pub Date : 2026-03-02 DOI: 10.1134/S1064562425700310
A. A. Onorpienko

We study the algorithmic complexity of the cooperative card game Hanabi. A feature of Hanabi is that players can see other players’ cards, but not their own, and exchange information through hints. Even in the model with one player who has full information about the deck, Hanabi remains NP-hard. We found the minimal parameters of the game that preserve NP-hardness. If these parameters are further reduced, the game turns out to be solvable in polynomial time.

{"title":"NP-Completeness of Hanabi Game with Minimal Parameters","authors":"A. A. Onorpienko","doi":"10.1134/S1064562425700310","DOIUrl":"10.1134/S1064562425700310","url":null,"abstract":"<p>We study the algorithmic complexity of the cooperative card game Hanabi. A feature of Hanabi is that players can see other players’ cards, but not their own, and exchange information through hints. Even in the model with one player who has full information about the deck, Hanabi remains NP-hard. We found the minimal parameters of the game that preserve NP-hardness. If these parameters are further reduced, the game turns out to be solvable in polynomial time.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"112 1","pages":"255 - 262"},"PeriodicalIF":0.6,"publicationDate":"2026-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147335859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
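The single-player, full-information model mentioned in the abstract can be illustrated with a toy solitaire variant: one color, ranks that must be played in ascending order, and a play-or-discard choice each turn followed by a draw. This is a simplified illustration of the decision problem, not the paper's construction; `max_score`, the `hand_size` parameter, and the one-color restriction are assumptions for the sketch. Exhaustive search like this is exponential, consistent with the problem being NP-hard in general.

```python
def max_score(deck, hand_size=2):
    """Exhaustive search for a perfect-information solitaire Hanabi toy:
    one color, ranks played in order 1, 2, 3, ...; each turn the player
    plays or discards one card from hand, then draws from the known deck.
    Returns the highest rank reachable on the play stack."""
    deck = tuple(deck)

    def rec(hand, pos, top):
        # hand: tuple of held ranks; pos: next deck index; top: highest rank played
        best = top
        for i, c in enumerate(hand):
            nh = hand[:i] + hand[i + 1:]
            if pos < len(deck):
                nh = tuple(sorted(nh + (deck[pos],)))   # draw a replacement
            if c == top + 1:                            # card is playable now
                best = max(best, rec(nh, pos + 1, top + 1))
            best = max(best, rec(nh, pos + 1, top))     # or discard it
        return best

    init = tuple(sorted(deck[:hand_size]))
    return rec(init, hand_size, 0)
```

For example, with deck order `[2, 2, 1]` the player must discard one of the 2s to draw the 1, then play 1 and 2, reaching score 2; no line of play reaches 3 because no rank-3 card exists.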