Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-11-11. DOI: 10.1109/TPAMI.2024.3495999

Alessandra Carbone;Aurélien Decelle;Lorenzo Rosset;Beatriz Seoane

Vol. 47, no. 2, pp. 1309-1316. https://ieeexplore.ieee.org/document/10750287/


Abstract

In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA, or protein sequence data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which limits the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to five different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, homologous RNA sequences from specific taxonomies, and real classical piano pieces classified by their composer.
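To make "label-specific generation in only a few sampling steps" concrete, here is a minimal sketch of block Gibbs sampling in a binary Restricted Boltzmann Machine with the label clamped. It is not the authors' training algorithm and says nothing about how the parameters are learned. The coupling of the label to the hidden units through a matrix `D`, the function names, and the random toy parameters are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hidden(v, label, W, D, c, rng):
    """Sample binary hidden units given visible units and a clamped label.

    Assumption: the label acts as a one-hot visible unit coupled to the
    hidden layer through row D[label].
    """
    h = []
    for j in range(len(c)):
        act = c[j] + D[label][j] + sum(W[i][j] * v[i] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(act) else 0)
    return h


def sample_visible(h, W, b, rng):
    """Sample binary visible units given the hidden units."""
    v = []
    for i in range(len(b)):
        act = b[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        v.append(1 if rng.random() < sigmoid(act) else 0)
    return v


def generate(label, W, D, b, c, n_steps, rng):
    """Run n_steps of block Gibbs sampling from a random start, label clamped."""
    v = [rng.randint(0, 1) for _ in b]
    for _ in range(n_steps):
        h = sample_hidden(v, label, W, D, c, rng)
        v = sample_visible(h, W, b, rng)
    return v


# Toy, untrained parameters (hypothetical sizes) just to exercise the code.
rng = random.Random(0)
n_vis, n_hid, n_labels = 6, 4, 2
W = [[rng.gauss(0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
D = [[rng.gauss(0, 0.1) for _ in range(n_hid)] for _ in range(n_labels)]
b = [0.0] * n_vis
c = [0.0] * n_hid

sample = generate(label=1, W=W, D=D, b=b, c=c, n_steps=10, rng=rng)
print(sample)
```

With a conventionally trained model, a short chain like this is generally far from equilibrium. The abstract's claim is that training which accounts for non-equilibrium effects makes such short chains produce good samples.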


