On learning sparse linear models from cross samples

Signal Processing (IF 3.4; Zone 2, Engineering Technology; JCR Q2, Engineering, Electrical & Electronic). Pub Date: 2024-08-27. DOI: 10.1016/j.sigpro.2024.109680

Mina Sadat Mahmoudi, Seyed Abolfazl Motahari, Babak Khalaj

Signal Processing, vol. 227, Article 109680. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S0165168424003001


Abstract

This paper studies the sample complexity of a sparse linear model when the samples are dependent. We consider a specific dependency structure that arises in some experimental designs such as drug sensitivity studies, where two sets of objects (drugs and cells) are sampled independently and, after crossing (forming all possible combinations of drugs and cells), the resulting output (efficacy of drugs) is measured. We call such samples “cross samples”. The dependency among such samples is strong, and existing theoretical studies are either inapplicable or fail to provide realistic bounds. We analyze the performance of the Lasso estimator when the underlying distributions are mixtures of Gaussians and the data dependency arises from the crossing procedure. Our theoretical results show that the performance of the Lasso estimator on cross samples follows that on i.i.d. samples, up to differences in constant factors. In numerical experiments, we observe a phase transition: when datasets are too small, the error for cross samples is much larger than for i.i.d. samples, but once the dataset is large enough, cross samples are nearly as useful as i.i.d. samples. Our theoretical analysis suggests that the transition threshold is governed by the sparsity level of the true parameter vector being estimated.
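The crossing procedure described in the abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature map for a drug-cell pair is taken to be the concatenation of the two feature vectors, features are drawn from a single Gaussian rather than a mixture, and the Lasso is solved with a plain proximal-gradient (ISTA) loop. The names `cross_design` and `lasso_ista` are hypothetical.

```python
import numpy as np

def cross_design(drugs, cells):
    """Pair every drug row with every cell row (the 'crossing' step).

    Assumption: a pair's features are the concatenation of both vectors.
    """
    n_d, n_c = drugs.shape[0], cells.shape[0]
    left = np.repeat(drugs, n_c, axis=0)   # each drug repeated for all cells
    right = np.tile(cells, (n_d, 1))       # cell block cycled once per drug
    return np.hstack([left, right])

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by proximal gradient."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

rng = np.random.default_rng(0)
drugs = rng.normal(size=(40, 10))   # 40 independently sampled drugs
cells = rng.normal(size=(40, 10))   # 40 independently sampled cells
X = cross_design(drugs, cells)      # 1600 dependent rows, 20 features

beta = np.zeros(20)                 # sparse ground-truth parameter
beta[[0, 3, 12]] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.normal(size=X.shape[0])

beta_hat = lasso_ista(X, y, lam=0.01)
```

Although `X` has 1600 rows, they are built from only 80 independent draws, which is the strong dependency the abstract refers to. Comparing `beta_hat` against a fit on 1600 independently drawn rows would be one way to probe the small-sample gap the authors report.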


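Source journal

Signal Processing (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 9.20
Self-citation rate: 9.10%
Articles published: 309
Review time: 41 days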

Journal description: Signal Processing incorporates all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for the rapid dissemination of knowledge and experience to engineers and scientists working in the research, development or practical application of signal processing. Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.