Authors: Soheun Yi, John Alison, Mikael Kuusela
DOI: arxiv-2409.06960
Venue: arXiv - STAT - Machine Learning
Publication date: 2024-09-11
Citations: 0
Abstract
Toward Model-Agnostic Detection of New Physics Using Data-Driven Signal Regions
In the search for new particles in high-energy physics, it is crucial to
select the Signal Region (SR) in such a way that it is enriched with signal
events if they are present. While most existing search methods define this region
using prior domain knowledge, such knowledge may be unavailable for a completely novel
particle that falls outside the current scope of understanding. We address this
issue by proposing a method built upon a model-agnostic but often realistic
assumption about the localized topology of the signal events, in which they are
concentrated in a certain area of the feature space. Considering the signal
component as a localized high-frequency feature, our approach employs the
notion of a low-pass filter. We define the SR as the region that is most affected
when the observed events are smeared with additive random noise. We overcome
challenges in density estimation in the high-dimensional feature space by
learning the density ratio between the target events, which potentially include a
signal, and a complementary sample of events that closely resemble the target
events but are free of signal. By applying our method to simulated
$\mathrm{HH} \rightarrow 4b$ events, we demonstrate that the method can efficiently
identify a data-driven SR in a high-dimensional feature space in which a large
fraction of the signal events is concentrated.
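
The smearing idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the toy data (a flat background with a narrow bump), the histogram-based ratio, and the names `toy_sample` and `smearing_signal_region` are all assumptions made for illustration. The abstract states that the actual method learns a density ratio in a high-dimensional feature space; here a simple ratio of binned counts before and after adding Gaussian noise stands in for it.

```python
import numpy as np


def toy_sample(rng, n_background=100_000, n_signal=2_000):
    """Toy 1D data (hypothetical): flat background on [0, 10] plus a narrow bump at 5."""
    background = rng.uniform(0.0, 10.0, size=n_background)
    signal = rng.normal(5.0, 0.1, size=n_signal)
    return np.concatenate([background, signal])


def smearing_signal_region(events, noise_scale, edges, top_k, rng):
    """Return the centers of the top_k bins most depleted by additive Gaussian smearing."""
    # Low-pass filter step: smear every event with additive random noise.
    smeared = events + rng.normal(0.0, noise_scale, size=events.shape)
    observed_counts, _ = np.histogram(events, bins=edges)
    smeared_counts, _ = np.histogram(smeared, bins=edges)
    # Ratio of observed to smeared counts per bin; the +1 avoids division by zero.
    ratio = (observed_counts + 1.0) / (smeared_counts + 1.0)
    # Bins with the largest ratio are the ones most affected by the smearing.
    top_bins = np.sort(np.argsort(ratio)[-top_k:])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[top_bins]


rng = np.random.default_rng(0)
events = toy_sample(rng)
# Bin only the interior [2, 8] so smearing losses at the sample boundary are excluded.
edges = np.linspace(2.0, 8.0, 31)
sr_centers = smearing_signal_region(events, noise_scale=0.5, edges=edges, top_k=2, rng=rng)
print(sr_centers)
```

In this toy setting the smooth background is nearly unchanged by the noise, while the narrow bump is spread out, so the bins around the bump show the largest observed-to-smeared ratio. The noise scale and bin width are arbitrary choices here and would need tuning in any realistic application.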