Recent research in machine learning has given rise to a flourishing literature on the quantification and decomposition of model uncertainty. This information can be very useful during interactions with the learner, such as in active learning or adaptive learning, and especially in uncertainty sampling. To provide a simple representation of total, epistemic (reducible) and aleatoric (irreducible) uncertainty, we offer DEMAU, an open-source educational, exploratory and analytical tool for visualizing and exploring several types of uncertainty in machine learning classification models.
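DEMAU's own implementation is not reproduced here, but the total/epistemic/aleatoric split it visualizes is commonly computed from an ensemble via entropies: total uncertainty is the entropy of the mean prediction, aleatoric is the mean member entropy, and epistemic is their difference (the mutual information). A minimal sketch, assuming natural-log entropy and an ensemble of class-probability vectors:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose_uncertainty(member_probs):
    """Entropy-based decomposition over ensemble members' predictions.

    total     = H(mean prediction)
    aleatoric = mean of member entropies (irreducible)
    epistemic = total - aleatoric (reducible; mutual information)
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(m[c] for m in member_probs) / n for c in range(k)]
    total = entropy(mean_p)
    aleatoric = sum(entropy(m) for m in member_probs) / n
    return total, aleatoric, total - aleatoric

# Two members that disagree confidently: uncertainty is mostly epistemic.
total, alea, epi = decompose_uncertainty([[0.9, 0.1], [0.1, 0.9]])
```

With more data (and thus more agreement between members) the epistemic term shrinks, which is exactly the "reducible" behavior uncertainty sampling exploits.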
"DEMAU: Decompose, Explore, Model and Analyse Uncertainties." Arthur Hoarau, Vincent Lemaire. arXiv:2409.08105, 2024-09-12. arXiv - CS - Machine Learning.
Aye Phyu Phyu Aung, Jay Chaudhary, Ji Wei Yoon, Senthilnath Jayavelu
Molecular optimization is a key challenge in drug discovery and materials science, involving the design of molecules with desired properties. Existing methods focus predominantly on single-property optimization, necessitating repetitive runs to target multiple properties, which is inefficient and computationally expensive. Moreover, these methods often lack transparency, making it difficult for researchers to understand and control the optimization process. To address these issues, we propose a novel framework, Explainable Multi-property Optimization of Molecules (XMOL), to optimize multiple molecular properties simultaneously while incorporating explainability. Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization through the introduction of spectral normalization and enhanced molecular constraints for stabilized training. Additionally, we integrate interpretive and explainable techniques throughout the optimization process. We evaluated XMOL on a real-world molecular dataset, QM9, demonstrating its effectiveness in both single- and multi-property optimization while offering interpretable results, paving the way for more efficient and reliable molecular design.
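XMOL's code is not shown in the abstract, but the spectral normalization it uses for stabilized training is a standard operation: divide a weight matrix by its largest singular value, usually estimated cheaply by power iteration. A minimal numpy sketch (not the paper's implementation):

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Rescale W by its largest singular value, estimated via power iteration."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # converges to the top singular value
    return W / sigma

# A matrix with singular values {3, 1}; after normalization the largest is 1.
W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn = spectral_normalize(W)
```

Constraining each layer's spectral norm to 1 bounds the network's Lipschitz constant, which is why it is a common stabilizer for diffusion and GAN training.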
"XMOL: Explainable Multi-property Optimization of Molecules." Aye Phyu Phyu Aung, Jay Chaudhary, Ji Wei Yoon, Senthilnath Jayavelu. arXiv:2409.07786, 2024-09-12. arXiv - CS - Machine Learning.
Tristan Benoit, Yunru Wang, Moritz Dannehl, Johannes Kinder
Function names can greatly aid human reverse engineers, which has spurred development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying a metaphor of machine translation from code to function names. Still, function naming models face challenges in generalizing to projects completely unrelated to the training set. In this paper, we take a completely new approach by transferring advances in automated image captioning to the domain of binary reverse engineering, such that different parts of a binary function can be associated with parts of its name. We propose BLens, which combines multiple binary function embeddings into a new ensemble representation, aligns it with the name representation latent space via a contrastive learning approach, and generates function names with a transformer architecture tailored for function names. In our experiments, we demonstrate that BLens significantly outperforms the state of the art. In the usual setting of splitting per binary, we achieve an $F_1$ score of 0.77 compared to 0.67. Moreover, in the cross-project setting, which emphasizes generalizability, we achieve an $F_1$ score of 0.46 compared to 0.29.
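BLens's exact loss is not given in the abstract; the contrastive alignment of two latent spaces it describes is typically done with a CLIP-style symmetric InfoNCE objective, where matched (function embedding, name embedding) pairs sit on the diagonal of a similarity matrix. A dependency-free sketch of that standard loss, under the assumption of cosine-similarity logits:

```python
import math

def info_nce(code_embs, name_embs, temperature=0.07):
    """Symmetric InfoNCE loss over matched (code, name) embedding pairs."""
    def normalize(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]
    A = [normalize(v) for v in code_embs]
    B = [normalize(v) for v in name_embs]
    n = len(A)
    # Cosine-similarity logits; row i should peak at column i (the true pair).
    sims = [[sum(a * b for a, b in zip(A[i], B[j])) / temperature
             for j in range(n)] for i in range(n)]
    def cross_entropy(row, target):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        return log_sum - row[target]
    loss_code = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    loss_name = sum(cross_entropy([sims[j][i] for j in range(n)], i)
                    for i in range(n)) / n
    return (loss_code + loss_name) / 2

# Aligned pairs yield a near-zero loss; swapped pairs yield a large one.
good = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
bad = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this pulls each binary function's ensemble embedding toward its name's embedding while pushing it away from other names in the batch.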
"BLens: Contrastive Captioning of Binary Functions using Ensemble Embedding." Tristan Benoit, Yunru Wang, Moritz Dannehl, Johannes Kinder. arXiv:2409.07889, 2024-09-12. arXiv - CS - Machine Learning.
Reda Alami, Ali Khalifa Almansoori, Ahmed Alzubaidi, Mohamed El Amine Seddik, Mugariya Farooq, Hakim Hacid
We demonstrate that preference optimization methods can effectively enhance LLM safety. Applying various alignment techniques to the Falcon 11B model using safety datasets, we achieve a significant boost in global safety score (from $57.64\%$ to $99.90\%$) as measured by LlamaGuard 3 8B, competing with state-of-the-art models. On toxicity benchmarks, average scores in adversarial settings dropped from over $0.6$ to less than $0.07$. However, this safety improvement comes at the cost of reduced general capabilities, particularly in math, suggesting a trade-off. We identify noise contrastive alignment (Safe-NCA) as an optimal method for balancing safety and performance. Our study ultimately shows that alignment techniques can be sufficient for building safe and robust models.
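The abstract does not spell out the Safe-NCA objective, so as a representative of the preference-optimization family it compares, here is the widely used DPO loss for a single preference pair, sketched in plain Python (the log-probabilities and $\beta$ value are illustrative, not the paper's):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    logp_* are the policy's sequence log-probs; ref_* the frozen reference
    model's. The loss is -log(sigmoid(beta * implicit reward margin)).
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that has moved toward the safe (chosen) response scores lower
# than one identical to the reference.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
```

On safety datasets, "chosen" is the safe refusal or helpful-but-harmless answer and "rejected" the unsafe completion, which is how such objectives raise safety scores without an explicit reward model.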
"Alignment with Preference Optimization Is All You Need for LLM Safety." Reda Alami, Ali Khalifa Almansoori, Ahmed Alzubaidi, Mohamed El Amine Seddik, Mugariya Farooq, Hakim Hacid. arXiv:2409.07772, 2024-09-12. arXiv - CS - Machine Learning.
This study, part of my master's degree dissertation, compares the power consumption of training a classification ML model with default 32-bit floating point versus Nvidia mixed precision (16-bit and 32-bit). A custom PC with specific hardware was built to perform the experiments, and different ML hyper-parameters, such as batch size, neurons, and epochs, were chosen to build Deep Neural Networks (DNNs). Various software tools were used during the experiments to collect power consumption data in Watts from the Graphics Processing Unit (GPU), Central Processing Unit (CPU), and Random Access Memory (RAM), and manually from a wattmeter connected to the wall. A benchmarking test with default hyper-parameter values for the DNN served as a reference, while the experiments used combinations of different settings. The results were recorded in Excel, and descriptive statistics were used to calculate the means of the groups and compare them using graphs and tables. The outcome was positive when using mixed precision combined with specific hyper-parameters: compared to the benchmark, the optimized classification training reduced power consumption by between 7 and 11 Watts. The carbon footprint is reduced correspondingly, since its calculation uses the same power consumption data. Still, care is required when configuring hyper-parameters, because certain settings can negatively affect hardware performance. This research also applied inferential statistics, specifically ANOVA and t-tests, to compare the group means. These tests indicated no statistically significant difference between the benchmark and the experiments. However, a more extensive implementation with a cluster of GPUs could increase the sample size significantly; sample size is an essential factor and can change the outcome of the statistical analysis.
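The dissertation's training setup is not reproduced here, but the two mechanisms behind mixed precision's savings and pitfalls can be shown with numpy alone: float16 halves memory traffic, and small gradients underflow in float16 unless loss scaling (as applied by Nvidia's AMP) moves them back into range. A minimal illustration:

```python
import numpy as np

# 1) Half precision halves the bytes moved per tensor.
x32 = np.ones(1024, dtype=np.float32)
x16 = x32.astype(np.float16)

# 2) Small gradients underflow in float16 (smallest subnormal ~6e-8) ...
grad = 1e-8
grad_fp16 = np.float16(grad)            # rounds to 0.0

# ... but scaling the loss (hence the gradients) keeps them representable;
# the optimizer later divides the scale back out in float32.
scale = 2.0 ** 16
scaled_fp16 = np.float16(grad * scale)  # nonzero, safely in fp16 range
```

This is why mixed-precision frameworks keep a float32 master copy of the weights and only cast activations and gradients down to float16.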
"Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification algorithms." Andrew Antonopoulos. arXiv:2409.07853, 2024-09-12. arXiv - CS - Machine Learning.
Large Language Models (LLMs) offer numerous applications, the full extent of which is not yet understood. This paper investigates if LLMs can be applied for editing structured and semi-structured documents with minimal effort. Using a qualitative research approach, we conduct two case studies with ChatGPT and thoroughly analyze the results. Our experiments indicate that LLMs can effectively edit structured and semi-structured documents when provided with basic, straightforward prompts. ChatGPT demonstrates a strong ability to recognize and process the structure of annotated documents. This suggests that explicitly structuring tasks and data in prompts might enhance an LLM's ability to understand and solve tasks. Furthermore, the experiments also reveal impressive pattern matching skills in ChatGPT. This observation deserves further investigation, as it may contribute to understanding the processes leading to hallucinations in LLMs.
"Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT." Irene Weber. arXiv:2409.07732, 2024-09-12. arXiv - CS - Machine Learning.
Heterogeneous graphs, with nodes and edges of different types, are commonly used to model relational structures in many real-world applications. Standard Graph Neural Networks (GNNs) struggle to process heterogeneous data due to oversmoothing. Instead, current approaches have focused on accounting for the heterogeneity in the model architecture, leading to increasingly complex models. Inspired by recent work, we propose using cellular sheaves to model the heterogeneity in the graph's underlying topology. Instead of modelling the data as a graph, we represent it as cellular sheaves, which allows us to encode the different data types directly in the data structure, eliminating the need to inject them into the architecture. We introduce HetSheaf, a general framework for heterogeneous sheaf neural networks, and a series of heterogeneous sheaf predictors to better encode the data's heterogeneity into the sheaf structure. Finally, we empirically evaluate HetSheaf on several standard heterogeneous graph benchmarks, achieving competitive results whilst being more parameter-efficient.
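HetSheaf's predictors are not shown in the abstract, but the object underneath, the sheaf Laplacian, is standard: each node carries a d-dimensional stalk, each edge carries restriction maps, and the Laplacian is assembled blockwise from them. A numpy sketch; with identity restriction maps it reduces to the ordinary graph Laplacian acting on d-dimensional stalks:

```python
import numpy as np

def sheaf_laplacian(n_nodes, d, edges):
    """Blockwise sheaf Laplacian for stalk dimension d.

    edges: list of (u, v, F_u, F_v) with d x d restriction maps F_u, F_v.
    Diagonal blocks accumulate F^T F; off-diagonal blocks get -F_u^T F_v.
    """
    L = np.zeros((n_nodes * d, n_nodes * d))
    for u, v, Fu, Fv in edges:
        su, sv = slice(u * d, (u + 1) * d), slice(v * d, (v + 1) * d)
        L[su, su] += Fu.T @ Fu
        L[sv, sv] += Fv.T @ Fv
        L[su, sv] -= Fu.T @ Fv
        L[sv, su] -= Fv.T @ Fu
    return L

# Two nodes, 2-d stalks, identity restriction maps on the single edge.
F = np.eye(2)
L = sheaf_laplacian(2, 2, [(0, 1, F, F)])
x_const = np.ones(4)  # a global section: agrees across the edge
```

Heterogeneity enters by learning different restriction maps per node/edge type, which is where the paper's sheaf predictors come in.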
"Heterogeneous Sheaf Neural Networks." Luke Braithwaite, Iulia Duta, Pietro Liò. arXiv:2409.08036, 2024-09-12. arXiv - CS - Machine Learning.
For traffic incident detection, the acquisition of data and labels is notably resource-intensive, making semi-supervised traffic incident detection a formidable and consequential challenge. This paper therefore addresses traffic incident detection in a semi-supervised setting. It proposes a semi-supervised learning model named FPMT within the framework of MixText. The data augmentation module introduces Generative Adversarial Networks to balance and expand the dataset. During the mix-up process in the hidden space, it employs a probabilistic pseudo-mixing mechanism to enhance regularization and improve model precision. The training strategy starts with unsupervised training on all data, followed by supervised fine-tuning on a subset of labeled data, completing the semi-supervised training procedure. Through empirical validation on four real-world datasets, the FPMT model exhibits strong performance across various metrics, and is notably robust even in scenarios with low label rates.
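The abstract does not define the probabilistic pseudo-mixing mechanism itself, but the hidden-space mix-up it modifies is the MixText-style operation: interpolate two hidden states (and their labels) with a Beta-distributed coefficient. A stdlib-only sketch of that base operation; the bias toward keeping the first example's share at least 0.5 is a common convention, not necessarily FPMT's:

```python
import random

def mixup_hidden(h1, h2, y1, y2, alpha=0.75, rng=random):
    """Hidden-space mixup: interpolate two hidden states and their labels."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep the mix weighted toward (h1, y1)
    h = [lam * a + (1.0 - lam) * b for a, b in zip(h1, h2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return h, y

random.seed(0)
h, y = mixup_hidden([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

Because the interpolation happens on transformer hidden states rather than raw text, it works even when one of the two examples carries only a pseudo-label.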
"FPMT: Enhanced Semi-Supervised Model for Traffic Incident Detection." Xinying Lu, Jianli Xiao. arXiv:2409.07839, 2024-09-12. arXiv - CS - Machine Learning.
Measuring efficiency in neural network system development is an open research problem. This paper presents an experimental framework to measure the training efficiency of a neural architecture. To demonstrate our approach, we analyze the training efficiency of Convolutional Neural Networks (CNNs) and Bayesian equivalents (BCNNs) on the MNIST and CIFAR-10 tasks. Our results show that training efficiency decays as training progresses and varies across different stopping criteria for a given neural model and learning task. We also find a non-linear relationship between training stopping criteria, model size, and training efficiency. Furthermore, we illustrate the potential confounding effects of overtraining on measuring the training efficiency of a neural architecture. Regarding relative training efficiency across architectures, our results indicate that CNNs are more efficient than BCNNs on both datasets. More generally, as a learning task becomes more complex, the relative difference in training efficiency between different architectures becomes more pronounced.
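The abstract does not restate the paper's exact efficiency definition, but one plausible formalization, achieved accuracy per unit of cumulative training cost, reproduces the reported decay as training progresses, since late epochs buy ever-smaller accuracy gains. A sketch with hypothetical numbers:

```python
def training_efficiency(accuracies, cost_per_epoch=1.0):
    """Efficiency after each epoch: test accuracy per unit cumulative cost.

    accuracies[i] is the accuracy reached after epoch i; cost could be
    epochs, wall-clock seconds, or energy, depending on the framework.
    """
    return [acc / (cost_per_epoch * (epoch + 1))
            for epoch, acc in enumerate(accuracies)]

# Typical diminishing-returns learning curve (illustrative values).
eff = training_efficiency([0.60, 0.85, 0.90, 0.92])
```

Under such a metric, the choice of stopping criterion directly moves the measured efficiency, which is the interaction the paper examines.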
"A framework for measuring the training efficiency of a neural architecture." Eduardo Cueto-Mendoza, John D. Kelleher. arXiv:2409.07925, 2024-09-12. arXiv - CS - Machine Learning.
Recent advancements in generative models have revolutionized image generation and editing, making these tasks accessible to non-experts. This paper focuses on local image editing, particularly the task of adding new content to a loosely specified area. Existing methods often require a precise mask or a detailed description of the location, which can be cumbersome and prone to errors. We propose Click2Mask, a novel approach that simplifies the local editing process by requiring only a single point of reference (in addition to the content description). A mask is dynamically grown around this point during a Blended Latent Diffusion (BLD) process, guided by a masked CLIP-based semantic loss. Click2Mask surpasses the limitations of segmentation-based and fine-tuning dependent methods, offering a more user-friendly and contextually accurate solution. Our experiments demonstrate that Click2Mask not only minimizes user effort but also delivers competitive or superior local image manipulation results compared to SoTA methods, according to both human judgement and automatic metrics. Key contributions include the simplification of user input, the ability to freely add objects unconstrained by existing segments, and the integration potential of our dynamic mask approach within other editing methods.
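Click2Mask's actual growth is driven by a masked CLIP-based semantic loss inside the diffusion process, which is not reproduced here; the purely geometric idea of growing a mask outward from a single click point can be sketched as:

```python
def grow_mask(width, height, click, radius):
    """Binary disk mask of the given radius around a click point (x, y)."""
    cx, cy = click
    return [[1 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0
             for x in range(width)]
            for y in range(height)]

# Growing the radius only ever adds pixels around the click, never removes them.
m1 = grow_mask(5, 5, (2, 2), 1)
m2 = grow_mask(5, 5, (2, 2), 2)
```

In the real method the radius is not fixed: the mask expands in whichever directions reduce the semantic loss, so it conforms to where the described object should appear.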
"Click2Mask: Local Editing with Dynamic Mask Generation." Omer Regev, Omri Avrahami, Dani Lischinski. arXiv:2409.08272, 2024-09-12. arXiv - CS - Machine Learning.