
Latest Articles in International Journal of Computer Vision

When Visual Privacy Protection Meets Multimodal Large Language Models
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-07 | DOI: 10.1007/s11263-026-02761-y
Xiaofei Hui, Qian Wu, Haoxuan Qu, Majid Mirmehdi, Hossein Rahmani, Jun Liu
The emergence of Multimodal Large Language Models (MLLMs) and the widespread use of MLLM cloud services such as GPT-4V have raised serious concerns about privacy leakage in visual data. As these models are typically deployed as cloud services, users are required to submit their images and videos, posing serious privacy risks. However, how to address such privacy concerns remains an under-explored problem. In this paper, we therefore investigate how to protect visual privacy while enjoying the convenience brought by MLLM services. We address the practical case where the MLLM is a “black box”, i.e., we only have access to its input and output without knowing its internal model information. To tackle this challenging yet demanding problem, we propose a novel framework in which we carefully design the learning objective with Pareto optimality to seek a better trade-off between visual privacy and the MLLM’s performance, and propose critical-history enhanced optimization to effectively optimize the framework with the black-box MLLM. Our experiments show that our method is effective on different benchmarks.
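The abstract gives no implementation details, but the two ingredients it names — a Pareto-style trade-off between privacy and utility, and history-guided optimization against a black box — can be sketched in a toy form. Everything below (the weighted-sum scalarization, the random-search mutation scheme, and the names `privacy_fn` and `utility_fn`) is an illustrative assumption, not the paper's actual critical-history enhanced optimization:

```python
import numpy as np

def scalarize(privacy_loss, utility_loss, w):
    # Weighted-sum scalarization: sweeping w in [0, 1] traces points
    # on the trade-off curve between the two objectives.
    return w * privacy_loss + (1.0 - w) * utility_loss

def black_box_search(x0, privacy_fn, utility_fn, w=0.5,
                     steps=200, sigma=0.1, history_size=5, seed=0):
    """Derivative-free random search that keeps a small history of the
    best candidates found so far and mutates around them, using only
    black-box evaluations of the two objectives."""
    rng = np.random.default_rng(seed)
    history = [(scalarize(privacy_fn(x0), utility_fn(x0), w), x0)]
    for _ in range(steps):
        # Mutate a candidate drawn from the elite history.
        _, base = history[rng.integers(len(history))]
        cand = base + sigma * rng.standard_normal(base.shape)
        score = scalarize(privacy_fn(cand), utility_fn(cand), w)
        history.append((score, cand))
        history.sort(key=lambda t: t[0])
        history = history[:history_size]  # keep only the best few
    return history[0][1]
```

Sweeping `w` from 0 to 1 traces different operating points on the privacy/utility trade-off; the elite history plays the role of a memory of past candidates worth mutating.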
{"title":"When Visual Privacy Protection Meets Multimodal Large Language Models","authors":"Xiaofei Hui, Qian Wu, Haoxuan Qu, Majid Mirmehdi, Hossein Rahmani, Jun Liu","doi":"10.1007/s11263-026-02761-y","DOIUrl":"https://doi.org/10.1007/s11263-026-02761-y","url":null,"abstract":"The emergence of Multimodal Large Language Models (MLLMs) and the widespread usage of MLLM cloud services such as GPT-4V raised great concerns about privacy leakage in visual data. As these models are typically deployed in cloud services, users are required to submit their images and videos, posing serious privacy risks. However, how to tackle such privacy concerns is an under-explored problem. Thus, in this paper, we aim to conduct a new investigation to protect visual privacy when enjoying the convenience brought by MLLM services. We address the practical case where the MLLM is a “black box”, i.e., we only have access to its input and output without knowing its internal model information. To tackle such a challenging yet demanding problem, we propose a novel framework, in which we carefully design the learning objective with Pareto optimality to seek a better trade-off between visual privacy and MLLM’s performance, and propose critical-history enhanced optimization to effectively optimize the framework with the black-box MLLM. 
Our experiments show that our method is effective on different benchmarks.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"46 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147374226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Iterative Global Mapping-Local Searching for Heterogeneous Change Detection with Unregistered Images
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-025-02719-6
Yuli Sun, Junzheng Wu, Han Zhang, Zhang Li, Lin Lei, Gangyao Kuang
{"title":"Iterative Global Mapping-Local Searching for Heterogeneous Change Detection with Unregistered Images","authors":"Yuli Sun, Junzheng Wu, Han Zhang, Zhang Li, Lin Lei, Gangyao Kuang","doi":"10.1007/s11263-025-02719-6","DOIUrl":"https://doi.org/10.1007/s11263-025-02719-6","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"68 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Object-Scene-Camera Decomposition and Recomposition for Data Efficient Monocular 3D Object Detection
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02755-w
Zhaonian Kuang, Rui Ding, Meng Yang, Xinhu Zheng, Gang Hua
{"title":"Object-Scene-Camera Decomposition and Recomposition for Data Efficient Monocular 3D Object Detection","authors":"Zhaonian Kuang, Rui Ding, Meng Yang, Xinhu Zheng, Gang Hua","doi":"10.1007/s11263-026-02755-w","DOIUrl":"https://doi.org/10.1007/s11263-026-02755-w","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"31 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02744-z
Xiaowei Hu, Zhenghao Xing, Tianyu Wang, Chi-Wing Fu, Pheng-Ann Heng
{"title":"Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era","authors":"Xiaowei Hu, Zhenghao Xing, Tianyu Wang, Chi-Wing Fu, Pheng-Ann Heng","doi":"10.1007/s11263-026-02744-z","DOIUrl":"https://doi.org/10.1007/s11263-026-02744-z","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"46 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-025-02713-y
Pia Bideau, Duc Pham, Félicie Dhellemmes, Matthew Hansen, Jens Krause
Easily accessible technologies, such as drones equipped with diverse onboard sensors, have greatly expanded opportunities to study animal behavior in natural environments. However, analyzing large volumes of unlabeled video data, often spanning hours, remains a significant challenge for machine learning, particularly in computer vision. Existing approaches typically process only a small number of frames, and accurate georeferencing of tracked positions remains largely unresolved, particularly in dynamic environments where static landmarks cannot be established. In this work, we focus on long-term tracking of animal behavior in real-world geographic coordinates. To address this challenge, we utilize classical probabilistic methods for state estimation, such as particle filtering. Particle filters offer a useful algorithmic structure for recursively incorporating new incoming information and thus ensuring temporal consistency. By incorporating recent developments in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even in scenarios with limited data availability. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework not only performs classical object tracking in image coordinates, but also tracks the position and spatial extent of the fish school in geographic coordinates by fusing video data with the drone's onboard sensor information (GPS and IMU). No landmarks with known geographic coordinates are required, making the proposed method adaptable to unstructured, dynamic environments such as the open ocean, where static landmarks are unavailable. The presented framework thus enables researchers to study the collective behavior of fish schools within their social and environmental context.
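The recursive predict-weight-resample structure of the particle filter mentioned above can be illustrated with a minimal 1-D bootstrap version. The paper fuses video with GPS/IMU to track in geographic coordinates; the random-walk motion model and Gaussian observation model here are simplifying assumptions for illustration only:

```python
import numpy as np

def particle_filter(observations, n_particles=500, motion_std=1.0,
                    obs_std=2.0, seed=0):
    """Minimal bootstrap particle filter for a 1-D position: predict
    with a random-walk motion model, weight particles by a Gaussian
    observation likelihood, then resample. Returns per-step estimates."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(observations[0], obs_std, n_particles)
    estimates = []
    for z in observations:
        # Predict: propagate each particle through the motion model.
        particles = particles + rng.normal(0.0, motion_std, n_particles)
        # Update: weight particles by the observation likelihood.
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        w /= w.sum()
        # Resample: draw particles in proportion to their weights.
        idx = rng.choice(n_particles, n_particles, p=w)
        particles = particles[idx]
        estimates.append(particles.mean())
    return np.array(estimates)
```

The recursion is what gives the filter its time consistency: each step reuses the full posterior from the previous step rather than reprocessing the whole history.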
{"title":"Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos","authors":"Pia Bideau, Duc Pham, Félicie Dhellemmes, Matthew Hansen, Jens Krause","doi":"10.1007/s11263-025-02713-y","DOIUrl":"https://doi.org/10.1007/s11263-025-02713-y","url":null,"abstract":"Easily accessible technologies, such as drones equipped with diverse onboard sensors, have greatly expanded opportunities to study animal behavior in natural environments. However, analyzing large volumes of unlabeled video data, often spanning hours, remains a significant challenge for machine learning, particularly in computer vision. Existing approaches typically process only a small number of frames, and accurate georeferencing of tracked positions is still largely unresolved, particularly in dynamic environments where static landmarks cannot be established. In this work, we focus on long-term tracking of animal behavior in real-world geographic coordinates. To address this challenge, we utilize classical probabilistic methods for state estimation, such as particle filtering. Particle filters offer a useful algorithmic structure for recursively adding new incoming information and thus ensuring time consistency. By incorporating recent developments in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even in scenarios with limited data availability. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework not only performs classical object tracking in image coordinates, instead it additionally tracks the position and spatial expansion of the fish school in geographic coordinates by fusing video data and the drone’s on board sensor information (GPS and IMU). No landmarks with known geographic coordinates are required, making the proposed method adaptable to unstructured, dynamic environments like the open ocean, where static landmarks are unavailable. 
With this, the presented framework enables researchers to study the collective behavior of fish schools within their social and environmental context.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"56 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02736-z
Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Yan Li, Chi-Min Chan, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo
The potential for higher-resolution image generation using pretrained diffusion models is immense. However, these models often struggle with object repetition and structural artifacts, especially when scaling to 4K resolution and beyond. Our analysis reveals the cause of the problem: a single prompt used to generate multiple scales provides insufficient guidance. To address this, we propose HiPrompt, a new tuning-free solution that tackles the above problems by introducing hierarchical prompts. The hierarchical prompts provide both global and local semantic guidance. Specifically, the global prompt captures overall scene semantics from user input, while local guidance comes from patch-wise descriptions generated by MLLMs to refine regional structures and textures. Furthermore, during inverse denoising, the noise is decomposed into low- and high-frequency components, each conditioned on a different prompt level, enabling prompt-guided denoising under hierarchical semantic guidance. This allows the generation to focus more on local spatial regions and ensures that the generated images maintain coherent local and global semantics, structures, and textures in high definition. Extensive experiments demonstrate that HiPrompt outperforms state-of-the-art works in higher-resolution image generation, significantly reducing object repetition and enhancing structural quality. The demo and code can be found on the project website: https://liuxinyv.github.io/HiPrompt/ .
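As a toy illustration of the low-/high-frequency decomposition described above: the split can be done with a Fourier-domain mask so that the two components sum exactly back to the input. The hard circular mask and the `cutoff` parameter below are assumptions; the paper does not specify its filter:

```python
import numpy as np

def split_frequencies(x, cutoff=0.25):
    """Split a 2-D array into low- and high-frequency components via a
    hard circular low-pass mask in the Fourier domain. By construction
    x == low + high exactly, so the two parts can be processed (e.g.
    conditioned on different prompts) and recombined without loss."""
    f = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = h // 2, w // 2
    radius = cutoff * min(h, w)
    mask = ((yy - cy) ** 2 + (xx - cx) ** 2) <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = x - low  # residual carries the high frequencies
    return low, high
```

Defining `high` as the residual (rather than with a complementary mask) guarantees exact reconstruction even when the discrete mask is not perfectly symmetric.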
{"title":"HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts","authors":"Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Yan Li, Chi-Min Chan, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo","doi":"10.1007/s11263-026-02736-z","DOIUrl":"https://doi.org/10.1007/s11263-026-02736-z","url":null,"abstract":"The potential for higher-resolution image generation using pretrained diffusion models is immense. However, these models often struggle with object repetition and structural artifacts especially when scaling to 4K resolution and beyond. Our analysis reveals that causes the problem, a single prompt for the generation of multiple scales provides insufficient efficacy. To address this, we propose HiPrompt, a new tuning-free solution that tackles the above problems by introducing hierarchical prompts. The hierarchical prompts provide both global and local semantic guidance. Specifically, the global prompt captures overall scene semantics from user input, while local guidance comes from patch-wise descriptions generated by MLLMs to refine regional structures and textures. Furthermore, during inverse denoising, noise is decomposed into low- and high-frequency components, each conditioned on different prompt levels, facilitating prompt-guided denoising under hierarchical semantic guidance. It further allows the generation to focus more on local spatial regions and ensures the generated images maintain coherent local and global semantics, structures, and textures with high definition. Extensive experiments demonstrate that HiPrompt outperforms state-of-the-art works in higher-resolution image generation, significantly reducing object repetition and enhancing structural quality. 
The demo and code can be found on the project website: <jats:ext-link xmlns:xlink=\"http://www.w3.org/1999/xlink\" xlink:href=\"https://liuxinyv.github.io/HiPrompt/\" ext-link-type=\"uri\">https://liuxinyv.github.io/HiPrompt/</jats:ext-link> .","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"53 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Disentangling Local and Global Semantics in Diffusion Models for Image Editing
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-025-02694-y
Manos Plitsis, Theodoros Kouzelis, Panagiotis Koromilas, Vassilis Katsouros, Mihalis A. Nicolaou, Yannis Panagakis
Diffusion models have achieved state-of-the-art image synthesis, yet unlike GANs, they lack a well-structured latent space for intuitive image editing. Existing diffusion-based editing methods often rely on supervised fine-tuning or text-based guidance, while recent unsupervised techniques leveraging the model’s bottleneck layer suffer from one or more key limitations: (i) they focus only on global attributes, (ii) fail to disentangle local and global semantics, or (iii) require extensive human intervention. To fill this gap, we first propose an unsupervised method for localized image editing in pre-trained unconditional diffusion models that disentangles local and global semantics in the model’s latent space. Given an input image and a user-specified region of interest, our approach uses the denoising network’s Jacobian to map that region to a corresponding latent subspace. We then separate this subspace into shared (global) and region-specific components to uncover latent directions that control local attributes. These directions generalize across images, enabling semantically consistent edits without retraining. We go one step further by extending our method to minimize manual supervision by automatically inferring edit directions from a single reference image and generating region masks without human input. Experiments on multiple datasets show that our method yields more localized, high-fidelity edits than state-of-the-art approaches.
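The mapping from a region of interest to a latent subspace via the Jacobian can be sketched with finite differences and an SVD. A real denoising network would call for automatic differentiation rather than the finite-difference loop below, and `region_jacobian` and `top_directions` are hypothetical helper names for illustration:

```python
import numpy as np

def region_jacobian(f, z, mask, eps=1e-4):
    """Finite-difference Jacobian of f restricted to a region mask:
    rows are the masked output entries, columns are latent coordinates."""
    y0 = f(z)[mask]
    J = np.zeros((y0.size, z.size))
    for j in range(z.size):
        dz = np.zeros_like(z)
        dz[j] = eps
        J[:, j] = (f(z + dz)[mask] - y0) / eps
    return J

def top_directions(J, k=3):
    # Right-singular vectors of the masked Jacobian: the latent
    # directions with the strongest effect on the selected region.
    _, _, vt = np.linalg.svd(J, full_matrices=False)
    return vt[:k]
```

For a linear map f(z) = Az the finite-difference Jacobian recovers the masked rows of A exactly, so the top right-singular vectors are precisely the directions that most strongly move the region; separating shared from region-specific components would then amount to projecting out the subspace common to all regions.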
{"title":"Disentangling Local and Global Semantics in Diffusion Models for Image Editing","authors":"Manos Plitsis, Theodoros Kouzelis, Panagiotis Koromilas, Vassilis Katsouros, Mihalis A. Nicolaou, Yannis Panagakis","doi":"10.1007/s11263-025-02694-y","DOIUrl":"https://doi.org/10.1007/s11263-025-02694-y","url":null,"abstract":"Diffusion models have achieved state-of-the-art image synthesis, yet unlike GANs, they lack a well-structured latent space for intuitive image editing. Existing diffusion-based editing methods often rely on supervised fine-tuning or text-based guidance, while recent unsupervised techniques leveraging the model’s bottleneck layer suffer from one or more key limitations: (i) they focus only on global attributes, (ii) fail to disentangle local and global semantics, or (iii) require extensive human intervention. To fill this gap, we first propose an unsupervised method for localized image editing in pre-trained unconditional diffusion models that disentangles local and global semantics in the model’s latent space. Given an input image and a user-specified region of interest, our approach uses the denoising network’s Jacobian to map that region to a corresponding latent subspace. We then separate this subspace into shared (global) and region-specific components to uncover latent directions that control local attributes. These directions generalize across images, enabling semantically consistent edits without retraining. We go one step further by extending our method to minimize manual supervision by automatically inferring edit directions from a single reference image and generating region masks without human input. 
Experiments on multiple datasets show that our method yields more localized, high-fidelity edits than state-of-the-art approaches.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"30 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CylindFormer: Image-to-Point Cloud Registration with Cylindrical Transformer
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02747-w
Jingtao Wang, Hao Tang, Yanpeng Sun, Shengfeng He, Zechao Li
{"title":"CylindFormer: Image-to-Point Cloud Registration with Cylindrical Transformer","authors":"Jingtao Wang, Hao Tang, Yanpeng Sun, Shengfeng He, Zechao Li","doi":"10.1007/s11263-026-02747-w","DOIUrl":"https://doi.org/10.1007/s11263-026-02747-w","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"260 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reconstructing a Sphere and the Camera Focal Length from a Single View by Fitting Planes
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-025-02697-9
Erol Ozgur, Mohammad Alkhatib, Youcef Mezouar, Adrien Bartoli
{"title":"Reconstructing a Sphere and the Camera Focal Length from a Single View by Fitting Planes","authors":"Erol Ozgur, Mohammad Alkhatib, Youcef Mezouar, Adrien Bartoli","doi":"10.1007/s11263-025-02697-9","DOIUrl":"https://doi.org/10.1007/s11263-025-02697-9","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"5 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery
IF 19.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-06 | DOI: 10.1007/s11263-026-02745-y
Yuanpeng Tu, Zhun Zhong, Yuxi Li, Hengshuang Zhao
{"title":"Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery","authors":"Yuanpeng Tu, Zhun Zhong, Yuxi Li, Hengshuang Zhao","doi":"10.1007/s11263-026-02745-y","DOIUrl":"https://doi.org/10.1007/s11263-026-02745-y","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"16 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0