In intelligent transportation systems, low-bitrate transmission via lossy point cloud compression is vital for facilitating real-time collaborative perception among connected agents, such as vehicles and infrastructure, under restricted bandwidth. In existing compression-and-transmission systems, the sender lossily compresses both point coordinates and reflectance to generate the transmitted code stream, which incurs transmission overhead from reflectance encoding and limits detection robustness due to information loss. To address these issues, this paper proposes a 3D object detection framework with reflectance prediction-based knowledge distillation (RPKD). We compress point coordinates while discarding reflectance during low-bitrate transmission, and feed the decoded non-reflectance compressed point clouds into a student detector. The discarded reflectance is then reconstructed by a geometry-based reflectance prediction (RP) module within the student detector for precise detection. A teacher detector with the same structure as the student detector is designed to perform reflectance knowledge distillation (RKD) and detection knowledge distillation (DKD) from raw to compressed point clouds. Our cross-source distillation training strategy (CDTS) equips the student detector with robustness to low-quality compressed data while preserving the accuracy benefits of raw data through the transferred distillation knowledge. Experimental results on the KITTI and DAIR-V2X-V datasets demonstrate that our method boosts detection accuracy for compressed point clouds across multiple code rates. We will release the code publicly at https://github.com/HaoJing-SX/RPKD.
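A minimal sketch (PyTorch) of how the combined objective described above might be assembled, with a task term plus reflectance (RKD) and detection (DKD) distillation terms. The function name `rpkd_loss`, the output-dictionary keys, and the weights `w_det`, `w_rkd`, `w_dkd` are illustrative assumptions, not the authors' released implementation.

```python
import torch.nn.functional as F

def rpkd_loss(student_out, teacher_out, w_det=1.0, w_rkd=0.5, w_dkd=0.5):
    """Combine the student's task loss with reflectance and detection distillation terms.

    student_out / teacher_out are assumed to be dicts with:
      'det_loss'    : detection task loss on the (compressed) input
      'reflectance' : per-point reflectance predictions, shape (N,)
      'cls_logits'  : detection classification logits, shape (M, C)
    """
    # Task loss on the compressed, reflectance-free point cloud.
    loss = w_det * student_out['det_loss']
    # RKD: regress the student's predicted reflectance toward the teacher's,
    # which is computed from the raw point cloud with true reflectance.
    loss = loss + w_rkd * F.mse_loss(student_out['reflectance'],
                                     teacher_out['reflectance'].detach())
    # DKD: align detection head outputs (soft targets) between student and teacher.
    loss = loss + w_dkd * F.kl_div(
        F.log_softmax(student_out['cls_logits'], dim=-1),
        F.softmax(teacher_out['cls_logits'].detach(), dim=-1),
        reduction='batchmean')
    return loss
```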
Recent advances in surgical robotics and computer vision have greatly improved intelligent systems' autonomy and perception in the operating room (OR), especially in endoscopic and minimally invasive surgeries. However, open surgery, which remains the predominant form of surgical intervention worldwide, has seen relatively limited exploration due to its inherent complexity and the lack of large-scale, diverse datasets. To close this gap, we present OpenSurgery, by far the largest video-text pretraining and evaluation dataset for open surgery understanding. OpenSurgery consists of two subsets: OpenSurgery-Pretrain and OpenSurgery-EVAL. OpenSurgery-Pretrain comprises 843 publicly available open surgery videos for pretraining, spanning 102 hours and encompassing over 20 distinct surgical types. OpenSurgery-EVAL is a benchmark dataset for evaluating model performance in open surgery understanding, comprising 280 training and 120 test videos, totaling 49 hours. Each video in OpenSurgery is meticulously annotated by expert surgeons at three hierarchical levels (video, operation, and frame) to ensure both high quality and strong clinical applicability. We further propose the Hierarchical Surgical Knowledge Pretraining (HierSKP) framework to facilitate large-scale multimodal representation learning for open surgery understanding. HierSKP leverages a granularity-aware contrastive learning strategy and enhances procedural comprehension by constructing hard negative samples and incorporating a Dynamic Time Warping (DTW)-based loss to capture fine-grained temporal alignment of visual semantics. Extensive experiments show that HierSKP achieves state-of-the-art performance on OpenSurgery-EVAL across multiple tasks, including operation recognition, temporal action localization, and zero-shot cross-modal retrieval, demonstrating its strong generalizability for further advances in open surgery understanding.
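A minimal sketch (PyTorch) of a soft-DTW-style temporal alignment cost between frame embeddings and text-step embeddings, in the spirit of the DTW-based loss mentioned above. The function name `soft_dtw_loss`, the cosine-distance cost, and the smoothing parameter `gamma` are illustrative assumptions rather than the HierSKP implementation.

```python
import torch
import torch.nn.functional as F

def soft_dtw_loss(frames, steps, gamma=0.1):
    """frames: (T, D) visual embeddings; steps: (S, D) text embeddings.
    Returns a differentiable soft-DTW alignment cost (lower = better aligned)."""
    # Pairwise cosine distance as the local alignment cost.
    frames = F.normalize(frames, dim=-1)
    steps = F.normalize(steps, dim=-1)
    cost = 1.0 - frames @ steps.t()                       # (T, S)
    T, S = cost.shape
    # Accumulated-cost table built from scalar tensors to stay autograd-friendly.
    R = [[cost.new_full((), float('inf')) for _ in range(S + 1)] for _ in range(T + 1)]
    R[0][0] = cost.new_zeros(())
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            prev = torch.stack([R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]])
            # Smoothed minimum keeps the recursion differentiable.
            soft_min = -gamma * torch.logsumexp(-prev / gamma, dim=0)
            R[i][j] = cost[i - 1, j - 1] + soft_min
    return R[T][S]
```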
Deep unfolding networks (DUNs), combining conventional iterative optimization algorithms and deep neural networks into a multi-stage framework, have achieved remarkable accomplishments in Image Restoration (IR), such as spectral imaging reconstruction, compressive sensing, and super-resolution. These networks unfold the iterative optimization steps into a stack of sequentially linked blocks. Each block consists of a Gradient Descent Module (GDM) and a Proximal Mapping Module (PMM); from a Bayesian perspective, the PMM is equivalent to a denoiser operating on Gaussian noise of a known level. However, existing DUNs suffer from two critical limitations: 1) their PMMs share identical architectures and denoising objectives across stages, ignoring the need for stage-specific adaptation to varying noise levels; and 2) their chain of structurally repetitive blocks results in severe parameter redundancy and high memory consumption, hindering deployment in large-scale or resource-constrained scenarios. To address these challenges, we introduce generalized Deep Low-Rank Adaptation (LoRA) Unfolding Networks for image restoration, named LoRun, which harmonize denoising objectives across stages, adapt to stage-varying noise levels, and compress memory usage for more efficient DUNs. LoRun introduces a novel paradigm in which a single pretrained base denoiser is shared across all stages, while lightweight, stage-specific LoRA adapters are injected into the PMMs to dynamically modulate denoising behavior according to the noise level at each unfolding step. This design decouples the core restoration capability from task-specific adaptation, enabling precise control over denoising intensity without duplicating full network parameters and achieving up to $N\times$ parameter reduction for an $N$-stage DUN with on-par or better performance. Extensive experiments conducted on three IR tasks validate the efficiency of our method.
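A minimal sketch (PyTorch) of the shared-denoiser-plus-stage-specific-LoRA idea described above, using a toy two-layer PMM. The class names `LoRALinear` and `UnfoldedPMM`, the rank `r`, and the placeholder gradient-descent step are illustrative assumptions, not the LoRun architecture.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen, shared linear layer plus a trainable low-rank update per stage."""
    def __init__(self, base: nn.Linear, num_stages: int, r: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # share and freeze the base weights
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(num_stages, d_in, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_stages, r, d_out))  # delta starts at zero

    def forward(self, x, stage: int):
        # Shared base response, modulated by the stage-specific low-rank term.
        return self.base(x) + x @ self.A[stage] @ self.B[stage]

class UnfoldedPMM(nn.Module):
    """One proximal mapping module reused across all unfolding stages via LoRA."""
    def __init__(self, dim=64, num_stages=8, r=4):
        super().__init__()
        self.fc1 = LoRALinear(nn.Linear(dim, dim), num_stages, r)
        self.fc2 = LoRALinear(nn.Linear(dim, dim), num_stages, r)

    def forward(self, x, stage: int):
        return self.fc2(torch.relu(self.fc1(x, stage)), stage)

# Usage: a single PMM instance serves every stage of the unfolded network.
pmm = UnfoldedPMM(dim=64, num_stages=8)
x = torch.randn(2, 64)
for stage in range(8):
    x = x - 0.1 * x          # toy gradient-descent step standing in for the GDM
    x = pmm(x, stage)        # stage-adapted proximal mapping (PMM)
```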
Recovering High Dynamic Range (HDR) images from multiple Standard Dynamic Range (SDR) images becomes challenging when the SDR images exhibit noticeable degradation and missing content. Leveraging scene-specific semantic priors offers a promising solution for restoring heavily degraded regions. However, these priors are typically extracted from sRGB SDR images, and the resulting domain/format gap poses a significant challenge when applying them to HDR imaging. To address this issue, we propose a general framework that transfers semantic knowledge derived from the SDR domain via self-distillation to boost existing HDR reconstruction methods. Specifically, the proposed framework first introduces the Semantic Priors Guided Reconstruction Model (SPGRM), which leverages SDR image semantic knowledge to address ill-posed problems in the initial HDR reconstruction results. Subsequently, we employ a self-distillation mechanism that constrains the color and content information with semantic knowledge, aligning the external outputs of the baseline and SPGRM. Furthermore, to transfer the semantic knowledge of the internal features, we utilize a Semantic Knowledge Alignment Module (SKAM) to fill in missing semantic content using complementary masks. Extensive experiments demonstrate that our framework significantly boosts HDR imaging quality for existing methods without altering their network architectures.
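A minimal sketch (PyTorch) of a self-distillation objective in the spirit of the framework above: the baseline HDR network is supervised by the ground truth and additionally pulled toward the SPGRM output, with an optional masked feature-alignment term standing in for the SKAM idea. The function name `self_distill_loss` and the weights `w_gt`, `w_sd`, `w_feat` are illustrative assumptions.

```python
import torch.nn.functional as F

def self_distill_loss(baseline_hdr, spgrm_hdr, gt_hdr,
                      baseline_feat=None, spgrm_feat=None, mask=None,
                      w_gt=1.0, w_sd=0.5, w_feat=0.1):
    # Supervision from the ground-truth HDR image.
    loss = w_gt * F.l1_loss(baseline_hdr, gt_hdr)
    # Self-distillation: align the baseline output with the SPGRM output,
    # which carries the SDR-derived semantic knowledge.
    loss = loss + w_sd * F.l1_loss(baseline_hdr, spgrm_hdr.detach())
    # Optional internal-feature alignment, restricted by a mask to degraded
    # regions (a stand-in for the complementary-mask mechanism of SKAM).
    if baseline_feat is not None and spgrm_feat is not None:
        diff = (baseline_feat - spgrm_feat.detach()).abs()
        if mask is not None:
            diff = diff * mask
        loss = loss + w_feat * diff.mean()
    return loss
```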

