Pub Date: 2024-09-02; DOI: 10.1109/TMM.2024.3453059
Defu Qiu; Yuhu Cheng; Kelvin K.L. Wong; Wenjun Zhang; Zhang Yi; Xuesong Wang
Cardiac magnetic resonance imaging (CMRI) can help experts diagnose cardiovascular diseases quickly. However, because the patient breathes and moves slightly during the scan, the acquired CMRI may be severely blurred, which reduces the accuracy of clinical diagnosis. To address this issue, we propose a quadratic conditional diffusion model for blind CMRI super-resolution (DBSR). Specifically, we propose a conditional blur kernel noise predictor, which uses the diffusion model to predict the blur kernel from low-resolution images, turning the unknown blur kernel of a low-resolution CMRI into a known one. We also design a novel conditional CMRI noise predictor, which uses the predicted blur kernel as prior knowledge to guide the diffusion model in reconstructing high-resolution CMRI. Furthermore, we propose a cascaded residual attention network feature extractor, which extracts features from low-resolution CMRI for both blur kernel prediction and super-resolution (SR) reconstruction. Extensive experiments indicate that DBSR achieves better blind super-resolution results than several state-of-the-art baselines.
Title: DBSR: Quadratic Conditional Diffusion Model for Blind Cardiac MRI Super-Resolution
Journal: IEEE Transactions on Multimedia, vol. 26, pp. 11358-11371
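Blind SR methods such as DBSR are usually framed around the classical degradation model y = (x ⊗ k)↓s + n, where the blur kernel k is unknown at test time. The abstract does not spell out DBSR's exact degradation model, so the following is an assumption: a minimal sketch of how a low-resolution image is conventionally synthesised from a high-resolution one with a known kernel, which is the quantity DBSR's kernel predictor is meant to recover. The helper names (`gaussian_kernel`, `degrade`), the Gaussian kernel, replicate padding, and direct subsampling are illustrative choices, not the paper's.

```python
import math
import random


def gaussian_kernel(size, sigma):
    """Isotropic Gaussian blur kernel (size x size), normalised to sum to 1."""
    c = size // 2
    k = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2.0 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def degrade(hr, kernel, scale, noise_std=0.0, seed=None):
    """Hypothetical helper: y = (hr conv kernel), subsampled by `scale`, plus noise.

    hr is a 2D list of floats; kernel is an odd-sized square 2D list.
    Borders use replicate padding (out-of-range coordinates are clamped).
    """
    rng = random.Random(seed)
    h, w = len(hr), len(hr[0])
    ks = len(kernel)
    c = ks // 2
    lr = []
    for i in range(0, h, scale):          # keep every `scale`-th row
        row = []
        for j in range(0, w, scale):      # keep every `scale`-th column
            acc = 0.0
            for a in range(ks):
                for b in range(ks):
                    # True convolution: kernel offset (a - c, b - c) samples hr at (i - da, j - db).
                    yy = min(max(i - (a - c), 0), h - 1)
                    xx = min(max(j - (b - c), 0), w - 1)
                    acc += kernel[a][b] * hr[yy][xx]
            if noise_std > 0:
                acc += rng.gauss(0.0, noise_std)   # additive Gaussian noise n
            row.append(acc)
        lr.append(row)
    return lr


if __name__ == "__main__":
    hr = [[float(4 * r + col) for col in range(4)] for r in range(4)]
    delta = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]     # identity kernel: no blur
    print(degrade(hr, delta, scale=2))            # -> [[0.0, 2.0], [8.0, 10.0]]
```

With a delta kernel the function reduces to plain subsampling; swapping in `gaussian_kernel(5, 1.2)` produces the blurred, "unknown kernel" inputs that a blind SR model has to handle.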
Pub Date: 2024-09-02; DOI: 10.1109/TMM.2024.3453050
Rao Fu; Kai Hormann; Pierre Alliez
We present a novel approach for generating isotropic surface triangle meshes directly from unoriented 3D point clouds, with the mesh density adapting to the estimated local feature size (LFS). Popular reconstruction pipelines first reconstruct a dense mesh from the input point cloud and then remesh it to obtain an isotropic mesh. This sequential pipeline makes it hard to obtain a lower-density mesh that still preserves fine details. Instead, our approach reconstructs both an implicit function and an LFS-aware mesh sizing function directly from the input point cloud; these are then used to produce the final LFS-aware mesh without remeshing. We combine the local curvature radius and the shape diameter to estimate the LFS directly from the input point cloud. Additionally, we propose a new mesh solver that computes an implicit function whose zero level set delineates the surface without requiring oriented normals. The added value of our approach is that it generates isotropic meshes directly from 3D point clouds with an LFS-aware density, thus trading off geometric detail against mesh complexity. Our experiments also demonstrate that the method is robust to noise, outliers, and missing data, and that it preserves sharp features in CAD point clouds.
Title: LFS-Aware Surface Reconstruction From Unoriented 3D Point Clouds
Journal: IEEE Transactions on Multimedia, vol. 26, pp. 11415-11427
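The abstract says only that the local curvature radius and the shape diameter are combined to estimate the LFS; it does not give the combination rule. Because the LFS (distance to the medial axis) is bounded above by the local radius of curvature and, roughly, by half the local thickness that the shape diameter approximates, one plausible rule is to take the minimum of the two. The sketch below adopts that as an assumption and maps the estimate to a target edge length for an isotropic mesher. The helpers `estimate_lfs` and `target_edge_length`, the `ratio` parameter, and the clamping bounds are hypothetical, not the authors' sizing function.

```python
def estimate_lfs(max_abs_curvature, shape_diameter, eps=1e-12):
    """Hypothetical LFS estimate: min(curvature radius, half the shape diameter).

    max_abs_curvature: largest absolute principal curvature at the point.
    shape_diameter: local thickness measured through the interior.
    """
    # A flat region (zero curvature) gets a very large curvature radius.
    curvature_radius = 1.0 / max(abs(max_abs_curvature), eps)
    return min(curvature_radius, 0.5 * shape_diameter)


def target_edge_length(lfs, ratio=0.3, h_min=1e-3, h_max=1e-1):
    """Hypothetical sizing field: edge length proportional to the LFS, clamped."""
    return min(max(ratio * lfs, h_min), h_max)


if __name__ == "__main__":
    # Thin, flat plate: thickness dominates the estimate.
    print(estimate_lfs(0.0, 0.02))     # -> 0.01
    # Thick but strongly curved region: curvature radius dominates.
    print(estimate_lfs(50.0, 1.0))     # -> 0.02
```

Clamping keeps the mesher from producing degenerate tiny triangles in noisy regions or overly coarse ones on large flat areas; a real sizing field would typically also be smoothed so that neighbouring edge lengths change gradually.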
Deep learning techniques are increasingly integrated into rescaling-based video compression frameworks and have shown great potential for improving compression efficiency. However, existing methods achieve limited performance because 1) they treat the context priors generated by the codec as independent sources of information and ignore potential interactions between multiple priors during rescaling, which limits the compression benefit; and 2) they often apply a uniform sampling ratio across regions of varying content complexity, losing important information. To address these two issues, this paper proposes a spatial multi-prior driven resolution rescaling framework for intra-frame coding, called MP-RRF, which consists of three sub-networks: a multi-prior driven network, a downscaling network, and an upscaling network. First, the multi-prior driven network uses complexity and similarity priors to smooth unnecessarily complicated information, while leveraging similarity and quality priors to produce high-fidelity complementary information. This interaction of complexity, similarity, and quality priors enables both redundancy reduction and texture enhancement. Second, the downscaling network processes components of different granularities discriminatively to generate a compact, low-resolution image for encoding. Third, the upscaling network aggregates a complementary set of contextual multi-scale features to reconstruct realistic details, and combines variable receptive fields to suppress multi-scale compression artifacts and resampling noise. Extensive experiments show that our network achieves a 23.84% Bjøntegaard Delta Rate (BD-Rate) reduction under the all-intra configuration compared with the codec anchor, offering state-of-the-art coding performance.
Title: Multi-Prior Driven Resolution Rescaling Blocks for Intra Frame Coding
Authors: Peiying Wu; Shiwei Wang; Liquan Shen; Feifeng Wang; Zhaoyi Tian; Xia Hua
Pub Date: 2024-09-02; DOI: 10.1109/TMM.2024.3453033
Journal: IEEE Transactions on Multimedia, vol. 26, pp. 11274-11289
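The 23.84% figure in the MP-RRF abstract is a Bjøntegaard Delta Rate: the average bitrate difference between two rate-distortion curves over their overlapping quality range, computed in the log-rate domain. The classic procedure fits a cubic polynomial of log-rate against PSNR; to stay dependency-free, this sketch swaps in piecewise-linear interpolation, so its numbers will differ slightly from standard BD-Rate tools. The function names are hypothetical. A negative result means the test codec needs fewer bits than the anchor at equal quality, so a 23.84% reduction corresponds to a value of about -23.84 here.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + t * (ys[k + 1] - ys[k])
    raise ValueError("x lies outside the sampled range")


def _integral(xs, ys, lo, hi):
    """Exact integral over [lo, hi] of the piecewise-linear curve through (xs, ys)."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    total = 0.0
    for a, b in zip(knots, knots[1:]):
        # Trapezoid rule is exact because the curve is linear between knots.
        total += 0.5 * (b - a) * (_interp(a, xs, ys) + _interp(b, xs, ys))
    return total


def bd_rate(anchor_rates, anchor_psnrs, test_rates, test_psnrs):
    """Average % bitrate change of `test` vs `anchor` at equal PSNR.

    Simplified variant: piecewise-linear log-rate curves instead of the
    original cubic fit. PSNR values within each curve must be distinct.
    """
    def log_curve(rates, psnrs):
        pts = sorted(zip(psnrs, rates))                 # order points by quality
        return [p for p, _ in pts], [math.log(r) for _, r in pts]

    pa, la = log_curve(anchor_rates, anchor_psnrs)
    pt, lt = log_curve(test_rates, test_psnrs)
    lo, hi = max(pa[0], pt[0]), min(pa[-1], pt[-1])     # overlapping PSNR range
    if lo >= hi:
        raise ValueError("rate-distortion curves do not overlap")
    avg_log_diff = (_integral(pt, lt, lo, hi) - _integral(pa, la, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    psnr = [30.0, 33.0, 36.0, 39.0]
    anchor = [1000.0, 2000.0, 4000.0, 8000.0]           # made-up rates in kbps
    half = [r / 2 for r in anchor]                      # same quality at half the bits
    print(round(bd_rate(anchor, psnr, half, psnr), 6))  # -> -50.0
```

Averaging in the log-rate domain is what makes the result a relative (percentage) saving that is insensitive to the absolute bitrate scale of the test sequences.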
Pub Date: 2024-09-02; DOI: 10.1109/TMM.2024.3452980
Feixiang Zhou; Zheheng Jiang; Huiyu Zhou; Xuelong Li
Semi-supervised temporal action segmentation (SS-TAS) aims to perform frame-wise classification in long untrimmed videos, where only a fraction of the training videos are labelled. Recent studies have shown the potential of contrastive learning for unsupervised representation learning from unlabelled data. However, learning frame-level representations for action segmentation through unsupervised contrastive learning remains an open and challenging problem. In this paper, we propose a novel Semantic-guided Multi-level Contrast scheme with a Neighbourhood-Consistency-Aware unit (SMC-NCA) to extract strong frame-wise representations for SS-TAS. Specifically, for representation learning, SMC first explores intra- and inter-information variations in a unified, contrastive way, based on action-specific semantic information and on temporal information that highlights relations between actions. Then, the NCA module, which enforces spatial consistency between neighbourhoods centred at different frames to alleviate over-segmentation, works alongside SMC for semi-supervised learning (SSL). Our SMC outperforms other state-of-the-art methods on three benchmarks, offering improvements of up to 17.8%