Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.
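Bilinear feature fusion is conventionally implemented as an outer product of two feature streams followed by signed square-root and L2 normalization; the sketch below illustrates that generic operation on single feature vectors (the function name and details are illustrative assumptions, not MFFNet's actual implementation):

```python
import numpy as np

def bilinear_fusion(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Fuse two feature vectors via their outer product (bilinear pooling)."""
    # The outer product captures all pairwise channel interactions between
    # the two feature streams, yielding a richer joint representation.
    fused = np.outer(feat_a, feat_b).ravel()
    # Signed square-root and L2 normalization, standard post-processing
    # for bilinear pooling, tame large activation values.
    fused = np.sign(fused) * np.sqrt(np.abs(fused))
    return fused / (np.linalg.norm(fused) + 1e-12)

rng = np.random.default_rng(0)
a, b = rng.standard_normal(8), rng.standard_normal(8)
v = bilinear_fusion(a, b)  # fused vector of dimension 8 * 8 = 64
```

In a full network the inputs would be pooled convolutional feature maps rather than raw vectors, and the fused descriptor would feed a classification head.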
Accurately predicting intracerebral hemorrhage (ICH) prognosis is a critical and indispensable step in the clinical management of patients post-ICH. Recently, integrating artificial intelligence, particularly deep learning, has significantly enhanced prediction accuracy and relieved neurosurgeons of the burden of manual prognosis assessment. However, uni-modal methods have shown suboptimal performance due to the intricate pathophysiology of ICH. On the other hand, existing cross-modal approaches that incorporate tabular data have often failed to effectively extract complementary information and cross-modal features between modalities, thereby limiting their prognostic capabilities. This study introduces a novel cross-modal network, ICH-PRNet, designed to predict ICH prognosis outcomes. Specifically, we propose a joint-attention interaction encoder that effectively integrates computed tomography images and clinical texts within a unified representational space. Additionally, we define a multi-loss function comprising three components to comprehensively optimize cross-modal fusion capabilities. To balance the training process, we employ a self-adaptive dynamic prioritization algorithm that adjusts the weight of each component accordingly. Through these innovative designs, our model establishes robust semantic connections between modalities and uncovers rich, complementary cross-modal information, thereby achieving superior prediction results. Extensive experimental results and comparisons with state-of-the-art methods on both in-house and publicly available datasets demonstrate the superiority and efficacy of the proposed method. Our code is available at https://github.com/YU-deep/ICH-PRNet.git.
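The abstract does not specify the self-adaptive dynamic prioritization algorithm. One well-known scheme for re-weighting multiple loss components by their recent descent rates is Dynamic Weight Averaging (DWA); the sketch below shows that generic idea purely as an illustration, not as ICH-PRNet's actual algorithm:

```python
import numpy as np

def dwa_weights(prev_losses, prev2_losses, temperature=2.0):
    """Dynamic Weight Averaging: weight each loss by its descent rate."""
    # A ratio close to (or above) 1 means that loss component is shrinking
    # slowly, so it receives a larger weight in the next epoch.
    ratios = np.asarray(prev_losses) / np.asarray(prev2_losses)
    exp = np.exp(ratios / temperature)
    k = len(prev_losses)
    # Weights are normalized so they sum to the number of components.
    return k * exp / exp.sum()

# Component 0 has barely improved, component 2 has dropped sharply,
# so component 0 gets the largest weight.
w = dwa_weights([0.9, 0.5, 0.2], [1.0, 1.0, 1.0])
```

The temperature controls how sharply the weighting reacts to differences in descent rate; a large temperature pushes all weights toward uniform.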
Modifying the structure of an existing network is a common way to further improve its performance. However, modifying layers in a network often causes a mismatch with the pre-trained weights, and the subsequent fine-tuning process is time-consuming and resource-inefficient. To address this issue, we propose a novel technique called Identity Model Transformation (IMT), which keeps the outputs before and after the transformation identical through rigorous algebraic transformations. This approach preserves the original model's performance when modifying layers. Additionally, IMT significantly reduces the total training time required to achieve optimal results while further enhancing network performance. IMT establishes a bridge for rapid transformation between model architectures, enabling a model to quickly perform analytic continuation and derive a family of tree-like models with better performance. This model family possesses greater potential for optimization improvements than a single model. Extensive experiments across various object detection tasks validate the effectiveness and efficiency of the proposed IMT solution, which saved 94.76% of the fine-tuning time for the base model YOLOv4-Rot on the DOTA1.5 dataset; using IMT, we observed stable performance improvements of 9.89%, 6.94%, 2.36%, and 4.86% on the four datasets AI-TOD, DOTA1.5, COCO2017, and MRSAText, respectively.
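The core of an identity transformation can be illustrated on a toy MLP: a newly inserted layer initialized to the identity map leaves the network's function exactly unchanged, so training can resume from the original model's performance instead of from scratch. A minimal NumPy sketch (names are illustrative; the paper's transformations cover richer layer modifications):

```python
import numpy as np

def forward(x, layers):
    # Simple MLP: each layer is a (W, b) pair with ReLU between layers.
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def insert_identity_layer(layers, index):
    """Insert a linear layer that leaves the network's function unchanged."""
    d = layers[index][0].shape[0]  # output width at the insertion point
    # W = I, b = 0: the new layer is the identity map, and the extra ReLU
    # is a no-op on the already non-negative activations, so the composed
    # network computes exactly the same outputs as before.
    return layers[:index + 1] + [(np.eye(d), np.zeros(d))] + layers[index + 1:]

rng = np.random.default_rng(1)
layers = [(rng.standard_normal((5, 4)), rng.standard_normal(5)),
          (rng.standard_normal((3, 5)), rng.standard_normal(3))]
x = rng.standard_normal(4)
y_before = forward(x, layers)
y_after = forward(x, insert_identity_layer(layers, 0))  # identical outputs
```

From this identity initialization, the new layer's weights can then be trained away from the identity, giving the deeper model a strictly larger hypothesis space at no initial cost in accuracy.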
Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) to learn solutions across various functional forms of source terms in a PDE and multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy flow problem and showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem and demonstrating the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.
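The binary-mask idea in the loss term can be sketched generically: contributions from grid points outside the geometry are zeroed, so the operator is only penalized where the PDE solution is actually defined. A minimal sketch (the function name and example values are illustrative, not the paper's implementation):

```python
import numpy as np

def masked_mse(pred, target, mask):
    """MSE restricted to grid points inside the geometry (mask == 1)."""
    # The binary mask removes contributions from points outside the domain
    # and the sum is normalized by the number of valid points, so losses
    # are comparable across geometries of different sizes.
    mask = mask.astype(float)
    return float(((pred - target) ** 2 * mask).sum() / mask.sum())

pred = np.array([1.0, 2.0, 3.0, 4.0])
target = np.array([1.0, 0.0, 3.0, 0.0])
mask = np.array([1, 1, 1, 0])  # last grid point lies outside the geometry
loss = masked_mse(pred, target, mask)  # (0 + 4 + 0) / 3
```

Normalizing by `mask.sum()` rather than the full grid size keeps the loss scale stable when the network is trained concurrently on geometries that occupy different fractions of the grid.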
Recommendation systems are vital tools for helping users discover content that suits their interests. Collaborative filtering is one of the techniques employed for analyzing interactions between users and items, which are typically stored in a sparse matrix. This inherent sparsity poses a challenge because these gaps must be filled accurately and effectively to provide users with meaningful and personalized recommendations. Our solution addresses sparsity in recommendations by incorporating diverse data sources, including trust statements and an imputation graph. The trust graph captures user relationships and trust levels, working in conjunction with an imputation graph, which is constructed by estimating the missing ratings of each user from the user-item matrix using the average ratings of the most similar users. Combined with the user-item rating graph, an attention mechanism fine-tunes the influence of these graphs, resulting in more personalized and effective recommendations. Our method consistently outperforms state-of-the-art recommenders in real-world dataset evaluations, underscoring its potential to strengthen recommendation systems and mitigate sparsity challenges.
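The imputation step described above, filling each user's missing ratings with the average ratings of their most similar users, can be sketched as follows (the use of cosine similarity and the function name are assumptions for illustration; the paper's construction of the imputation graph may differ in detail):

```python
import numpy as np

def impute_with_similar_users(R, k=2):
    """Fill NaN entries of a user-item matrix with the mean rating of the
    k most similar users (cosine similarity over observed ratings)."""
    filled = np.nan_to_num(R)  # treat missing ratings as 0 for similarity
    norms = np.linalg.norm(filled, axis=1, keepdims=True) + 1e-12
    sim = (filled / norms) @ (filled / norms).T
    np.fill_diagonal(sim, -np.inf)  # a user is not their own neighbor
    out = R.copy()
    for u in range(R.shape[0]):
        neighbors = np.argsort(sim[u])[::-1][:k]  # top-k similar users
        for i in np.where(np.isnan(R[u]))[0]:
            vals = [R[v, i] for v in neighbors if not np.isnan(R[v, i])]
            if vals:  # leave the entry missing if no neighbor rated item i
                out[u, i] = float(np.mean(vals))
    return out

R = np.array([[5.0, 4.0, np.nan],
              [5.0, 4.0, 2.0],
              [1.0, 1.0, 5.0]])
# User 0's nearest neighbor is user 1, so the missing rating becomes 2.0.
R_imputed = impute_with_similar_users(R, k=1)
```

The imputed matrix then defines the edges of an imputation graph, which the attention mechanism can weigh against the trust graph and the original rating graph.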