Pub Date: 2025-02-03 DOI: 10.3103/S1060992X24700498
A. Rasmi
Cardiac magnetic resonance imaging (MRI) commonly yields numerous images per scan, and manually delineating structures in these images is laborious and time-consuming. Automating this process is highly desirable because it would enable the generation of crucial clinical measurements such as ejection fraction and stroke volume. However, variations in scanning settings and patient characteristics lead to a high degree of variability in image statistics and quality, which makes automated segmentation challenging. Our study presents a neural network approach that combines the UNet and ResNet-50 architectures to efficiently segment the endocardial and epicardial boundaries of the left and right ventricles. The Dice metric is used as the loss function to optimize the trainable parameters of the network. Additionally, we apply a postprocessing step to the network's predicted binary image that keeps only the largest connected component of each segmentation label. The proposed method was trained on datasets from the Multi-Vendor & Multi-Disease Cardiac Image Segmentation Challenge. The challenge organizers evaluated the approach on the test set of 160 that had been reserved for testing.
Title: Hybrid Network Model for Cardiac Image Segmentation Using MRI Images. Optical Memory and Neural Networks, vol. 33, no. 4, pp. 447–454.
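The two generic ingredients named in the abstract, a Dice-based loss and a connected-component filter on the predicted mask, can be sketched as follows. This is a minimal pure-Python illustration, not the author's implementation: the function names, the smoothing term `eps`, and the choice of 4-connectivity are assumptions made for the example.

```python
from collections import deque


def soft_dice_loss(pred, target, eps=1e-6):
    """Return 1 - Dice for flat sequences of predictions and binary labels.

    `eps` is an assumed smoothing term that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def largest_connected_component(mask):
    """Keep only the largest 4-connected foreground region of a 2D binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * width for _ in range(height)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned
```

Applied to a mask with one three-pixel region and one isolated pixel, the filter would drop the isolated pixel and keep the larger region.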
Pub Date: 2025-02-03 DOI: 10.3103/S1060992X24700814
Changgeng Yu, Chaowen He, Dashi Lin
In this paper, we propose an abnormal sound event detection method based on a Time-Frequency Spectral Information Fusion Neural Network (TFSIFNN). It addresses the problem that the temporal structure and frequency content of sound events in real environments vary widely, which degrades detection performance. First, we construct a TCN-BiLSTM network from Temporal Convolutional Networks (TCN) and Bidirectional Long Short-Term Memory (BiLSTM) networks to extract temporal context information from sound events. Next, we enhance the feature learning capability of MobileNetV3 with Efficient Channel Attention (ECA), yielding an ECA-MobileNetV3 network that captures the spectral information within sound events. Finally, we combine TCN-BiLSTM and ECA-MobileNetV3 into the TFSIFNN model to improve abnormal sound event detection. Experiments on the Urbansound8K and TUT Rare Sound Events 2017 datasets show notable performance improvements. Specifically, the model reached an accuracy of 93.93% and an F1-Score of 94.15% on the Urbansound8K dataset. On the TUT Rare Sound Events 2017 dataset, compared to the baseline method, the error rate on the evaluation set decreased by 0.55, and the F1-Score improved by 29.69%.
Title: Abnormal Sound Event Detection Method Based on Time-Spectrum Information Fusion. Optical Memory and Neural Networks, vol. 33, no. 4, pp. 411–421.
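For readers unfamiliar with Efficient Channel Attention, the general mechanism can be sketched as below: each channel is averaged to a single value, a small 1D convolution runs across the channel axis, and a sigmoid turns the result into per-channel gates. This is an illustrative pure-Python sketch only. The function name `eca` and the fixed `kernel` argument are assumptions; in a trained network the kernel weights are learned, and the paper's ECA-MobileNetV3 design is not reproduced here.

```python
import math


def eca(feature_map, kernel):
    """Apply channel attention to a list of C channels, each an H x W list.

    `kernel` is an odd-length list of 1D convolution weights (assumed fixed
    here; learned in practice). Returns the rescaled channels and the gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in feature_map]
    # 1D convolution across neighbouring channels with zero padding.
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + means + [0.0] * pad
    gates = []
    for i in range(len(means)):
        z = sum(kernel[j] * padded[i + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate in (0, 1)
    # Excite: rescale every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates
```

The appeal of this design is its cost: the attention branch adds only `len(kernel)` parameters, regardless of the number of channels.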
Pub Date: 2025-02-03 DOI: 10.3103/S1060992X24700826
S. N. Dobryakov, V. V. Privezentsev
In this paper we use EPR spectra to explore interactions between the elements of a 31P–31P quantum pair embedded in a 28Si isotope substrate, assuming that several silicon atoms separate the phosphorus atoms. The EPR method allows us to identify, at the quantum level, the mechanisms of interaction between the phosphorus atoms and to analyze the influence of the silicon substrate on the spin-spin interaction between the 31P atoms in the quantum pair. We also examine possibilities for controlling these interactions. In the simulations, we take into account scalar and vector exchange interactions as well as a dipole interaction between the unpaired electrons of the 31P atoms. We suppose that an indirect dipole-dipole interaction is mediated by a system of conjugated 3d orbitals and by polarization of the medium (the 28Si isotope substrate). The exchange interaction between the spins (the magnetic moments) of the electrons of the two phosphorus atoms is also mediated by the polarized medium. We discuss the simulated EPR spectra obtained.
Title: Computer Analysis of EPR Spectra of 31P Atom Quantum Pair Embedded in Spinless Isotope 28Si Substrate. Optical Memory and Neural Networks, vol. 33, no. 4, pp. 422–428.
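As orientation, the interaction terms listed in the abstract have a common textbook form for two electron spins. The expression below is a generic sketch and not the Hamiltonian used by the authors: the Zeeman term and the symbols $J$ (scalar exchange), $\mathbf{d}$ (vector exchange), $r$ and $\hat{\mathbf{r}}$ (inter-spin distance and unit vector) are assumptions introduced for illustration, and hyperfine terms are omitted.

```latex
\hat{H} =
  g \mu_B \, \mathbf{B} \cdot \bigl( \hat{\mathbf{S}}_1 + \hat{\mathbf{S}}_2 \bigr)
  + J \, \hat{\mathbf{S}}_1 \cdot \hat{\mathbf{S}}_2
  + \mathbf{d} \cdot \bigl( \hat{\mathbf{S}}_1 \times \hat{\mathbf{S}}_2 \bigr)
  + \frac{\mu_0 g^2 \mu_B^2}{4 \pi r^3}
    \Bigl[ \hat{\mathbf{S}}_1 \cdot \hat{\mathbf{S}}_2
      - 3 \bigl( \hat{\mathbf{S}}_1 \cdot \hat{\mathbf{r}} \bigr)
          \bigl( \hat{\mathbf{S}}_2 \cdot \hat{\mathbf{r}} \bigr) \Bigr]
```

The four terms are, in order, the Zeeman coupling to the external field, the scalar (isotropic) exchange, the vector (antisymmetric) exchange, and the direct dipole-dipole interaction.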
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X24700784
A. Korsakov, V. Ivanova, A. Demcheva, R. Eidelman, I. Fomin, A. Bakhshiev
The task of developing and applying neuromorphic elements of an information control system for mobile robots is considered. The description of the compartmental spiking neuron model used in the work and the algorithm of its structural learning is given. The elements of the information control system used in the work are described: a neuromorphic emergency detector, a neuromorphic extrapolator, and a neuromorphic model for the formation of associative connections. Based on these elements, a scheme for the formation of a conditioned reflex reaction with negative reinforcement is proposed. In addition, a scheme is considered that allows a mobile robot to move at a given distance from the wall. The first of these schemes was tested on a real mobile robotics platform. The conclusion is made about the possibility of constructing neuromorphic information control systems from the presented elements and the prospects for the development of this approach.
Title: Development and Implementation of Neuromorphic Elements of the Information and Control System of a Mobile Robot. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S504–S512.
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X2470067X
Gaurav Dhiman, Yu. V. Tiumentsev, R. A. Tskhai
Aircraft motion control has to be performed under numerous heterogeneous uncertainties, both in the aircraft motion model and in the environment in which the aircraft is flying. In particular, these uncertainties arise because various abnormal situations can occur in flight, caused by failures of aircraft equipment and systems or by damage to the airframe and propulsion system. Some of these failures and damages directly affect the dynamic characteristics of the aircraft as a control object. The problem therefore arises of adjusting the aircraft control algorithms so that they can adapt to the changed dynamics of the aircraft. It is extremely difficult, and in some cases impossible, to foresee all possible damages, failures and their combinations in advance. Hence, it is necessary to implement adaptive flight control algorithms that are able to adjust to the changing situation. One effective tool for solving such problems is reinforcement learning in the Approximate Dynamic Programming (ADP) variant, in combination with artificial neural networks. In the last decade, a family of methods known as Adaptive Critic Design (ACD) has been actively developed within the ADP approach to control the behavior of complex dynamic systems. In our paper we consider the application of one variant of the ACD approach, namely SNAC (Single Network Adaptive Critic), and its extension through joint use with the method of dynamic inversion. The effectiveness of this approach is demonstrated on the example of longitudinal motion control of a supersonic transport airplane.
Title: Combined Use of Dynamic Inversion and Reinforcement Learning for Motion Control of an Supersonic Transport Aircraft. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S399–S413.
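The idea behind dynamic inversion can be illustrated on a toy scalar system of the form x' = f(x) + g(x)u: the control u is chosen to cancel the known dynamics and impose desired ones. The sketch below is not the aircraft model or the SNAC controller from the paper. The plant functions `f` and `g`, the gain `k`, and the Euler step are all hypothetical choices for the example.

```python
def simulate_dynamic_inversion(x0, ref, k=2.0, dt=0.01, steps=1000):
    """Track `ref` with dynamic inversion on a toy plant x' = f(x) + g(x) * u."""

    def f(x):
        return -x ** 3  # hypothetical nonlinear drift term

    def g(x):
        return 2.0  # hypothetical control effectiveness, assumed nonzero

    x = x0
    for _ in range(steps):
        v = -k * (x - ref)            # desired closed-loop dynamics x' = v
        u = (v - f(x)) / g(x)         # invert the model to obtain the control
        x += dt * (f(x) + g(x) * u)   # forward-Euler step of the true plant
    return x
```

With an exact model the nonlinearity cancels and the tracking error decays exponentially. When failures change `f` or `g`, the cancellation is no longer exact, which is the kind of gap an adaptive component is meant to cover.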
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X24700796
A. V. Demidovskij, I. G. Salnikov, A. M. Tugaryov, A. I. Trutnev, I. A. Novikova
Fine-tuning of Large Language Models is an essential part of modern artificial intelligence systems that solve numerous tasks, such as natural language processing and computer vision. Among the various fine-tuning strategies, the most prominent approach is Parameter-Efficient Fine-Tuning (PEFT), as it achieves state-of-the-art performance on multiple tasks while minimizing computational resources and training time. Recently, an increasing number of PEFT methodologies have been developed, each asserting superiority based on performance metrics. However, how closely these methods follow the tuning dynamics of full fine-tuning (FT) remains largely unexplored. This study aims to bridge this gap by analyzing the learning behavior of PEFT approaches such as LoRA, LoRA+, AdaLoRA, DoRA, VeRA, PiSSA, LoKr and LoHa in comparison to FT. This work provides a comprehensive comparative analysis aimed at identifying which PEFT methods diverge significantly from the FT standard in their weight-update dynamics. The findings reveal insights into the underlying causes of these discrepancies, offering a deeper understanding of each method’s behavior and efficacy.
Title: Comprehensive Weight Decomposition Analysis of Modern Parameter-Efficient Methods. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S513–S522.
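One way to compare weight-update dynamics is to apply a low-rank update of the LoRA form W' = W + BA and measure how much each column of the weight matrix changes in magnitude and in direction. The sketch below illustrates that kind of decomposition in pure Python. The helper names and the column-wise convention are assumptions for the example, not the metrics defined in the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_merge(w, b, a, scale=1.0):
    """Return W + scale * (B @ A), the merged weight after a LoRA-style update."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def magnitude_direction_change(w_old, w_new):
    """Mean column-norm change and mean (1 - cosine) between matching columns."""
    n_cols = len(w_old[0])
    d_mag, d_dir = 0.0, 0.0
    for j in range(n_cols):
        old = [row[j] for row in w_old]
        new = [row[j] for row in w_new]
        n_old = math.sqrt(sum(v * v for v in old))
        n_new = math.sqrt(sum(v * v for v in new))
        cosine = sum(o * n for o, n in zip(old, new)) / (n_old * n_new)
        d_mag += abs(n_new - n_old)
        d_dir += 1.0 - cosine
    return d_mag / n_cols, d_dir / n_cols
```

Computing the same two numbers for a fully fine-tuned checkpoint and for a PEFT checkpoint gives a simple basis for comparing how each method moves the weights.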
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X24700747
G. Kupriyanov, I. Isaev, K. Laptinskiy, T. Dolenko, S. Dolenko
Kolmogorov-Arnold Networks (KAN), introduced in May 2024, are a novel type of artificial neural network whose abilities and properties are now being actively investigated by the machine learning community. In this study, we test the application of KAN to an inverse problem arising in the development of multimodal carbon luminescent nanosensors of ions dissolved in water, including heavy metal cations. We compare the results of solving this problem with four machine learning methods: random forest, gradient boosting over decision trees, multi-layer perceptron neural networks, and KAN. Advantages and disadvantages of KAN are discussed, and it is demonstrated that KAN has a good chance of becoming one of the algorithms most recommended for solving highly non-linear regression problems with a moderate number of input features.
Title: Solution of an Inverse Problem of Optical Spectroscopy Using Kolmogorov-Arnold Networks. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S475–S482.
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X24700735
A. Bulatov, Y. Kuratov, M. Burtsev
Recent advancements have significantly improved the skills and performance of language models, but have also increased computational demands due to the increasing number of parameters and the quadratic complexity of the attention mechanism. As context sizes expand into millions of tokens, making long-context processing more accessible and efficient becomes a critical challenge. Furthermore, modern benchmarks such as BABILong [1] underscore the inefficiency of even the most powerful LLMs in long context reasoning. In this paper, we employ finetuning and multi-task learning to train a model capable of mastering multiple BABILong long-context reasoning skills. We demonstrate that even models with fewer than 140 million parameters can outperform much larger counterparts by learning multiple essential tasks simultaneously. By conditioning Recurrent Memory Transformer [2] on task description, we achieve state-of-the-art results on multi-task BABILong QA1–QA5 set for up to 32k tokens. The proposed model also shows generalization abilities to new lengths and tasks, along with increased robustness to input perturbations.
Title: Mastering Long-Context Multi-Task Reasoning with Transformers and Recurrent Memory. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S466–S474.
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X24700711
V. Bezuglyj, D. A. Yudin
Semantic-aware mapping is crucial for advancing robotic navigation and interaction within complex environments. Traditional 3D mapping techniques primarily capture geometric details, missing the semantic richness necessary for autonomous systems to understand their surroundings comprehensively. This paper presents Sea-SHINE, a novel approach that integrates semantic information within a neural implicit mapping framework for large-scale environments. Our method enhances the utility and navigational relevance of 3D maps by embedding semantic awareness into the mapping process, allowing robots to recognize, understand, and reconstruct environments effectively. The proposed system leverages dual decoders and a semantic awareness module, which utilizes Feature-wise Linear Modulation (FiLM) to condition mapping on semantic labels. Extensive experiments on datasets such as SemanticKITTI, KITTI-360, and ITLP-Campus demonstrate significant improvements in map precision and recall, particularly in recognizing crucial objects like road signs. Our implementation bridges the gap between geometric accuracy and semantic understanding, fostering a deeper interaction between robots and their operational environments. The code is publicly available at https://github.com/VitalyyBezuglyj/Sea-SHINE.
Title: Sea-SHINE: Semantic-Aware 3D Neural Mapping Using Implicit Representations. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S445–S456.
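Feature-wise Linear Modulation applies a per-feature scale and shift, gamma * x + beta, where gamma and beta depend on the conditioning input (here, a semantic label). The sketch below shows only that core operation. The lookup table `FILM_PARAMS`, its labels and its values are hypothetical stand-ins for the learned conditioning network, and nothing here is taken from the Sea-SHINE code base.

```python
# Hypothetical per-label modulation parameters; in a trained model these
# would be produced by a learned network, not a hand-written table.
FILM_PARAMS = {
    "road_sign": ([2.0, 0.5], [0.0, 1.0]),
    "building": ([1.0, 1.0], [0.0, 0.0]),
}


def film(features, gamma, beta):
    """Scale and shift each feature: gamma_i * x_i + beta_i."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


def condition_on_label(features, label):
    """Modulate a feature vector with the parameters stored for `label`."""
    gamma, beta = FILM_PARAMS[label]
    return film(features, gamma, beta)
```

Because the modulation is an elementwise affine map, it adds little computation while letting the same decoder behave differently for each semantic class.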
Pub Date: 2025-01-23 DOI: 10.3103/S1060992X24700693
A. Samarin, A. Savelev, A. Toropov, A. Nazarenko, A. Motyko, E. Kotenko, A. Dozorceva, A. Dzestelova, E. Mikhailova, V. Malykh
This study explores the development of classifiers for microbial images, specifically focusing on streptococci captured via microscopy of live samples. Our approach uses AutoML-based techniques and automates the creation and analysis of feature spaces to produce optimal descriptors for classifying these microscopic images. This technique leverages interpretable taxonomic features based on the external geometric attributes of various microorganisms. We have released an annotated dataset we assembled to validate our solution, featuring microbial images from unfixed microscopic scenes. Additionally, we assessed the classification performance of our method against several classifiers, including those employing deep neural networks. Our approach outperformed all others tested, achieving the highest Precision (0.980), Recall (0.979), and F1-score (0.980).
Title: Streptococci Recognition in Microscope Images Using Taxonomy-based Visual Features. Optical Memory and Neural Networks, vol. 33, no. 3 supplement, pp. S424–S434.
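The three metrics reported in the abstract follow from counts of true positives, false positives and false negatives. The sketch below computes them for a single positive class. The function name and the binary setting are assumptions for the example, since the abstract does not state how the scores were averaged across classes.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

For multi-class problems the same function can be called once per class and the results averaged, with the averaging scheme chosen to match the evaluation protocol.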