Few-Shot Pixel-Precise Document Layout Segmentation via Dynamic Instance Generation and Local Thresholding
Pub Date: 2023-10-01 | Epub Date: 2023-08-10 | DOI: 10.1142/S0129065723500521
Axel De Nardin, Silvia Zottin, Claudio Piciarelli, Emanuela Colombi, Gian Luca Foresti
Over the years, the humanities community has increasingly called for artificial intelligence frameworks to support the study of cultural heritage. Document layout segmentation, which aims to identify the different structural components of a document page, is a particularly interesting task in this context, especially for handwritten texts. While many effective approaches to this problem exist, they all rely on large amounts of training data for the underlying models, which is rarely available in real-world scenarios: producing pixel-precise ground-truth segmentations is very time-consuming and often requires domain knowledge of the documents at hand. For this reason, in this paper we propose an effective few-shot learning framework for document layout segmentation built on two novel components, namely a dynamic instance generation module and a segmentation refinement module. The approach achieves performance comparable to the current state of the art on the popular DIVA-HisDB dataset while relying on just a fraction of the available data.
{"title":"Few-Shot Pixel-Precise Document Layout Segmentation via Dynamic Instance Generation and Local Thresholding.","authors":"Axel De Nardin, Silvia Zottin, Claudio Piciarelli, Emanuela Colombi, Gian Luca Foresti","doi":"10.1142/S0129065723500521","DOIUrl":"10.1142/S0129065723500521","url":null,"abstract":"<p><p>Over the years, the humanities community has increasingly requested the creation of artificial intelligence frameworks to help the study of cultural heritage. Document Layout segmentation, which aims at identifying the different structural components of a document page, is a particularly interesting task connected to this trend, specifically when it comes to handwritten texts. While there are many effective approaches to this problem, they all rely on large amounts of data for the training of the underlying models, which is rarely possible in a real-world scenario, as the process of producing the ground truth segmentation task with the required precision to the pixel level is a very time-consuming task and often requires a certain degree of domain knowledge regarding the documents at hand. For this reason, in this paper, we propose an effective few-shot learning framework for document layout segmentation relying on two novel components, namely a dynamic instance generation and a segmentation refinement module. This approach is able of achieving performances comparable to the current state of the art on the popular Diva-HisDB dataset, while relying on just a fraction of the available data.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":" ","pages":"2350052"},"PeriodicalIF":8.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10351091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decoupled Edge Guidance Network for Automatic Checkout
Pub Date: 2023-10-01 | Epub Date: 2023-08-10 | DOI: 10.1142/S0129065723500491
Rongbiao You, Fuxiong He, Weiming Lin
Automatic checkout (ACO) aims to correctly generate complete shopping lists from checkout images. However, the domain gap between single-product training images and multi-product checkout images makes ACO a difficult task. Despite remarkable advances in recent years, closing this domain gap remains challenging, likely because networks trained solely on synthesized images struggle to generalize to realistic checkout scenarios. To this end, we propose a decoupled edge guidance network (DEGNet), which integrates synthesized and checkout images via a supervised domain adaptation approach and further learns common domain representations using a domain adapter. Specifically, an edge embedding module is designed to generate edge embedding images that introduce edge information. On this basis, we develop a decoupled feature extractor that takes the original image and its edge embedding as input, jointly exploiting image and edge information. Furthermore, a novel proposal divide-and-conquer strategy (PDS) is proposed to augment high-quality samples. In experimental evaluation, DEGNet achieves state-of-the-art performance on the retail product checkout (RPC) dataset, with checkout accuracy (cAcc) of 93.47% and 95.25% in the average mode of the Faster R-CNN and Cascade R-CNN frameworks, respectively. Code is available at https://github.com/yourbikun/DEGNet.
{"title":"Decoupled Edge Guidance Network for Automatic Checkout.","authors":"Rongbiao You, Fuxiong He, Weiming Lin","doi":"10.1142/S0129065723500491","DOIUrl":"10.1142/S0129065723500491","url":null,"abstract":"<p><p>Automatic checkout (ACO) aims at correctly generating complete shopping lists from checkout images. However, the domain gap between the single product in training data and multiple products in checkout images endows ACO tasks with a major difficulty. Despite remarkable advancements in recent years, resolving the significant domain gap remains challenging. It is possibly because networks trained solely on synthesized images may struggle to generalize well to realistic checkout scenarios. To this end, we propose a decoupled edge guidance network (DEGNet), which integrates synthesized and checkout images via a supervised domain adaptation approach and further learns common domain representations using a domain adapter. Specifically, an edge embedding module is designed for generating edge embedding images to introduce edge information. On this basis, we develop a decoupled feature extractor that takes original images and edge embedding images as input to jointly utilize image information and edge information. Furthermore, a novel proposal divide-and-conquer strategy (PDS) is proposed for the purpose of augmenting high-quality samples. Through experimental evaluation, DEGNet achieves state-of-the-art performance on the retail product checkout (RPC) dataset, with checkout accuracy (cAcc) results of 93.47% and 95.25% in the average mode of faster RCNN and cascade RCNN frameworks, respectively. Codes are available at https://github.com/yourbikun/DEGNet.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":" ","pages":"2350049"},"PeriodicalIF":8.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9979249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discussion on S. Shirani, A. Valentin, G. Alarcon, F. Kazi and S. Sanei, Separating Inhibitory and Excitatory Responses of Epileptic Brain to Single-Pulse Electrical Stimulation, International Journal of Neural Systems, Vol. 33, No. 2 (2023) 2350008
Pub Date: 2023-09-01 | Epub Date: 2023-02-24 | DOI: 10.1142/S0129065723750011
Olivier Darbin
{"title":"Discussion on S. Shirani, A. Valentin, G. Alarcon, F. Kazi and S. Sanei, Separating Inhibitory and Excitatory Responses of Epileptic Brain to Single-Pulse Electrical Stimulation, International Journal of Neural Systems, Vol. 33, No. 2 (2023) 2350008.","authors":"Olivier Darbin","doi":"10.1142/S0129065723750011","DOIUrl":"10.1142/S0129065723750011","url":null,"abstract":"","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2375001"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10113125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Deep Regression Approach for Human Activity Recognition Under Partial Occlusion
Pub Date: 2023-09-01 | DOI: 10.1142/S0129065723500478
Ioannis Vernikos, Evaggelos Spyrou, Ioannis-Aris Kostis, Eirini Mathe, Phivos Mylonas
In real-life scenarios, Human Activity Recognition (HAR) from video data is prone to occlusion of one or more body parts of the human subjects involved. Although recognition of most activities clearly depends on the motion of certain body parts, whose occlusion compromises the performance of recognition approaches, this problem is often underestimated in contemporary research. Currently, training and evaluation are based on datasets shot under laboratory (ideal) conditions, i.e. without any kind of occlusion. In this work, we propose an approach for HAR in the presence of partial occlusion, covering cases in which up to two body parts are occluded. We assume that human motion is modeled by a set of 3D skeletal joints and that occluded body parts remain occluded for the whole duration of the activity. We address this problem using regression, performed by a novel deep Convolutional Recurrent Neural Network (CRNN). Specifically, given a partially occluded skeleton, we attempt to reconstruct the missing motion information of its occluded part(s). We evaluate our approach on four publicly available human motion datasets. Our experimental results indicate a significant increase in performance over baseline approaches, wherein networks trained using only nonoccluded samples, or both occluded and nonoccluded samples, are evaluated on occluded samples. To the best of our knowledge, this is the first work to formulate and address HAR under occlusion as a regression task.
{"title":"A Deep Regression Approach for Human Activity Recognition Under Partial Occlusion.","authors":"Ioannis Vernikos, Evaggelos Spyrou, Ioannis-Aris Kostis, Eirini Mathe, Phivos Mylonas","doi":"10.1142/S0129065723500478","DOIUrl":"https://doi.org/10.1142/S0129065723500478","url":null,"abstract":"<p><p>In real-life scenarios, Human Activity Recognition (HAR) from video data is prone to occlusion of one or more body parts of the human subjects involved. Although it is common sense that the recognition of the majority of activities strongly depends on the motion of some body parts, which when occluded compromise the performance of recognition approaches, this problem is often underestimated in contemporary research works. Currently, training and evaluation is based on datasets that have been shot under laboratory (ideal) conditions, i.e. without any kind of occlusion. In this work, we propose an approach for HAR in the presence of partial occlusion, in cases wherein up to two body parts are involved. We assume that human motion is modeled using a set of 3D skeletal joints and also that occluded body parts remain occluded during the whole duration of the activity. We solve this problem using regression, performed by a novel deep Convolutional Recurrent Neural Network (CRNN). Specifically, given a partially occluded skeleton, we attempt to reconstruct the missing information regarding the motion of its occluded part(s). We evaluate our approach using four publicly available human motion datasets. Our experimental results indicate a significant increase of performance, when compared to baseline approaches, wherein networks that have been trained using only nonoccluded or both occluded and nonoccluded samples are evaluated using occluded samples. To the best of our knowledge, this is the first research work that formulates and copes with the problem of HAR under occlusion as a regression task.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2350047"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10491948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Class-Imbalance Aware and Explainable Spatio-Temporal Graph Attention Network for Neonatal Seizure Detection
Khadijeh Raeisi, Mohammad Khazaei, Gabriella Tamburro, Pierpaolo Croce, Silvia Comani, Filippo Zappasodi
Pub Date: 2023-09-01 | DOI: 10.1142/S0129065723500466
Seizures are the most prevalent clinical indication of neurological disorders in neonates. In this study, a class-imbalance aware and explainable deep learning approach based on Convolutional Neural Networks (CNNs) and Graph Attention Networks (GATs) is proposed for the accurate automated detection of neonatal seizures. The proposed model integrates the temporal information of EEG signals with the spatial information of the EEG channels through a graph representation of the multi-channel EEG segments. One-dimensional CNNs are used to automatically develop a feature set that accurately represents the differences between seizure and nonseizure epochs in the time domain. By employing GATs, the attention mechanism is utilized to emphasize the critical channel pairs and the information flow among brain regions. The GAT coefficients were then used to empirically visualize the important regions during seizure and nonseizure epochs, which can provide valuable insight into the location of seizures in the neonatal brain. Additionally, under-sampling and focal loss techniques are used to tackle the severe class imbalance in the neonatal seizure dataset. Overall, the final Spatio-Temporal Graph Attention Network (ST-GAT) outperformed previously benchmarked methods with a mean AUC of 96.6% and a Kappa of 0.88, demonstrating its high accuracy and potential for clinical applications.
{"title":"A Class-Imbalance Aware and Explainable Spatio-Temporal Graph Attention Network for Neonatal Seizure Detection.","authors":"Khadijeh Raeisi, Mohammad Khazaei, Gabriella Tamburro, Pierpaolo Croce, Silvia Comani, Filippo Zappasodi","doi":"10.1142/S0129065723500466","DOIUrl":"https://doi.org/10.1142/S0129065723500466","url":null,"abstract":"<p><p>Seizures are the most prevalent clinical indication of neurological disorders in neonates. In this study, a class-imbalance aware and explainable deep learning approach based on Convolutional Neural Networks (CNNs) and Graph Attention Networks (GATs) is proposed for the accurate automated detection of neonatal seizures. The proposed model integrates the temporal information of EEG signals with the spatial information on the EEG channels through the graph representation of the multi-channel EEG segments. One-dimensional CNNs are used to automatically develop a feature set that accurately represents the differences between seizure and nonseizure epochs in the time domain. By employing GAT, the attention mechanism is utilized to emphasize the critical channel pairs and information flow among brain regions. GAT coefficients were then used to empirically visualize the important regions during the seizure and nonseizure epochs, which can provide valuable insight into the location of seizures in the neonatal brain. Additionally, to tackle the severe class imbalance in the neonatal seizure dataset using under-sampling and focal loss techniques are used. Overall, the final Spatio-Temporal Graph Attention Network (ST-GAT) outperformed previous benchmarked methods with a mean AUC of 96.6% and Kappa of 0.88, demonstrating its high accuracy and potential for clinical applications.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2350046"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10108361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online Ternary Classification of Covert Speech by Leveraging the Passive Perception of Speech
Pub Date: 2023-09-01 | DOI: 10.1142/S012906572350048X
Jae Moon, Tom Chau
Brain-computer interfaces (BCIs) provide communicative alternatives to those without functional speech. Covert speech (CS)-based BCIs enable communication simply by thinking of words and thus have intuitive appeal. However, an elusive barrier to their clinical translation is the collection of voluminous examples of high-quality CS signals, as iteratively rehearsing words for long durations is mentally fatiguing. Research on CS and speech perception (SP) identifies common spatiotemporal patterns in their respective electroencephalographic (EEG) signals, pointing towards shared encoding mechanisms. The goal of this study was to investigate whether a model that leverages the signal similarities between SP and CS can differentiate speech-related EEG signals online. Ten participants completed a dyadic protocol in which, in each trial, they listened to a randomly selected word and then mentally rehearsed it. In the offline sessions, eight words were presented to participants. For the subsequent online sessions, the two words with the most separable EEG signals were chosen to form a ternary classification problem (two words and rest). The model comprised a functional mapping derived from SP and CS signals of the same speech token, with features extracted via a Riemannian approach. An average ternary online accuracy of 75.3% (60% chance level) was achieved across participants, with individual accuracies as high as 93%. Moreover, we observed that the signal-to-noise ratio (SNR) of CS signals was enhanced by perception-covert modeling according to the level of high-frequency (γ-band) correspondence between CS and SP. These findings may lead to less burdensome data collection for training speech BCIs, which could eventually increase the rate at which the vocabulary can grow.
{"title":"Online Ternary Classification of Covert Speech by Leveraging the Passive Perception of Speech.","authors":"Jae Moon, Tom Chau","doi":"10.1142/S012906572350048X","DOIUrl":"https://doi.org/10.1142/S012906572350048X","url":null,"abstract":"<p><p>Brain-computer interfaces (BCIs) provide communicative alternatives to those without functional speech. Covert speech (CS)-based BCIs enable communication simply by thinking of words and thus have intuitive appeal. However, an elusive barrier to their clinical translation is the collection of voluminous examples of high-quality CS signals, as iteratively rehearsing words for long durations is mentally fatiguing. Research on CS and speech perception (SP) identifies common spatiotemporal patterns in their respective electroencephalographic (EEG) signals, pointing towards shared encoding mechanisms. The goal of this study was to investigate whether a model that leverages the signal similarities between SP and CS can differentiate speech-related EEG signals online. Ten participants completed a dyadic protocol where in each trial, they listened to a randomly selected word and then subsequently mentally rehearsed the word. In the offline sessions, eight words were presented to participants. For the subsequent online sessions, the two most distinct words (most separable in terms of their EEG signals) were chosen to form a ternary classification problem (two words and rest). The model comprised a functional mapping derived from SP and CS signals of the same speech token (features are extracted via a Riemannian approach). An average ternary online accuracy of 75.3% (60% chance level) was achieved across participants, with individual accuracies as high as 93%. Moreover, we observed that the signal-to-noise ratio (SNR) of CS signals was enhanced by perception-covert modeling according to the level of high-frequency ([Formula: see text]-band) correspondence between CS and SP. These findings may lead to less burdensome data collection for training speech BCIs, which could eventually enhance the rate at which the vocabulary can grow.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2350048"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10108376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response to the Discussion on S. Shirani, A. Valentin, G. Alarcon, F. Kazi and S. Sanei, Separating Inhibitory and Excitatory Responses of Epileptic Brain to Single-Pulse Electrical Stimulation, International Journal of Neural Systems, Vol. 33, No. 2 (2023) 2350008
Pub Date: 2023-09-01 | Epub Date: 2023-02-24 | DOI: 10.1142/S0129065723750023
Sepehr Shirani, Antonio Valentin, Gonzalo Alarcon, Farhana Kazi, Saeid Sanei
{"title":"Response to the Discussion on S. Shirani, A. Valentin, G. Alarcon, F. Kazi and S. Sanei, Separating Inhibitory and Excitatory Responses of Epileptic Brain to Single-Pulse Electrical Stimulation, International Journal of Neural Systems, Vol. 33, No. 2 (2023) 2350008.","authors":"Sepehr Shirani, Antonio Valentin, Gonzalo Alarcon, Farhana Kazi, Saeid Sanei","doi":"10.1142/S0129065723750023","DOIUrl":"10.1142/S0129065723750023","url":null,"abstract":"","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2375002"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10113126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of Epileptic and Psychogenic Nonepileptic Seizures via Time-Frequency Features of EEG Data
Ozlem Karabiber Cura, Aydin Akan, Hatice Sabiha Ture
Pub Date: 2023-09-01 | DOI: 10.1142/S0129065723500454
The majority of psychogenic nonepileptic seizures (PNESs) are brought on by psychogenic causes, but because their symptoms resemble those of epilepsy, they are frequently misdiagnosed. Although electroencephalography (EEG) signals are normal in PNES cases, EEG recordings alone are not sufficient to identify the illness. Hence, accurate diagnosis and effective treatment depend on long-term video-EEG data and a complete patient history. A video-EEG setup, however, is more expensive than standard EEG equipment. To distinguish PNES signals from conventional epileptic seizure (ES) signals, it is therefore crucial to develop methods based solely on EEG recordings. This study presents a technique for classifying inter-PNES, PNES, and ES segments from short-term EEG data using time-frequency methods that provide high-resolution time-frequency representations (TFRs): the continuous wavelet transform (CWT), the short-time Fourier transform (STFT), the CWT-based synchrosqueezed transform (WSST), and the STFT-based synchrosqueezed transform (FSST). The TFRs of EEG segments are used to generate 13 joint time-frequency (J-TF)-based features, four gray-level co-occurrence matrix (GLCM)-based features, and 16 higher-order joint TF moment (HOJ-Mom)-based features, which are then employed in the classification procedure. Both the three-class (inter-PNES versus PNES versus ES: ACC 80.9%, SEN 81.8%, PRE 84.7%) and the two-class (inter-PNES versus PNES: ACC 88.2%, SEN 87.2%, PRE 86.1%; PNES versus ES: ACC 98.5%, SEN 99.3%, PRE 98.9%) classification schemes performed well in the experiments. The STFT and FSST strategies surpass the CWT and WSST strategies in classification accuracy, sensitivity, and precision, and the J-TF-based feature sets often outperform the other two.
{"title":"Classification of Epileptic and Psychogenic Nonepileptic Seizures via Time-Frequency Features of EEG Data.","authors":"Ozlem Karabiber Cura, Aydin Akan, Hatice Sabiha Ture","doi":"10.1142/S0129065723500454","DOIUrl":"https://doi.org/10.1142/S0129065723500454","url":null,"abstract":"<p><p>The majority of psychogenic nonepileptic seizures (PNESs) are brought on by psychogenic causes, but because their symptoms resemble those of epilepsy, they are frequently misdiagnosed. Although EEG signals are normal in PNES cases, electroencephalography (EEG) recordings alone are not sufficient to identify the illness. Hence, accurate diagnosis and effective treatment depend on long-term video EEG data and a complete patient history. Video EEG setup, however, is more expensive than using standard EEG equipment. To distinguish PNES signals from conventional epileptic seizure (ES) signals, it is crucial to develop methods solely based on EEG recordings. The proposed study presents a technique utilizing short-term EEG data for the classification of inter-PNES, PNES, and ES segments using time-frequency methods such as the Continuous Wavelet transform (CWT), Short-Time Fourier transform (STFT), CWT-based synchrosqueezed transform (WSST), and STFT-based SST (FSST), which provide high-resolution time-frequency representations (TFRs). TFRs of EEG segments are utilized to generate 13 joint TF (J-TF)-based features, four gray-level co-occurrence matrix (GLCM)-based features, and 16 higher-order joint TF moment (HOJ-Mom)-based features. These features are then employed in the classification procedure. Both three-class (inter-PNES versus PNES versus ES: ACC: 80.9%, SEN: 81.8%, and PRE: 84.7%) and two-class (Inter-PNES versus PNES: ACC: 88.2%, SEN: 87.2%, and PRE: 86.1%; PNES versus ES: ACC: 98.5%, SEN: 99.3%, and PRE: 98.9%) classification algorithms performed well, according to the experimental results. The STFT and FSST strategies surpass the CWT and WSST strategies in terms of classification accuracy, sensitivity, and precision. Moreover, the J-TF-based feature sets often perform better than the other two.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2350045"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10119932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of Spiking Neural Nets-Based Image Classification Using the Runtime Simulator RAVSim
Sanaullah, Shamini Koravuna, Ulrich Rückert, Thorsten Jungeblut
Pub Date: 2023-09-01 | DOI: 10.1142/S0129065723500442
Spiking Neural Networks (SNNs) help achieve brain-like efficiency and functionality by building neurons and synapses that mimic the human brain's transmission of electrical signals. However, optimal SNN implementation requires a precise balance of parametric values. To design such neural networks, a graphical tool for visualizing, analyzing, and explaining the internal behavior of spikes is crucial. Although several popular SNN simulators are available, these tools do not allow users to interact with the neural network during simulation. To this end, we introduce the first runtime interactive simulator, the Runtime Analyzing and Visualization Simulator (RAVSim), developed to analyze and dynamically visualize the behavior of SNNs, allowing end-users to interact, observe output concentration reactions, and make changes directly during the simulation. In this paper, we present RAVSim with its current implementation of runtime interaction using the LIF neural model with different connectivity schemes, an image classification model using SNNs, and a dataset creation feature. Our main objective is to investigate binary classification using SNNs with RGB images. We created a feed-forward network using the LIF neural model for an image classification algorithm and evaluated it using RAVSim. The algorithm classifies faces with and without masks, achieving an accuracy of 91.8% with 1000 neurons in a hidden layer, an MSE of 0.0758, and an execution time of ~10 min on the CPU. The experimental results show that using RAVSim not only increases network design speed but also accelerates user learning.
{"title":"Evaluation of Spiking Neural Nets-Based Image Classification Using the Runtime Simulator RAVSim.","authors":"Sanaullah, Shamini Koravuna, Ulrich Rückert, Thorsten Jungeblut","doi":"10.1142/S0129065723500442","DOIUrl":"https://doi.org/10.1142/S0129065723500442","url":null,"abstract":"<p><p>Spiking Neural Networks (SNNs) help achieve brain-like efficiency and functionality by building neurons and synapses that mimic the human brain's transmission of electrical signals. However, optimal SNN implementation requires a precise balance of parametric values. To design such ubiquitous neural networks, a graphical tool for visualizing, analyzing, and explaining the internal behavior of spikes is crucial. Although some popular SNN simulators are available, these tools do not allow users to interact with the neural network during simulation. To this end, we have introduced the first runtime interactive simulator, called Runtime Analyzing and Visualization Simulator (<i>RAVSim</i>),<sup>a</sup> developed to analyze and dynamically visualize the behavior of SNNs, allowing end-users to interact, observe output concentration reactions, and make changes directly during the simulation. In this paper, we present <i>RAVSim</i> with the current implementation of runtime interaction using the LIF neural model with different connectivity schemes, an image classification model using SNNs, and a dataset creation feature. Our main objective is to primarily investigate binary classification using SNNs with RGB images. We created a feed-forward network using the LIF neural model for an image classification algorithm and evaluated it by using <i>RAVSim</i>. The algorithm classifies faces with and without masks, achieving an accuracy of 91.8% using 1000 neurons in a hidden layer, 0.0758 MSE, and an execution time of ∼10[Formula: see text]min on the CPU. The experimental results show that using <i>RAVSim</i> not only increases network design speed but also accelerates user learning capability.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 9","pages":"2350044"},"PeriodicalIF":8.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10109353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Transformer-Embedded Multi-Task Model for Dose Distribution Prediction
Pub Date: 2023-08-01 | DOI: 10.1142/S0129065723500430
Lu Wen, Jianghong Xiao, Shuai Tan, Xi Wu, Jiliu Zhou, Xingchen Peng, Yan Wang
Radiation therapy is a fundamental cancer treatment in the clinic. However, to satisfy clinical requirements, radiologists have to iteratively adjust the radiotherapy plan based on experience, making it extremely subjective and time-consuming to obtain a clinically acceptable plan. To this end, we introduce a transformer-embedded multi-task dose prediction (TransMTDP) network to automatically predict the dose distribution in radiotherapy. Specifically, to achieve more stable and accurate dose predictions, three highly correlated tasks are included in our TransMTDP network: a main dose prediction task to provide each pixel with a fine-grained dose value, an auxiliary isodose lines prediction task to produce coarse-grained dose ranges, and an auxiliary gradient prediction task to learn subtle gradient information such as radiation patterns and edges in the dose maps. The three correlated tasks are integrated through a shared encoder, following the multi-task learning strategy. To strengthen the connection between the output layers of the different tasks, we further use two additional constraints, an isodose consistency loss and a gradient consistency loss, to reinforce the match between the dose distribution features generated by the auxiliary tasks and the main task. Additionally, considering that many organs in the human body are symmetrical and that dose maps present abundant global features, we embed a transformer into our framework to capture the long-range dependencies of the dose maps. Evaluated on an in-house rectum cancer dataset and a public head and neck cancer dataset, our method achieves superior performance compared with state-of-the-art ones. Code is available at https://github.com/luuuwen/TransMTDP.
{"title":"A Transformer-Embedded Multi-Task Model for Dose Distribution Prediction.","authors":"Lu Wen, Jianghong Xiao, Shuai Tan, Xi Wu, Jiliu Zhou, Xingchen Peng, Yan Wang","doi":"10.1142/S0129065723500430","DOIUrl":"https://doi.org/10.1142/S0129065723500430","url":null,"abstract":"<p><p>Radiation therapy is a fundamental cancer treatment in the clinic. However, to satisfy the clinical requirements, radiologists have to iteratively adjust the radiotherapy plan based on experience, causing it extremely subjective and time-consuming to obtain a clinically acceptable plan. To this end, we introduce a transformer-embedded multi-task dose prediction (TransMTDP) network to automatically predict the dose distribution in radiotherapy. Specifically, to achieve more stable and accurate dose predictions, three highly correlated tasks are included in our TransMTDP network, i.e. a main dose prediction task to provide each pixel with a fine-grained dose value, an auxiliary isodose lines prediction task to produce coarse-grained dose ranges, and an auxiliary gradient prediction task to learn subtle gradient information such as radiation patterns and edges in the dose maps. The three correlated tasks are integrated through a shared encoder, following the multi-task learning strategy. To strengthen the connection of the output layers for different tasks, we further use two additional constraints, i.e. isodose consistency loss and gradient consistency loss, to reinforce the match between the dose distribution features generated by the auxiliary tasks and the main task. Additionally, considering many organs in the human body are symmetrical and the dose maps present abundant global features, we embed the transformer into our framework to capture the long-range dependencies of the dose maps. Evaluated on an in-house rectum cancer dataset and a public head and neck cancer dataset, our method gains superior performance compared with the state-of-the-art ones. Code is available at https://github.com/luuuwen/TransMTDP.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":"33 8","pages":"2350043"},"PeriodicalIF":8.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9915076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}