A Novel Convolutional Neural Network Based on Adaptive Multi-Scale Aggregation and Boundary-Aware for Lateral Ventricle Segmentation on MR Images
Pub Date: 2022-05-23  DOI: 10.1109/icassp43922.2022.9747266
Fei Ye, Zhiqiang Wang, Sheng Zhu, Xuanya Li, Kai Hu
In this paper, we propose a novel convolutional neural network based on adaptive multi-scale feature aggregation and boundary awareness for lateral ventricle segmentation (MB-Net), which mainly comprises three parts: an adaptive multi-scale feature aggregation module (AMSFM), an embedded boundary refinement module (EBRM), and a local feature extraction module (LFM). Specifically, the AMSFM extracts multi-scale features through different receptive fields to effectively handle the large variation of target regions on magnetic resonance (MR) images. The EBRM extracts boundary information to effectively address blurred boundaries. The LFM extracts local information via spatial and channel attention mechanisms to cope with irregular shapes. Finally, extensive experiments are conducted from different perspectives to evaluate the performance of the proposed MB-Net. Furthermore, we also verify the robustness of the model on other public datasets, i.e., COVID-SemiSeg and CHASE DB1. The results show that our MB-Net achieves competitive results compared with state-of-the-art methods.
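The abstract does not specify the internal structure of the AMSFM, so the snippet below is only a hedged illustration of how multi-scale features are commonly aggregated over different receptive fields: parallel dilated convolutions fused by learned, softmax-normalized weights. Every layer size, the dilation rates, and the fusion scheme are assumptions, not the authors' design.

```python
# Illustrative sketch only (not the paper's AMSFM): parallel dilated convolutions give
# several receptive fields, and a learned softmax weighting fuses the branches.
import torch
import torch.nn as nn

class MultiScaleAggregation(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
             for d in dilations]
        )
        # one adaptive weight per scale, normalized with softmax at forward time
        self.scale_logits = nn.Parameter(torch.zeros(len(dilations)))

    def forward(self, x):
        weights = torch.softmax(self.scale_logits, dim=0)
        outs = [w * branch(x) for w, branch in zip(weights, self.branches)]
        return torch.relu(sum(outs) + x)        # residual connection keeps local detail

features = torch.randn(1, 32, 64, 64)           # toy MR feature map
print(MultiScaleAggregation(32)(features).shape)  # torch.Size([1, 32, 64, 64])
```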
{"title":"A Novel Convolutional Neural Network Based on Adaptive Multi-Scale Aggregation and Boundary-Aware for Lateral Ventricle Segmentation on MR images","authors":"Fei Ye, Zhiqiang Wang, Sheng Zhu, Xuanya Li, Kai Hu","doi":"10.1109/icassp43922.2022.9747266","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747266","url":null,"abstract":"In this paper, we propose a novel convolutional neural network based on adaptive multi-scale feature aggregation and boundary-aware for lateral ventricle segmentation (MB-Net), which mainly includes three parts, i.e., an adaptive multi-scale feature aggregation module (AMSFM), an embedded boundary refinement module (EBRM), and a local feature extraction module (LFM). Specifically, the AMSFM is used to extract multi-scale features through the different receptive fields to effectively solve the problem of distinct target regions on magnetic resonance (MR) images. The EBRM is intended to extract boundary information to effectively solve blurred boundary problems. The LFM can make the extraction of local information based on spatial and channel attention mechanisms to solve the problem of irregular shapes. Finally, extensive experiments are conducted from different perspectives to evaluate the performance of the proposed MB-Net. Furthermore, we also verify the robustness of the model on other public datasets, i.e., COVID-SemiSeg and CHASE DB1. The results show that our MB-Net can achieve competitive results when compared with state-of-the-art methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"559 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114791894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling The Detection Capability Of High-Speed Spiking Cameras
Pub Date: 2022-05-23  DOI: 10.1109/ICASSP43922.2022.9747018
Junwei Zhao, Zhaofei Yu, Lei Ma, Ziluo Ding, Shiliang Zhang, Yonghong Tian, Tiejun Huang
The novel working principle enables spiking cameras to capture high-speed moving objects. However, the applications of spiking cameras can be affected by many factors, such as brightness intensity, detectable distance, and the maximum speed of moving targets. Improper settings, such as weak ambient brightness or too short an object-camera distance, will lead to failure in the application of such cameras. To address this issue, this paper proposes a modeling algorithm that studies the detection capability of spiking cameras. The algorithm deduces the maximum detectable speed of spiking cameras under different scenario settings (e.g., brightness intensity, camera lens, and object-camera distance) based on the basic technical parameters of the cameras (e.g., pixel size, spatial and temporal resolution). Thereby, the proper camera settings for various applications can be determined. Extensive experiments verify the effectiveness of the modeling algorithm. To the best of our knowledge, this is the first work to investigate the detection capability of spiking cameras.
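As a rough illustration of the kind of relation such a model can yield (a back-of-envelope pinhole-lens approximation, not the paper's algorithm), the maximum detectable speed can be bounded by requiring the image-plane motion during one sampling interval to stay within a tolerated number of pixels. All parameter values and the `max_blur_px` tolerance below are made up.

```python
# Back-of-envelope sketch, NOT the paper's model: with a pinhole-lens projection, an
# object at distance D moving at speed v shifts by v*f*dt/D on the sensor during one
# sampling interval dt.  Capping that shift at `max_blur_px` pixels bounds the speed.

def max_detectable_speed(pixel_size_m, focal_length_m, distance_m,
                         sampling_rate_hz, max_blur_px=1.0):
    dt = 1.0 / sampling_rate_hz                       # one sampling interval
    sensor_shift_limit = max_blur_px * pixel_size_m   # tolerated image-plane motion
    return sensor_shift_limit * distance_m / (focal_length_m * dt)   # m/s

# e.g. 15 um pixels, 8 mm lens, target 20 m away, 20 kHz spike sampling (all assumed)
print(f"{max_detectable_speed(15e-6, 8e-3, 20.0, 20_000):.1f} m/s")   # 750.0 m/s
```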
{"title":"Modeling The Detection Capability Of High-Speed Spiking Cameras","authors":"Junwei Zhao, Zhaofei Yu, Lei Ma, Ziluo Ding, Shiliang Zhang, Yonghong Tian, Tiejun Huang","doi":"10.1109/ICASSP43922.2022.9747018","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747018","url":null,"abstract":"The novel working principle enables spiking cameras to capture high-speed moving objects. However, the applications of spiking cameras can be affected by many factors, such as brightness intensity, detectable distance, and the maximum speed of moving targets. Improper settings such as weak ambient brightness and too short object-camera distance, will lead to failure in the application of such cameras. To address the issue, this paper proposes a modeling algorithm that studies the detection capability of spiking cameras. The algorithm deduces the maximum detectable speed of spiking cameras corresponding to different scenario settings (e.g., brightness intensity, camera lens, and object-camera distance) based on the basic technical parameters of cameras (e.g., pixel size, spatial and temporal resolution). Thereby, the proper camera settings for various applications can be determined. Extensive experiments verify the effectiveness of the modeling algorithm. To our best knowledge, it is the first work to investigate the detection capability of spiking cameras.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124337961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models
Pub Date: 2022-05-23  DOI: 10.1109/ICASSP43922.2022.9747745
A. Ogawa, Naohiro Tawara, Marc Delcroix, S. Araki
We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs are complementary, combining them one by one at each rescoring iteration gradually refines the language scores attached to the lattice arcs, and consequently the errors in the ASR hypotheses are gradually reduced. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across a lattice sequence of a long speech such as a lecture speech. In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.
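A toy sketch of the iterative combination idea follows: each pass folds one more language model into the language score kept for every hypothesis, so the best hypothesis can change from pass to pass. The dictionary layout, the interpolation rule, and the stand-in "NLM" are assumptions; real lattice regeneration is omitted.

```python
# Toy sketch of iterative score refinement (not the authors' lattice implementation):
# each pass folds one more language model into the LM score kept for every hypothesis.

def iterative_rescoring(hypotheses, language_models, lm_weight=0.5):
    """hypotheses: list of dicts with 'words' (tuple of str) and 'lm_score' (log-prob).
    language_models: callables mapping a word tuple to a log-probability."""
    for lm in language_models:                      # one complementary NLM per iteration
        for hyp in hypotheses:
            hyp["lm_score"] = ((1 - lm_weight) * hyp["lm_score"]
                               + lm_weight * lm(hyp["words"]))
    # the best hypothesis may change after every refinement pass
    return max(hypotheses, key=lambda h: h["lm_score"])

hyps = [{"words": ("a", "b"), "lm_score": -4.0}, {"words": ("a", "c"), "lm_score": -5.0}]
toy_lms = [lambda w: -3.0 if "c" in w else -6.0]    # a single stand-in "NLM"
print(iterative_rescoring(hyps, toy_lms)["words"])  # ('a', 'c')
```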
{"title":"Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models","authors":"A. Ogawa, Naohiro Tawara, Marc Delcroix, S. Araki","doi":"10.1109/ICASSP43922.2022.9747745","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747745","url":null,"abstract":"We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs work complementarily with each other, by combining them one by one at each rescoring iteration, language scores attached to given lattice arcs can be gradually refined. Consequently, errors of the ASR hypotheses can be gradually reduced. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across a lattice sequence of a long speech such as a lecture speech. In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127627202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks
Pub Date: 2022-05-23  DOI: 10.1109/icassp43922.2022.9747155
Lichao Zhang, Yi Ren, Liqun Deng, Zhou Zhao
Building a high-fidelity speech synthesis system from noisy speech data is a challenging but valuable task, as it could significantly reduce the cost of data collection. Existing methods usually train speech synthesis systems on speech denoised by an enhancement model or feed noise information into the system as a condition. These methods do suppress noise to some extent, but the quality and prosody of their synthesized speech remain far from those of natural speech. In this paper, we propose HiFiDenoise, a speech synthesis system with adversarial networks that can synthesize high-fidelity speech from low-quality, noisy speech data. Specifically, 1) to tackle the difficulty of noise modeling, we introduce multi-length adversarial training in the noise condition module; 2) to handle inaccurate pitch extraction caused by noise, we remove the pitch predictor in the acoustic model and add discriminators on the mel-spectrogram generator; 3) in addition, we apply HiFiDenoise to singing voice synthesis with a noisy singing dataset. Experiments show that our model outperforms the baseline by 0.36 and 0.44 MOS on speech and singing, respectively.
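The following sketch shows one plausible reading of "multi-length adversarial training": one discriminator per segment length, each judging a random crop of that length from the mel-spectrogram. The architectures, segment lengths, and input dimensions are assumptions, not the HiFiDenoise configuration.

```python
# Illustrative sketch only (not HiFiDenoise itself): one discriminator per crop length,
# each scoring a random segment of that length; scores would feed an adversarial loss.
import torch
import torch.nn as nn

segment_lengths = (32, 64, 128)                     # frames per crop (assumed)
discriminators = nn.ModuleList(
    [nn.Sequential(nn.Conv1d(80, 64, 5, padding=2), nn.LeakyReLU(0.2),
                   nn.Conv1d(64, 1, 3, padding=1)) for _ in segment_lengths]
)

def multi_length_d_scores(mel):                     # mel: (batch, 80, frames)
    scores = []
    for length, disc in zip(segment_lengths, discriminators):
        start = torch.randint(0, mel.shape[-1] - length + 1, (1,)).item()
        scores.append(disc(mel[..., start:start + length]).mean())
    return scores

print([s.item() for s in multi_length_d_scores(torch.randn(2, 80, 256))])
```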
{"title":"HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks","authors":"Lichao Zhang, Yi Ren, Liqun Deng, Zhou Zhao","doi":"10.1109/icassp43922.2022.9747155","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747155","url":null,"abstract":"Building a high-fidelity speech synthesis system with noisy speech data is a challenging but valuable task, which could significantly reduce the cost of data collection. Existing methods usually train speech synthesis systems based on the speech denoised with an enhancement model or feed noise information as a condition into the system. These methods certainly have some effect on inhibiting noise, but the quality and the prosody of their synthesized speech are still far away from natural speech. In this paper, we propose HiFiDenoise, a speech synthesis system with adversarial networks that can synthesize high-fidelity speech with low-quality and noisy speech data. Specifically, 1) to tackle the difficulty of noise modeling, we introduce multi-length adversarial training in the noise condition module. 2) To handle the problem of inaccurate pitch extraction caused by noise, we remove the pitch predictor in the acoustic model and also add discriminators on the mel-spectrogram generator. 3) In addition, we also apply HiFiDenoise to singing voice synthesis with a noisy singing dataset. Experiments show that our model outperforms the baseline by 0.36 and 0.44 in terms of MOS on speech and singing respectively.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127761644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PMP-NET: Rethinking Visual Context for Scene Graph Generation
Pub Date: 2022-05-23  DOI: 10.1109/icassp43922.2022.9747415
Xuezhi Tong, Rui Wang, Chuan Wang, Sanyi Zhang, Xiaochun Cao
Scene graph generation aims to describe the contents of scenes by identifying the objects and their relationships. In previous works, visual context is widely utilized in message passing networks to generate the representations used for classification. However, noisy estimation of visual context limits model performance. In this paper, we revisit the use of visual context via a randomly ordered bidirectional Long Short-Term Memory (biLSTM) baseline, and show that noisy estimation is worse than random ordering. To alleviate this problem, we propose a new method, dubbed Progressive Message Passing Network (PMP-Net), that estimates the visual context in a coarse-to-fine manner. Specifically, we first estimate the visual context with a randomly initialized scene graph, then refine it with multi-head attention. Experimental results on the benchmark dataset Visual Genome show that PMP-Net achieves better or comparable performance on all three tasks: scene graph generation (SGGen), scene graph classification (SGCls), and predicate classification (PredCls).
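A minimal sketch of the randomly ordered biLSTM baseline mentioned above (feature sizes and the restore-order step are assumptions): object features are shuffled before the biLSTM, so the context each object receives does not depend on any estimated ordering.

```python
# Hedged sketch of a randomly ordered biLSTM context encoder (details assumed, not the
# paper's implementation): shuffle object features, run the biLSTM, restore the order.
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=256, hidden_size=128, bidirectional=True, batch_first=True)

def random_order_context(object_feats):             # (num_objects, 256)
    perm = torch.randperm(object_feats.shape[0])
    shuffled = object_feats[perm].unsqueeze(0)       # add batch dimension
    context, _ = bilstm(shuffled)
    return context.squeeze(0)[torch.argsort(perm)]   # restore the original object order

print(random_order_context(torch.randn(5, 256)).shape)   # torch.Size([5, 256])
```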
{"title":"PMP-NET: Rethinking Visual Context for Scene Graph Generation","authors":"Xuezhi Tong, Rui Wang, Chuan Wang, Sanyi Zhang, Xiaochun Cao","doi":"10.1109/icassp43922.2022.9747415","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747415","url":null,"abstract":"Scene graph generation aims to describe the contents in scenes by identifying the objects and their relationships. In previous works, visual context is widely utilized in message passing networks to generate the representations for classification. However, the noisy estimation of visual context limits model performance. In this paper, we revisit the concept of incorporating visual context via a randomly ordered bidirectional Long Short Temporal Memory (biLSTM) based baseline, and show that noisy estimation is worse than random. To alleviate the problem, we propose a new method, dubbed Progressive Message Passing Network (PMP-Net) that better estimates the visual context in a coarse to fine manner. Specifically, we first estimate the visual context with a random initiated scene graph, then refine it with multi-head attention. The experimental results on the benchmark dataset Visual Genome show that PMP-Net achieves better or comparable performance on all three tasks: scene graph generation (SGGen), scene graph classification (SGCls), and predicate classification (PredCls).","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126297600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cramer-Rao Bound for the Time-Varying Poisson
Pub Date: 2022-05-23  DOI: 10.1109/ICASSP43922.2022.9746658
Xinhui Rong, V. Solo
Point processes are finding increasing applications in neuroscience, genomics, and social media. But basic modelling properties are little studied. Here we consider a periodic time-varying Poisson model and develop the asymptotic Cramer-Rao bound. We also develop, for the first time, a maximum likelihood algorithm for parameter estimation.
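For context, the textbook Cramer-Rao machinery for an inhomogeneous Poisson process observed on [0, T] takes the form below; the periodic intensity shown as an example is only an assumed illustration, not necessarily the paper's exact parameterization.

```latex
% Standard form for an inhomogeneous Poisson process with intensity lambda(t; theta);
% the periodic example intensity is assumed for illustration only.
\[
  I(\theta) \;=\; \int_{0}^{T}
     \frac{1}{\lambda(t;\theta)}
     \,\frac{\partial \lambda(t;\theta)}{\partial \theta}
     \,\frac{\partial \lambda(t;\theta)}{\partial \theta^{\mathsf{T}}}\, dt ,
  \qquad
  \operatorname{Cov}(\hat{\theta}) \;\succeq\; I(\theta)^{-1} .
\]
\[
  \text{e.g. } \lambda(t;\theta) \;=\; \theta_{0} + \theta_{1}\cos(\omega t) + \theta_{2}\sin(\omega t)
  \quad \text{for a periodic time-varying rate with known } \omega .
\]
```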
{"title":"Cramer-Rao Bound for the Time-Varying Poisson","authors":"Xinhui Rong, V. Solo","doi":"10.1109/ICASSP43922.2022.9746658","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9746658","url":null,"abstract":"Point processes are finding increasing applications in neuroscience, genomics, and social media. But basic modelling properties are little studied. Here we consider a periodic time-varying Poisson model and develop the asymptotic Cramer-Rao bound. We also develop, for the first time, a maximum likelihood algorithm for parameter estimation.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126306329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog
Pub Date: 2022-05-23  DOI: 10.1109/icassp43922.2022.9746909
Yunhe Xie, Chengjie Sun, Zhenzhou Ji
The recent surge in open conversational data has caused Emotion Recognition in Spoken Dialog (ERSD) to gain much attention. However, the limited scale of existing ERSD datasets restricts the completeness of a model's reasoning. Moreover, an artificial dialogue agent should ideally be able to reference past dialogue experiences. This paper proposes a Commonsense Knowledge Enhanced Network with a retrospective loss, namely CKE-Net, to hierarchically perform dialog modeling, external knowledge integration, and historical state retrospection. Specifically, we first adopt a transformer-based encoder to model context from multiple views by designing different mask matrices. Then, a graph attention network is used to introduce commonsense knowledge, which benefits complex emotional reasoning. Finally, a retrospective loss is added to utilize the model's prior experience during training. Experiments on the IEMOCAP and MELD datasets demonstrate that every designed module consistently benefits performance. Extensive experimental results show that our model outperforms state-of-the-art models across the two benchmark datasets.
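The abstract does not spell out the mask matrices, so the sketch below shows one plausible pair of views: a conversation-level mask that lets every utterance attend to all others, and a speaker-level mask restricted to utterances by the same speaker. The speaker sequence is invented for illustration.

```python
# Hedged sketch of "different mask matrices" for multi-view context modeling (the
# paper's exact views are not given in the abstract).
import numpy as np

speakers = ["A", "B", "A", "A", "B"]                 # one entry per utterance (assumed)
n = len(speakers)

global_mask = np.ones((n, n), dtype=bool)            # conversation-level view
intra_speaker_mask = np.array(
    [[si == sj for sj in speakers] for si in speakers]
)                                                    # speaker-level view

# each mask would drive a separate attention pass; False positions are excluded.
print(global_mask.astype(int))
print(intra_speaker_mask.astype(int))
```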
{"title":"A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog","authors":"Yunhe Xie, Chengjie Sun, Zhenzhou Ji","doi":"10.1109/icassp43922.2022.9746909","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746909","url":null,"abstract":"The recent surges in the open conversational data caused Emotion Recognition in Spoken Dialog (ERSD) to gain much attention. However, the existing ERSD datasets’ scale limits the model’s complete reasoning. Moreover, the artificial dialogue agent is ideally able to reference past dialogue experiences. This paper proposes a Commonsense Knowledge Enhanced Network with a retrospective loss, namely CKE-Net, to hierarchically perform dialog modeling, external knowledge integration, and historical state retrospect. Specifically, we first adopt a transformer-based encoder to model context in multi-view by elaborating different mask matrices. Then, the graph attention network is used to introduce commonsense knowledge, which benefits the complex emotional reasoning. Finally, a retrospective loss is added to utilize the model’s prior experience during training. Experiments on IEMOCAP and MELD datasets demonstrate that every designed module is consistently beneficial to the performance. Extensive experimental results show that our model outperforms the state-of-the-art models across the two benchmark datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126306338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Category-Adapted Sound Event Enhancement with Weakly Labeled Data
Pub Date: 2022-05-23  DOI: 10.1109/ICASSP43922.2022.9747722
Guangwei Li, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, K. Yu
Previous audio enhancement training usually requires clean signals with additive noises; hence it commonly focuses on speech enhancement, where clean speech is easy to obtain. This paper goes beyond speech to broader sound event enhancement by using a weakly supervised approach via sound event detection (SED) to approximate the location and presence of a specific sound event. We propose a category-adapted system that enables enhancement of any selected sound category: we first familiarize the model with all common sound classes and then apply a category-specific fine-tuning procedure to enhance the targeted sound class. Evaluation is conducted on ten common sound classes, with a comparison to traditional and weakly supervised enhancement methods. Results indicate an average 2.86 dB SDR increase, with more significant improvements on speech (9.15 dB), music (5.01 dB), and typewriter (3.68 dB) under an SNR of 0 dB. All enhancement metrics outperform previous weakly supervised methods and achieve results comparable to the state-of-the-art method that requires clean signals.
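A hedged sketch of the two-stage recipe (familiarize on all classes, then fine-tune on the target category) is given below; the model, data, loss, and learning rates are toy stand-ins rather than the authors' configuration.

```python
# Hedged sketch of the two-stage training recipe; everything here is a placeholder.
import torch

def run_stage(model, batches, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for noisy, target in batches:                     # target: (weakly) supervised signal
        loss = torch.nn.functional.l1_loss(model(noisy), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

model = torch.nn.Linear(64, 64)                       # toy stand-in for the enhancer
all_class_batches = [(torch.randn(4, 64), torch.randn(4, 64)) for _ in range(3)]
target_class_batches = [(torch.randn(4, 64), torch.randn(4, 64)) for _ in range(3)]

model = run_stage(model, all_class_batches, lr=1e-3)      # stage 1: all common classes
model = run_stage(model, target_class_batches, lr=1e-4)   # stage 2: category-specific fine-tune
```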
{"title":"Category-Adapted Sound Event Enhancement with Weakly Labeled Data","authors":"Guangwei Li, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, K. Yu","doi":"10.1109/ICASSP43922.2022.9747722","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747722","url":null,"abstract":"Previous audio enhancement training usually requires clean signals with additive noises; hence commonly focuses on speech enhancement, where clean speech is easy to access. This paper goes beyond a broader sound event enhancement by using a weakly supervised approach via sound event detection (SED) to approximate the location and presence of a specific sound event. We propose a category-adapted system to enable enhancement on any selected sound category, where we first familiarize the model to all common sound classes and followed by a category-specific fine-tune procedure to enhance the targeted sound class. Evaluation is conducted on ten common sound classes, with a comparison to traditional and weakly supervised enhancement methods. Results indicate an average 2.86 dB SDR increase, with more significant improvement on speech (9.15 dB), music (5.01 dB), and typewriter (3.68 dB) under SNR of 0 dB. All enhancement metrics outperform previous weakly supervised methods and achieve comparable results to the state-of-the-art method that requires clean signals.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126313138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving BCI-based Color Vision Assessment Using Gaussian Process Regression
Pub Date: 2022-05-23  DOI: 10.1109/icassp43922.2022.9747015
Hadi Habibzadeh, Kevin J. Long, Allyson Atkins, Daphney-Stavroula Zois, James J. S. Norton
We present metamer identification plus (metaID+), an algorithm that enhances the performance of brain-computer interface (BCI)-based color vision assessment. BCI-based color vision assessment uses steady-state visual evoked potentials (SSVEPs) elicited during a grid search of colors to identify metamers—light sources with different spectral distributions that appear to be the same color. Present BCI-based color vision assessment methods are slow; they require extensive data collection for each color in the grid search to reduce measurement noise. metaID+ suppresses measurement noise using Gaussian process regression (i.e., a covariance function is used to replace each measurement with the weighted sum of all of the measurements). Thus, metaID+ reduces the amount of data required for each measurement. We evaluated metaID+ using data collected from ten participants and compared the sum-of-squared errors (SSE; relative to the average grid of each participant) between our algorithm and metaID (an existing algorithm). metaID+ significantly reduced the SSE. In addition, metaID+ achieved metaID’s minimum SSE while using 61.3% less data. By using less data to achieve the same level of error, metaID+ improves the performance of BCI-based color vision assessment.
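A minimal sketch of the smoothing idea with scikit-learn's GaussianProcessRegressor, where each noisy grid measurement is replaced by a covariance-weighted combination of all measurements. The color grid, the stand-in SSVEP response, the kernel, and the noise level are illustrative assumptions, not the metaID+ settings.

```python
# Minimal GPR-smoothing sketch (assumed setup, not the metaID+ configuration).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
grid = np.array([[r, g] for r in np.linspace(0, 1, 8) for g in np.linspace(0, 1, 8)])
true_response = np.sin(3 * grid[:, 0]) + np.cos(3 * grid[:, 1])   # stand-in SSVEP amplitude
noisy = true_response + 0.3 * rng.standard_normal(len(grid))      # short-measurement noise

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(0.1),
                               normalize_y=True)
gpr.fit(grid, noisy)
denoised = gpr.predict(grid)       # smoothed grid used in place of the raw measurements

# SSE of the raw vs. smoothed grid (the smoothed one is typically lower)
print(np.sum((noisy - true_response) ** 2), np.sum((denoised - true_response) ** 2))
```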
{"title":"Improving BCI-based Color Vision Assessment Using Gaussian Process Regression","authors":"Hadi Habibzadeh, Kevin J. Long, Allyson Atkins, Daphney-Stavroula Zois, James J. S. Norton","doi":"10.1109/icassp43922.2022.9747015","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747015","url":null,"abstract":"We present metamer identification plus (metaID+), an algorithm that enhances the performance of brain-computer interface (BCI)-based color vision assessment. BCI-based color vision assessment uses steady-state visual evoked potentials (SSVEPs) elicited during a grid search of colors to identify metamers—light sources with different spectral distributions that appear to be the same color. Present BCI-based color vision assessment methods are slow; they require extensive data collection for each color in the grid search to reduce measurement noise. metaID+ suppresses measurement noise using Gaussian process regression (i.e., a covariance function is used to replace each measurement with the weighted sum of all of the measurements). Thus, metaID+ reduces the amount of data required for each measurement. We evaluated metaID+ using data collected from ten participants and compared the sum-of-squared errors (SSE; relative to the average grid of each participant) between our algorithm and metaID (an existing algorithm). metaID+ significantly reduced the SSE. In addition, metaID+ achieved metaID’s minimum SSE while using 61.3% less data. By using less data to achieve the same level of error, metaID+ improves the performance of BCI-based color vision assessment.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126327092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Remedy For Distributional Shifts Through Expected Domain Translation
Pub Date: 2022-05-23  DOI: 10.1109/icassp43922.2022.9746434
Jean-Christophe Gagnon-Audet, Soroosh Shahtalebi, Frank Rudzicz, I. Rish
Machine learning models often fail to generalize to unseen domains due to distributional shifts. One family of such shifts, "correlation shifts," is caused by spurious correlations in the data and is studied under the overarching topic of "domain generalization." In this work, we employ multi-modal translation networks to tackle the correlation shifts that appear when data is sampled out-of-distribution. Learning a generative model from the training domains enables us to translate each training sample so that it carries the specific characteristics of the other possible domains. We show that by training a predictor solely on the generated samples, the spurious correlations in the training domains average out, and the invariant features corresponding to true correlations emerge. Our proposed technique, Expected Domain Translation (EDT), is benchmarked on the Colored MNIST dataset and drastically improves the state-of-the-art classification accuracy by 38% with train-domain validation model selection.
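A hedged sketch of the training objective implied by the abstract: every sample is translated into each training domain by a (here entirely fictitious) translator, and the classification loss is averaged over the translated versions so that domain-specific cues cancel out on average.

```python
# Hedged sketch of an expected-domain-translation loss; the `translator`, the predictor,
# and all shapes are illustrative stand-ins, not the EDT implementation.
import torch
import torch.nn.functional as F

def edt_loss(predictor, translator, x, y, domains):
    losses = []
    for d in domains:                        # translate the same sample into each domain
        x_d = translator(x, d)
        losses.append(F.cross_entropy(predictor(x_d), y))
    return torch.stack(losses).mean()        # expectation over translated domains

# toy stand-ins: identity "translator" plus a per-domain shift, linear predictor
translator = lambda x, d: x + 0.1 * d
predictor = torch.nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
print(edt_loss(predictor, translator, x, y, domains=[0, 1, 2]).item())
```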
{"title":"A Remedy For Distributional Shifts Through Expected Domain Translation","authors":"Jean-Christophe Gagnon-Audet, Soroosh Shahtalebi, Frank Rudzicz, I. Rish","doi":"10.1109/icassp43922.2022.9746434","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746434","url":null,"abstract":"Machine learning models often fail to generalize to unseen domains due to the distributional shifts. A family of such shifts, “correlation shifts,” is caused by spurious correlations in the data. It is studied under the overarching topic of “domain generalization.” In this work, we employ multi-modal translation networks to tackle the correlation shifts that appear when data is sampled out-of-distribution. Learning a generative model from training domains enables us to translate each training sample under the special characteristics of other possible domains. We show that by training a predictor solely on the generated samples, the spurious correlations in training domains average out, and the invariant features corresponding to true correlations emerge. Our proposed technique, Expected Domain Translation (EDT), is benchmarked on the Colored MNIST dataset and drastically improves the state-of-the-art classification accuracy by 38% with train-domain validation model selection.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125631903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}