
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Domain Adaptation and Language Conditioning to Improve Phonetic Posteriorgram Based Cross-Lingual Voice Conversion
Pin-Chieh Hsu, N. Minematsu, D. Saito
In this work, we examine two methods for improving phonetic posteriorgram (PPG) based cross-lingual voice conversion (CLVC). Previous research usually used a speaker encoder to characterize speaker identity; however, the speaker embedding learned by such models tends to be language-dependent, which degrades the quality of the converted speech. We therefore propose using domain-adversarial training. With this approach, speaker embeddings from different languages can be adapted into the same distribution to form a language-independent speaker embedding space. The other approach we propose is external language conditioning, which helps the model disentangle language information from the speaker embedding. In our experiments, both methods are evaluated on a Japanese-English bilingual database. Besides subjective evaluation, two automatic objective assessment systems are adopted to assess the quality and speaker similarity of converted utterances. According to the experimental results, both proposed methods generate speaker embeddings with reduced language dependency and improve the naturalness and speaker similarity of the converted speech.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979918 | Published: 2022-11-07
Citations: 0
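The domain-adversarial training mentioned in the abstract above is commonly realized with a gradient reversal layer placed between the speaker encoder and a language classifier. The abstract does not say how the authors implemented it, so the following is only a minimal framework-free sketch; the class name `GradientReversal` and the weight `lam` are hypothetical.

```python
class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass.

    Assumption: this is the standard gradient-reversal trick used in
    domain-adversarial training, not necessarily the authors' exact setup.
    """

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # hypothetical trade-off weight for the adversarial term

    def forward(self, embedding):
        # The speaker embedding passes through unchanged to the language classifier.
        return list(embedding)

    def backward(self, grad_from_language_classifier):
        # Reversed gradient: the encoder is pushed to make language harder to predict.
        return [-self.lam * g for g in grad_from_language_classifier]


grl = GradientReversal(lam=0.5)
passed_through = grl.forward([1.0, -2.0])
reversed_grad = grl.backward([2.0, -4.0])
```

Because the encoder receives the negated classifier gradient, minimizing the language-classification loss in the classifier simultaneously drives the speaker embedding toward language independence.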
Physiological study on the effect of game events in response to player's laughter
Mikito Fukuda, Y. Arimoto
To investigate whether a computer's automatic responses to our emotional expressions influence our cognitive and emotional involvement in a virtual world, this study measured players' physiological reactions to game events presented in response to their spontaneous laughter. Participants played virtual games under two conditions in our experiments, and their electrocardiogram, electrodermal activity, and facial electromyography (corrugator supercilii and zygomaticus major muscles) were recorded during the games. The two conditions were an advantageous event condition and a disadvantageous event condition. In the advantageous event condition, the system responded to the player's laughter with an event that benefitted the player. In the disadvantageous event condition, the system responded to the player's laughter with an event that annoyed the player. A three-way analysis of variance was performed on these physiological signals to test the hypothesis that there is time-series variation in physiological responses between both event types and event durations. As a result, a significantly slower heart rate was observed after the presentation of an event in both the advantageous and disadvantageous event conditions. This result suggests that the players paid more attention to the game when any event was generated in response to their laughter. Moreover, both types of events increased electrodermal activity and corrugator supercilii muscle activity. In particular, disadvantageous events activated the corrugator supercilii muscle more than advantageous events did. These results suggest that players were more emotionally engaged in the game when they encountered troublesome or fortunate situations while laughing.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979868 | Published: 2022-11-07
Citations: 0
Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks
Bagus Tris Atmaja, A. Sasou
Understanding human emotions is a challenge for computers. Research on speech emotion recognition has been progressing steadily. Instead of speech, affective information may lie in short vocal bursts (e.g., crying when sad). In this study, we evaluated a recent self-supervised learning model for extracting acoustic embeddings for affective vocal burst tasks. Four tasks are investigated, covering both regression and classification problems. Using similar architectures, we found that a pre-trained model is more effective than the baseline methods. The study is further extended to evaluate the effect of different numbers of seeds, patience values, and batch sizes on the performance of the four tasks.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980083 | Published: 2022-11-07
Citations: 1
Optimal Deep Multi-Route Self-Attention for Single Image Super-Resolution
Nisawan Ngambenjavichaikul, Sovann Chen, S. Aramvith
Image restoration, such as single image super-resolution (SISR), is a long-established low-level vision problem that aims to regenerate high-resolution (HR) images from their low-resolution (LR) counterparts. While state-of-the-art image super-resolution models are based on the well-known convolutional neural network (CNN), many self-attention-based or transformer-based approaches have been explored and have shown promising performance on vision problems. A powerful baseline model based on the Swin Transformer adopts the shifted window approach. It improves capability by restricting the self-attention computation to non-overlapping local windows while still enabling cross-window relations. However, the architecture design is manually fixed, so the results do not achieve optimal performance. This paper presents an optimal deep multi-route self-attention network for single image super-resolution (ODMR-SASR). A genetic algorithm (GA) is introduced to discover the optimal number of filters and layers. Experimental results demonstrate that the proposed optimization technique can improve SR image quality.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979962 | Published: 2022-11-07
Citations: 0
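The abstract above says a genetic algorithm searches for the number of filters and layers but gives no details of the encoding or fitness. A minimal sketch under stated assumptions follows: the search space (`FILTER_CHOICES`, `LAYER_RANGE`), the operators, and especially `toy_fitness` are hypothetical placeholders; a real run would train the SR network and use a validation metric such as PSNR as fitness.

```python
import random

# Hypothetical search space; the abstract does not specify the actual ranges.
FILTER_CHOICES = [32, 64, 96, 128, 192]
LAYER_RANGE = (2, 8)


def toy_fitness(genome):
    """Placeholder for a validation score; NOT the paper's fitness function."""
    filters, layers = genome
    return -abs(filters - 96) / 32 - abs(layers - 6)


def random_genome(rng):
    return (rng.choice(FILTER_CHOICES), rng.randint(*LAYER_RANGE))


def crossover(a, b, rng):
    # Take the filter count from one parent and the layer count from the other.
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])


def mutate(genome, rng, rate=0.2):
    filters, layers = genome
    if rng.random() < rate:
        filters = rng.choice(FILTER_CHOICES)
    if rng.random() < rate:
        layers = rng.randint(*LAYER_RANGE)
    return (filters, layers)


def genetic_search(fitness, pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]  # elitism: the best genome survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)


best = genetic_search(toy_fitness)
```

Elitism guarantees that the best configuration found so far is never lost between generations, which matters when each fitness evaluation is as expensive as training a network.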
Clustering of advertising images using electroencephalogram
Ingon Chanpornpakdi, Motoi Noda, Toshihisa Tanaka, Yuval Harpaz, A. Geva
Packaging and advertisements of brands affect customers' purchasing decisions and could lead to business loss. Hence, neuromarketing, the application of neuroscience in the marketing field, has been introduced to understand customers' cognitive functions toward advertisements or products. Our study focused on identifying how the brain responds to different types of advertising images of the same brand using electroencephalogram (EEG). We performed an experiment using 33 different Coca-Cola advertising images in an RSVP (rapid serial visual presentation) task on 23 participants. A seven-channel dry EEG headset was used to record the visual event-related potential (ERP), specifically the P300, the positive peak found 300 to 700 ms after image onset, to compare the perception response. We applied k-means and hierarchical clustering to the obtained EEG data and achieved the best clustering with three clusters, which yielded different P300 amplitudes and latencies. The typical Coca-Cola ads, red in color with Coca-Cola text on them, induced a faster and larger response, implying better perception than the unconventional or black ads. We conclude that ERP clustering may be a useful tool for neuromarketing. However, the relationship between the EEG-based cluster and the image-based cluster should be further investigated to confirm the suggestion.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980161 | Published: 2022-11-07
Citations: 0
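The k-means step mentioned in the abstract above can be sketched as follows. The abstract does not say which features were clustered, so this example assumes each image is summarized by a z-scored (P300 amplitude, P300 latency) pair; the data values are invented for illustration, and the farthest-point initialization is a deterministic stand-in chosen here, not the authors' procedure.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical z-scored (amplitude, latency) features for nine images.
features = [
    (1.2, -1.0), (1.0, -1.1), (1.1, -0.9),   # large, early response
    (0.0, 0.1), (0.1, 0.0), (-0.1, 0.05),    # intermediate response
    (-1.1, 1.0), (-1.0, 1.2), (-1.2, 1.1),   # small, late response
]
labels, centroids = kmeans(features, k=3)
```

Standardizing amplitude and latency before clustering keeps the millisecond-scale latency from dominating the Euclidean distance.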
Design and Control of a Muscle-skeleton Robot Elbow based on Reinforcement Learning
Jianyin Fan, Haoran Xu, Yuwei Du, Jing Jin, Qiang Wang
The muscle-skeleton body structure and learning ability allow natural creatures to adapt to complex environments. These can also make robots more adaptive in human-robot interaction scenarios. In this work, we implement a humanoid muscle-skeleton robot elbow joint actuated by two antagonistic pneumatic artificial muscles (PAMs). A reinforcement learning algorithm based on soft actor-critic (SAC) is adopted to learn the control policy of the proposed elbow joint. A reduced action space and hindsight experience replay (HER) further reduce training time, and the temperature factor is fixed during the training process to obtain a small steady-state error. An elbow model is implemented in simulation to verify the training procedure for our real robot elbow platform. The experimental results show that the RL learning procedure can learn control policies on the robot elbow prototype, and the steady-state error is within 0.64% after 1 s of control time.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980219 | Published: 2022-11-07
Citations: 0
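Hindsight experience replay, named in the abstract above, reuses failed episodes by pretending that the state actually reached was the goal. The sketch below shows the simple "final state" relabeling strategy on a joint-angle task. The transition layout, the sparse 0/-1 reward, and the `tolerance` parameter are assumptions for illustration, not details taken from the paper.

```python
def her_relabel(episode, tolerance=0.01):
    """Relabel an episode with the finally achieved angle as the goal.

    episode: list of dicts with keys "angle", "action", "next_angle", "goal".
    Returns new transitions; the originals are left untouched.
    """
    achieved = episode[-1]["next_angle"]  # what the elbow actually reached
    relabeled = []
    for t in episode:
        # Assumed sparse reward: 0 when the substituted goal is met, otherwise -1.
        reward = 0.0 if abs(t["next_angle"] - achieved) <= tolerance else -1.0
        relabeled.append({**t, "goal": achieved, "reward": reward})
    return relabeled


# Hypothetical episode that missed its original 45-degree target.
episode = [
    {"angle": 0.0, "action": 0.5, "next_angle": 10.0, "goal": 45.0},
    {"angle": 10.0, "action": 0.5, "next_angle": 22.0, "goal": 45.0},
    {"angle": 22.0, "action": 0.2, "next_angle": 30.0, "goal": 45.0},
]
relabeled = her_relabel(episode)
```

Both the original and the relabeled transitions would typically be stored in the replay buffer, so the off-policy learner sees at least one successful outcome per episode.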
Multi-Branch Network for Few-shot Learning
Kai Ren, Zijie Guo, Zhimin Zhang, Rui Zhu, Xiaoxu Li
Few-shot learning aims to provide precise predictions for unseen data by learning from only one or a few labelled samples of each class. However, it often suffers from overfitting because of insufficient training data. In this paper, we propose a novel metric-based few-shot learning method, multi-branch network (MBN), with a new data augmentation module to improve the generalization ability of the model. Specifically, we generate different types of noise-contaminated data through multiple branches in the network to simulate real-world scenarios in which noisy images are obtained. Following this novel data augmentation module, the feature embedding and the similarities between the support and query samples are learned simultaneously through the embedding and metric modules, respectively. Moreover, to consider more details in the feature maps, we propose to use an average-pooling layer in the metric module rather than the commonly adopted max-pooling layer. The network is trained end to end with the Kullback-Leibler (KL) divergence to minimize the difference between the distributions of the ground truths and the predictions. Extensive experiments on Stanford-Dogs, Stanford-Cars, CUB-200-2011 and mini-ImageNet in the 1-shot and 5-shot tasks demonstrate the superior classification performance of MBN.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980160 | Published: 2022-11-07
Citations: 0
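The KL-divergence training objective mentioned in the abstract above compares the ground-truth label distribution with the predicted one. A small self-contained sketch follows; the function names and the example similarity scores are hypothetical, and the predicted distribution is assumed to come from a softmax over query-to-class similarity scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0
    )


# Hypothetical 3-way episode: the query belongs to class 0.
ground_truth = [1.0, 0.0, 0.0]
predicted = softmax([2.0, 0.0, 0.0])
loss = kl_divergence(ground_truth, predicted)
```

With one-hot ground truths the KL divergence reduces to the usual cross-entropy, here `-log(predicted[0])`, so the two losses differ only when soft labels are used.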
Sound Reproduction with a Circular Loudspeaker Array Using Differential Beamforming Method
Yankai Zhang, Jiayi Mao, Yefeng Cai, C. Ye
This paper proposes an approach to obtain a frequency-invariant, symmetric beampattern using a compact circular loudspeaker array. The Jacobi-Anger expansion method is used to approximate the target beampattern. The simulated performance of the same circular loudspeaker array with and without a rigid baffle is compared. The analytical solution of the weights and the simulation results show that the circular loudspeaker array with a rigid baffle can overcome the null problem confronting the array without a rigid baffle. The minimum-norm filter is used to improve the robustness of the system and maintain the frequency-invariant beampattern over the frequency range of interest.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980128 | Published: 2022-11-07
Citations: 1
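For reference, the Jacobi-Anger expansion named in the abstract above rewrites a plane-wave phase term on a circle as a series of circular harmonics. The symbols below are generic and not taken from the paper: $x$ stands for the product of wavenumber and array radius, $\theta$ for the angle between the element position and the look direction, and $J_n$ for the Bessel function of the first kind of order $n$. The truncation order $N$ is an assumption about how such an approximation is typically used for a compact array.

```latex
% Jacobi-Anger expansion (exact identity)
e^{\,j x \cos\theta} \;=\; \sum_{n=-\infty}^{\infty} j^{\,n} J_n(x)\, e^{\,j n \theta}

% Truncated form for a compact array (small x), assumed order N
e^{\,j x \cos\theta} \;\approx\; \sum_{n=-N}^{N} j^{\,n} J_n(x)\, e^{\,j n \theta}
```

Since $J_n(x)$ decays quickly with $|n|$ when $x$ is small, a low-order truncation is usually adequate for a compact array. The zeros of $J_n(x)$ at certain frequencies are a common source of the kind of null problem the abstract attributes to the array without a rigid baffle.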
A Multiframe Super-resolution Pipeline for Sub-image-typed Light Field Data
Chien-Han Hsu, Yi-Hsien Lin, Yen-Po Lin, Yi-Chang Lu
Due to the trade-off between spatial and angular resolutions in light field cameras, the resolutions of synthesized 2D images are often far lower than those captured by conventional digital cameras using the same image sensor. This work proposes a complete digital image processing pipeline for hand-held light field cameras to generate high-resolution all-in-focus 2D images. The pipeline contains refined disparity estimation, digital refocusing, and super-resolution stages in which the characteristics of light fields are considered. We adopt the efficient first-order primal-dual algorithm as our optimization tool. The results show that the proposed approach gives better image quality than other existing super-resolution methods.
由于光场相机在空间分辨率和角度分辨率之间的权衡,合成的二维图像的分辨率往往远低于使用相同图像传感器的传统数码相机所捕获的分辨率。这项工作提出了一个完整的数字图像处理管道,用于手持式光场相机,以生成高分辨率的全焦二维图像。该流程包含精细视差估计,数字重聚焦和超分辨率阶段,其中考虑了光场的特性。我们采用高效的一阶原对偶算法作为优化工具。结果表明,与现有的超分辨率方法相比,该方法具有更好的图像质量。
{"title":"A Multiframe Super-resolution Pipeline for Sub-image-typed Light Field Data","authors":"Chien-Han Hsu, Yi-Hsien Lin, Yen-Po Lin, Yi-Chang Lu","doi":"10.23919/APSIPAASC55919.2022.9980305","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980305","url":null,"abstract":"Due to the trade-off between spatial and angular resolutions in light field cameras, the obtained resolutions of synthesized 2D images are often far less than those captured by conventional digital cameras using the same image sensor. This work proposes a complete digital image processing pipeline for hand-held light field cameras to generate high-resolution all-in-focus 2D images. The flow contains refined disparity estimation, digital refocusing, and super-resolution stages in which the characteristics of light fields are considered. We adopt the efficient first-order primal-dual algorithm as our optimization tool. The results show that the proposed approach gives better image quality when compared to other existing super-resolution methods.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129393268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
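The spatial/angular trade-off mentioned in the abstract above can be illustrated with a small arithmetic sketch. It assumes an idealized lenslet-based camera in which each microlens covers a block of `angular_u × angular_v` sensor pixels and contributes one pixel to each sub-aperture view; the sensor size and the number of angular samples are hypothetical values chosen for illustration and are not taken from the paper.

```python
def lenslet_resolution(sensor_w, sensor_h, angular_u, angular_v):
    """Spatial size (width, height) of one sub-aperture view.

    Idealized model (an assumption): every microlens covers an
    angular_u x angular_v block of sensor pixels and yields exactly
    one pixel per view.
    """
    return sensor_w // angular_u, sensor_h // angular_v


# Hypothetical 12-megapixel sensor with 10 x 10 angular samples.
sensor_w, sensor_h = 4000, 3000
angular_u, angular_v = 10, 10

view_w, view_h = lenslet_resolution(sensor_w, sensor_h, angular_u, angular_v)

# How many times fewer pixels one view has than the full sensor.
pixel_ratio = (sensor_w * sensor_h) / (view_w * view_h)

print(view_w, view_h, pixel_ratio)
```

Under these assumptions each view is only 400 × 300 pixels, a hundredfold reduction in pixel count relative to the sensor, which is the gap a super-resolution stage tries to narrow.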
Table Structure Recognition Based on Grid Shape Graph
Eunji Lee, Junhyeong Kwon, Haeyoon Yang, Jaewoo Park, Soonyoung Lee, H. Koo, N. Cho
Since tables in documents provide important information in compact form, table understanding has been an essential topic in document image processing. Researchers have represented table structures in various formats for table understanding, such as a simple grid structure, a graph with text/cell boxes as nodes, or a sequence of HTML tokens. However, these approaches have difficulty handling regularities (e.g., global row and column information) and spanning cells at the same time. In this paper, we propose a new table recognition method based on a grid shape graph and present networks for grid localization and grid-element grouping. This approach is designed to exploit the grid structure and deal with spanning cells. To convert the grid structure into a cell structure, we only have to test adjacent pairs of grid elements, enabling efficient inference. In addition, we have discovered that predicting row/column-based relationships between grid elements improves cell-based connectivity estimation performance. We demonstrate the effectiveness of the proposed method through experiments on three benchmark datasets.
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980172 · Publication date: 2022-11-07
Citations: 1
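The grid-to-cell conversion described in the abstract above, where only adjacent pairs of grid elements are tested, can be sketched with a union-find structure. This is a minimal illustration and not the authors' implementation: `group_grid_into_cells` and the `same_cell` predicate are hypothetical names, and `same_cell` stands in for whatever decision the paper's grouping network makes for two neighbouring grid elements.

```python
from itertools import product


def group_grid_into_cells(n_rows, n_cols, same_cell):
    """Merge grid elements into cells using adjacent-pair tests only.

    same_cell(a, b) is a hypothetical predicate that says whether two
    neighbouring grid elements (row, col) belong to the same cell.
    """
    parent = {rc: rc for rc in product(range(n_rows), range(n_cols))}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for r, c in product(range(n_rows), range(n_cols)):
        # Each element is compared with its right and lower neighbour,
        # so every adjacent pair is tested exactly once.
        for neighbour in ((r, c + 1), (r + 1, c)):
            if neighbour in parent and same_cell((r, c), neighbour):
                union((r, c), neighbour)

    cells = {}
    for rc in parent:
        cells.setdefault(find(rc), []).append(rc)
    return sorted(sorted(members) for members in cells.values())


# Toy 2 x 3 grid in which the first two elements of the top row
# form one spanning cell (illustrative data, not from the paper).
merged_pairs = {frozenset({(0, 0), (0, 1)})}


def same_cell(a, b):
    return frozenset({a, b}) in merged_pairs


cells = group_grid_into_cells(2, 3, same_cell)
print(cells)
```

For a grid with R rows and C columns this performs R(C-1) + (R-1)C pair tests, so the work grows linearly with the number of grid elements instead of quadratically as it would if every pair of elements were compared.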