2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Learning Emotion Information for Expressive Speech Synthesis Using Multi-resolution Modulation-filtered Cochleagram
Emotion information plays an important role in improving the expressiveness of synthesized speech. At present, researchers mainly use a style or emotion encoder to extract emotion information from a mel-spectrogram computed with a mel filterbank. The mel filterbank does not account for the masking effect in the human auditory system, so the mel-spectrogram does not model complete auditory information. The multi-resolution modulation-filtered cochleagram (MMCG) simulates the auditory signal processing mechanism and reflects the function of the human auditory system. It can extract high-level auditory representations and significantly improve emotion recognition performance. Therefore, we propose extracting emotion information from the MMCG rather than the mel-spectrogram to improve the expressiveness of synthesized speech. We propose three kinds of MMCG encoders based on the characteristics of the MMCG. Subjective and objective experiments demonstrate that using the MMCG as an input feature not only improves the naturalness and style transfer performance of synthesized speech but also reduces the fundamental frequency error. The proposed MMCG encoders extract more complete and richer emotion information from the MMCG, further improving the expressiveness of synthesized speech.
Authors: Kaili Zhang, M. Unoki | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979810 | Published: 2022-11-07
Citations: 0
BEAM - An Algorithm for Detecting Phishing Link
This paper develops an attention-based phishing detector by performing sub-word tokenization and fine-tuning the Bidirectional Encoder Representations from Transformers (BERT) model. We call it the BERT embedding attention model (BEAM). BEAM contains five building blocks: a data pre-processing block that extracts components according to the URL structure, a tokenization block that splits the individual URL components into sub-words, an embedding block that produces a numerical sequence representation, an encoding block that produces a context feature vector, and a classification block for phishing URL detection. Sub-word tokenization lets us characterize the relationships among connecting sub-words, while the attention mechanism in BERT lets the model focus selectively on the parts that contribute most to phishing behavior. We compare BEAM with existing state-of-the-art phishing detection methods, including CNN, Bi-LSTM, and machine learning models (random forest and XGBoost). Experimental results confirm that BEAM detects phishing links effectively and outperforms the other methods.
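The first two blocks of the pipeline described above can be illustrated without any model. The following is a minimal sketch, assuming the pre-processing block splits a URL into scheme, host, path, and query, and using a toy greedy longest-match splitter in place of BERT's actual WordPiece tokenizer. The function names `split_url` and `toy_subwords` and the example vocabulary are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_url(url: str) -> dict[str, str]:
    """Split a URL into components (assumed set: scheme, host, path, query)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }


def toy_subwords(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first sub-word split (a toy, not real WordPiece).

    Characters not covered by the vocabulary become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens
```

In a full system, each component returned by `split_url` would be tokenized separately and the resulting sub-word IDs passed to the embedding and encoding blocks; those later blocks need a trained BERT model and are outside this sketch.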
Authors: Sea Ran Cleon Liew, N. F. Law | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979860 | Published: 2022-11-07
Citations: 2
A Two-stage Cascading Method Based on Finetuning in Semi-supervised Domain Adaptation Semantic Segmentation
Traditional unsupervised domain adaptation (UDA) has achieved great success in many computer vision tasks, especially semantic segmentation, where pixel-wise annotation is costly. However, the final performance of UDA methods still lags far behind that of supervised learning because of the lack of annotations. Researchers have therefore introduced semi-supervised learning (SSL) and proposed a more practical setting, semi-supervised domain adaptation (SSDA), in which labeled source domain data and a small amount of labeled target domain data are available. To address the inter-domain gap, current research focuses on domain alignment by mixing annotated data from the two domains, but we argue that adapting to the target domain data distribution through model transfer is a better solution. In this paper, we propose a two-stage SSDA framework based on this assumption. First, we adapt the model from the source domain to the labeled dataset in the target domain. To verify the assumption, we choose a basic transfer mode: finetuning. Then, to align the labeled and unlabeled subspaces of the target domain, we use a teacher-student model with class-level data augmentation as the basis for online self-training. We also provide a variant that mitigates overfitting on the target domain when only a small amount of annotated data is available. Extensive experiments on two synthetic-to-real benchmarks demonstrate the correctness of our idea and the effectiveness of our method. In most SSDA scenarios, our approach matches or even exceeds supervised performance.
Authors: Huiying Chang, Kaixin Chen, Ming Wu | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980206 | Published: 2022-11-07
Citations: 0
A Light CNN with Split Batch Normalization for Spoofed Speech Detection Using Data Augmentation
Automatic speaker verification (ASV) systems are vulnerable to rapidly developing speech synthesis and voice conversion techniques, so developing anti-spoofing systems is an urgent need. This paper proposes a novel spoofed speech detection model that makes better use of augmented data at the training stage. The model adopts a light convolutional neural network (LCNN) with a split batch normalization (SBN) structure to alleviate the data pollution caused by data augmentation. A pre-trained wav2vec 2.0 model extracts features from the input speech waveforms. Three data augmentation strategies, audio compression, mixup, and channel simulation, are compared in our experiments. Experimental results demonstrate that the proposed method achieves a state-of-the-art equal error rate (EER) of 0.258% on the ASVspoof 2019 LA task. Further analysis confirms the effectiveness of the pre-trained model for feature extraction, of the data augmentation strategies, and of the proposed SBNLCNN model in improving spoofed speech detection performance.
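The equal error rate quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how it can be approximated from detection scores, assuming that a higher score means "more likely bona fide" and that a simple sweep over the observed scores is precise enough. The function name `equal_error_rate` is hypothetical, and this is not the official ASVspoof evaluation tooling.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumes higher score = more likely bona fide. Returns a fraction in [0, 1].
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With only a handful of trials the two error rates may never be exactly equal, which is why the sketch averages them at the closest threshold; evaluation toolkits typically interpolate along the error curve instead.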
Authors: Haojian Lin, Yang Ai, Zhenhua Ling | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980260 | Published: 2022-11-07
Citations: 0
On Sorting and Padding Multiple Targets for Sound Event Localization and Detection with Permutation Invariant and Location-based Training
We explore the performance of permutation invariant training (PIT) and location-based training (LBT) for sound event localization and detection (SELD). Because SELD is intrinsically a multi-output, multi-class, and multi-task problem, the design space of its loss functions is large and, as yet, rather unexplored. Our study revolves around the multiple activity-coupled direction of arrival target format, which combines direction and event probability into a single mean squared error loss. While PIT and its variant, auxiliary duplicating PIT (ADPIT), have featured prominently in recent DCASE challenges, LBT has not yet been applied to SELD. In this work, we investigate some modifications to PIT and ADPIT, as well as the application of LBT to SELD. First, the PIT loss is changed to allow a variable number of tracks per event class, providing extra flexibility. Second, we propose auxiliary duplicating or silence PIT (ADPIT-S), where unused tracks are filled indifferently with either a duplicate event or nothing. Finally, we propose using LBT with the events ordered by Cartesian or polar coordinates. We give two ways of padding the unused tracks: with zeros or by repeating the last event. We conduct experiments using the STARSS22 dataset from the DCASE Challenge 2022. We find that ordering by Cartesian coordinates with repeat padding is best for LBT. When comparing all loss functions, we surprisingly found that PIT performed best. In addition, LBT turned out to be competitive with PIT and ADPIT. While ADPIT-S had slightly worse overall performance, it did better in terms of error rate and F-score.
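The difference between the two training schemes can be made concrete with a small example. The following is a minimal sketch, assuming each track target is an (x, y, z) vector and that "ordering by Cartesian coordinates" means a plain lexicographic sort; the abstract does not give the exact sort key, so that choice, along with the names `pit_loss` and `lbt_targets`, is an assumption made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(predictions, targets):
    """Permutation invariant loss: the lowest total MSE over every
    assignment of target tracks to prediction tracks."""
    return min(
        sum(mse(p, t) for p, t in zip(predictions, perm))
        for perm in permutations(targets)
    )


def lbt_targets(events, num_tracks, pad="repeat"):
    """Location-based targets: sort events by (x, y, z), then pad unused
    tracks with zeros or by repeating the last event."""
    ordered = sorted(events)[:num_tracks]  # assumed lexicographic order
    if not ordered:
        return [(0.0, 0.0, 0.0)] * num_tracks
    filler = ordered[-1] if pad == "repeat" else (0.0, 0.0, 0.0)
    return ordered + [filler] * (num_tracks - len(ordered))
```

PIT searches all track assignments at training time, so its cost grows factorially with the number of tracks, whereas LBT fixes one canonical order up front and then applies an ordinary track-by-track loss.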
Authors: Robin Scheibler, Tatsuya Komatsu, Yusuke Fujita, Michael Hentschel | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979815 | Published: 2022-11-07
Citations: 2
Cross-Modal Knowledge Distillation with Dropout-Based Confidence
In cross-modal distillation, e.g., from text-based inference modules to a spoken language understanding module, it is difficult to determine how much influence the teacher should have, because the two modalities differ in nature and are therefore heterogeneous in their uncertainty. Although error-rate-based and entropy-based schemes have been suggested to address the heuristic nature of time-based scheduling, the confidence of the teacher's inference has not necessarily been taken into account when deciding the teacher's influence. In this paper, we propose a dropout-based confidence that determines the teacher's confidence and its influence on the student's loss. On the widely used spoken language understanding dataset Fluent Speech Commands, we show that our weight decision scheme enhances performance in combination with conventional scheduling strategies, yielding up to a 20% relative error reduction compared with the model without distillation.
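The abstract does not spell out how the dropout-based confidence is computed. One plausible reading, in the spirit of Monte Carlo dropout, is to run the teacher several times with dropout left on and treat the agreement between passes as its confidence. The sketch below follows that assumption only; the agreement measure, the linear blending rule, and the names `dropout_confidence` and `combined_loss` are all hypothetical and should not be taken as the authors' formulation.

```python
from collections import Counter


def dropout_confidence(sampled_labels):
    """Fraction of stochastic teacher passes that agree with the majority label.

    `sampled_labels` holds the teacher's predicted label from each pass
    run with dropout active (an assumed confidence measure).
    """
    counts = Counter(sampled_labels)
    return counts.most_common(1)[0][1] / len(sampled_labels)


def combined_loss(task_loss, distill_loss, confidence, schedule_weight=1.0):
    """Blend the student's task loss with the distillation loss.

    The teacher's share is the schedule weight scaled by its confidence,
    so an uncertain teacher contributes less to the total.
    """
    w = schedule_weight * confidence
    return (1.0 - w) * task_loss + w * distill_loss
```

Here `schedule_weight` stands in for whichever conventional scheduling strategy is in use, which matches the abstract's statement that the confidence is used in combination with such strategies rather than replacing them.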
Authors: Won Ik Cho, Jeunghun Kim, N. Kim | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980213 | Published: 2022-11-07
Citations: 0
Graph Evolving and Embedding in Transformer
This paper presents a novel graph representation that tightly integrates the information sources of the node embedding matrix and the weight matrix in a graph learning representation. A new parameter updating method is proposed to dynamically represent the graph network by using a specialized transformer. This graph evolved and embedded transformer is built from the weights and node embeddings of graph structural data, implementing an attention-based graph learning machine. In the proposed method, each transformer layer is composed of two attention layers. The first layer calculates the weight matrix in the graph convolutional network, together with the self-attention within the matrix itself. The second layer estimates the node embedding and weight matrix, together with the cross-attention between them. These two attention layers enhance the graph learning representation. Experiments on three financial prediction tasks demonstrate that this transformer captures temporal information and improves the F1 score and the mean reciprocal rank.
Authors: Jen-Tzung Chien, Chia-Wei Tsao | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979949 | Published: 2022-11-07
Citations: 3
Design and system implementation of a configurable optical interconnection network
With the development of silicon photonics and wavelength division multiplexing, the advantages of on-chip optical interconnection, such as low loss, low delay, and high bandwidth, can make up for the disadvantages of electrical interconnection. However, as network scale and complexity increase, a series of problems appear in optical interconnection networks, such as communication congestion, low utilization of microring resonators, and increased insertion loss. The traditional optical interconnection network structure is relatively fixed and cannot meet the needs of reconfigurable array processors. Therefore, this paper designs ReLONEONoC, a configurable, non-blocking, scalable, low-loss optical interconnection network structure. Depending on the array size, electrical interconnection is used within clusters, and optical communication is used for mass data transmission between clusters. Finally, a simulation and verification model of the optical link is built with a Waveshaper 500A/SP configurable optical device, and the coupling screening effect of the microring resonator is simulated to verify the functional correctness of the optical link. A prototype system of ReLONEONoC was designed by combining the Waveshaper with an UltraScale+ VU440 development platform. Statistical results show that optical communication between clusters improves both delay and loss.
Authors: Bowen Yang, Junyong Deng, Jiaying Luo, Yu Feng | DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979816 | Published: 2022-11-07
Citations: 0
Non-Parallel Voice Conversion Based on Free-Energy Minimization of Speaker-Conditional Restricted Boltzmann Machine 基于演讲者-条件受限玻尔兹曼机自由能量最小化的非并行语音转换
In this paper, we propose a non-parallel voice conversion method based on minimizing the free energy of a restricted Boltzmann machine (RBM). The proposed method uses an RBM that learns the generative probability of acoustic features conditioned on a target speaker, and it iteratively updates the input acoustic features until their free energy reaches a local minimum, which yields the converted features. Because the method is based on an RBM, only a few hyperparameters need to be set and the number of trainable parameters is very small, so training is stable. When determining the step size of the update formula with the Newton-Raphson method, we found that the Hessian matrix of the free energy can be approximated by a diagonal matrix, so each update can be performed efficiently with little computation. In objective evaluation experiments, the proposed method outperforms StarGAN-VC in mel-cepstral distortion. In subjective evaluation experiments, its performance is comparable to that of StarGAN-VC in similarity MOS.
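The iterative update described in the abstract can be illustrated with a minimal sketch. It assumes a unit-variance Gaussian-Bernoulli RBM whose parameters `W`, `b`, and `c` already belong to the target speaker; this free-energy form, the toy parameter values, and the names `free_energy`, `grad_and_diag_hessian`, and `convert` are all illustrative assumptions, not the authors' implementation. The step divides the gradient by the diagonal of the Hessian, which is the diagonal approximation of a Newton-Raphson step.

```python
import numpy as np

def free_energy(v, W, b, c):
    """Free energy of a unit-variance Gaussian-Bernoulli RBM (assumed form)."""
    # logaddexp(0, x) is a numerically stable softplus.
    return 0.5 * np.sum((v - b) ** 2) - np.sum(np.logaddexp(0.0, c + W.T @ v))

def grad_and_diag_hessian(v, W, b, c):
    """Gradient of the free energy and the diagonal of its Hessian w.r.t. v."""
    s = 1.0 / (1.0 + np.exp(-(c + W.T @ v)))   # hidden activation probabilities
    grad = (v - b) - W @ s
    # The full Hessian is I - W diag(s * (1 - s)) W^T; keep only its diagonal.
    diag_h = 1.0 - (W ** 2) @ (s * (1.0 - s))
    return grad, diag_h

def convert(v0, W, b, c, max_iter=50, tol=1e-8, eps=1e-3):
    """Newton-Raphson-style descent on the free energy with a diagonal Hessian."""
    v = v0.astype(float).copy()
    for _ in range(max_iter):
        grad, diag_h = grad_and_diag_hessian(v, W, b, c)
        if np.linalg.norm(grad) < tol:
            break
        # Clamp the curvature so the step stays finite and downhill.
        v -= grad / np.maximum(diag_h, eps)
    return v

# Toy parameters standing in for a trained, target-speaker RBM (hypothetical).
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))   # 4 visible units, 3 hidden units
b = np.array([0.5, -0.2, 0.1, 0.0])     # visible bias
c = np.zeros(3)                         # hidden bias
v0 = np.full(4, 2.0)                    # input ("source") feature vector
v_conv = convert(v0, W, b, c)
```

Because only the diagonal of the Hessian is used, each iteration costs about as much as one gradient evaluation, which matches the efficiency argument in the abstract. How the speaker conditioning enters the RBM is not specified there, so it is left out of the sketch.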
Authors: Takuya Kishida, Toru Nakashika. DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980151. Published: 2022-11-07.
Citations: 0
Continuous Tracking of Indoor Human Targets Based on Millimeter Wave Radar
Target tracking based on millimeter-wave radar is susceptible to multipath effects and target crossover, and most existing methods are unsatisfactory in high-noise, complex environments. In contrast, we propose a method covering target positioning, tracking, and track re-association using a top-mounted millimeter-wave radar, achieving stable and accurate counting and tracking of multiple targets. First, polar-coordinate-based tracking is performed using an extended Kalman filter with linear regression correction. Then, a density-based classification algorithm with group signal-to-noise ratio analysis is applied to remove ghost targets. To address the track fracture problem caused by target intersection, we propose using the Hankel matrix. Our experiments demonstrate the robustness of the proposed method, which not only achieves a tracking precision within 0.1 m but also successfully handles most of the target crossover situations considered. In addition, in scenarios with up to six people, the people-counting error is at most 1 in more than 95% of frames.
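As a rough illustration of the tracking stage, the sketch below runs a textbook extended Kalman filter with a constant-velocity Cartesian state and a polar (range, azimuth) measurement. The state layout, noise covariances, and the name `ekf_step` are assumptions made for this example; the linear regression correction, ghost-target removal, and Hankel-matrix re-association from the abstract are not reproduced.

```python
import numpy as np

def ekf_step(x, P, z, dt, Q, R):
    """One EKF predict/update cycle.

    x: state [px, py, vx, vy]; z: measurement [range, azimuth in radians].
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # Predict with the constant-velocity motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Linearize the polar measurement function around the predicted state.
    px, py = x[0], x[1]
    r = np.hypot(px, py)
    H = np.array([[px / r, py / r, 0, 0],
                  [-py / r**2, px / r**2, 0, 0]])
    innov = z - np.array([r, np.arctan2(py, px)])
    innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi  # wrap angle to [-pi, pi)
    # Standard Kalman update.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ innov
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Toy run: one target walking in a straight line, noiseless measurements.
dt = 0.1
truth = np.array([1.0, 1.0, 0.5, 0.2])   # true state; velocity unknown to the filter
x_est = np.array([1.0, 1.0, 0.0, 0.0])   # initial guess with zero velocity
P_est = np.eye(4)
Q = 0.01 * np.eye(4)                     # assumed process noise
R = np.diag([0.05**2, 0.02**2])          # assumed range / azimuth noise
for _ in range(50):
    truth[:2] += dt * truth[2:]
    z = np.array([np.hypot(truth[0], truth[1]), np.arctan2(truth[1], truth[0])])
    x_est, P_est = ekf_step(x_est, P_est, z, dt, Q, R)
```

Keeping the measurement in polar form lets the filter use the radar's native range and angle noise model directly, at the cost of recomputing the Jacobian `H` at every step. The model is undefined for a target exactly at the radar origin (`r = 0`), which a real system would need to guard against.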
Authors: Meiqiu Jiang, Shisheng Guo, Haolan Luo, G. Cui. DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979904. Published: 2022-11-07.
Citations: 1